2022-03-31 05:03:31

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

The ath9k driver was not properly clearing the status area in the
ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
it was manually filling in fields, which meant that fields introduced later
were left as-is.

Conveniently, mac80211 actually provides a helper to zero out the status
area, so use that to make sure we zero everything.

The last commit touching the driver function writing the status information
seems to have actually been fixing an issue that was also caused by the
area being uninitialised; but it only added clearing of a single field
instead of the whole struct. That is now redundant, though, so revert that
commit and use it as a convenient Fixes tag.

Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
Reported-by: Bagas Sanjaya <[email protected]>
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index d0caf1de2bde..cbcf96ac303e 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
struct ath_hw *ah = sc->sc_ah;
u8 i, tx_rateindex;

+ ieee80211_tx_info_clear_status(tx_info);
+
if (txok)
tx_info->status.ack_signal = ts->ts_rssi;

@@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
}

tx_info->status.rates[tx_rateindex].count = ts->ts_longretry + 1;
-
- /* we report airtime in ath_tx_count_airtime(), don't report twice */
- tx_info->status.tx_time = 0;
}

static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)
--
2.35.1


2022-03-31 05:35:58

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

On 30/03/22 23.44, Toke Høiland-Jørgensen wrote:
> The ath9k driver was not properly clearing the status area in the
> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
> it was manually filling in fields, which meant that fields introduced later
> were left as-is.
>
> Conveniently, mac80211 actually provides a helper to zero out the status
> area, so use that to make sure we zero everything.
>
> The last commit touching the driver function writing the status information
> seems to have actually been fixing an issue that was also caused by the
> area being uninitialised; but it only added clearing of a single field
> instead of the whole struct. That is now redundant, though, so revert that
> commit and use it as a convenient Fixes tag.
>
> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
> Reported-by: Bagas Sanjaya <[email protected]>
> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>

No regressions and UBSAN warning [1] reported on dmesg.

Tested-by: Bagas Sanjaya <[email protected]>

However, there is something missing. I don't see Cc: [email protected]
trailer in this patch. I think it should, because I reported that this issue
first occurred on v5.17, then still appeared on v5.17.1.

--
An old man doll... just what I always wanted! - Clara

2022-03-31 05:45:37

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

On 31/03/22 12.06, Bagas Sanjaya wrote:
> No regressions and UBSAN warning [1] reported on dmesg.
>
> Tested-by: Bagas Sanjaya <[email protected]>
>
> However, there is something missing. I don't see Cc: [email protected]
> trailer in this patch. I think it should, because I reported that this issue
> first occurred on v5.17, then still appeared on v5.17.1.
>

Oops, missing link for [1].

[1]: https://lore.kernel.org/linux-wireless/[email protected]/T/#mc2ef5bbdf51fd4ead05b1115cedd58153675365c

--
An old man doll... just what I always wanted! - Clara

2022-03-31 05:48:46

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

Bagas Sanjaya <[email protected]> writes:

> On 30/03/22 23.44, Toke Høiland-Jørgensen wrote:
>> The ath9k driver was not properly clearing the status area in the
>> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>> it was manually filling in fields, which meant that fields introduced later
>> were left as-is.
>>
>> Conveniently, mac80211 actually provides a helper to zero out the status
>> area, so use that to make sure we zero everything.
>>
>> The last commit touching the driver function writing the status information
>> seems to have actually been fixing an issue that was also caused by the
>> area being uninitialised; but it only added clearing of a single field
>> instead of the whole struct. That is now redundant, though, so revert that
>> commit and use it as a convenient Fixes tag.
>>
>> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>> Reported-by: Bagas Sanjaya <[email protected]>
>> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
>
> No regressions and UBSAN warning [1] reported on dmesg.
>
> Tested-by: Bagas Sanjaya <[email protected]>
>
> However, there is something missing. I don't see Cc: [email protected]
> trailer in this patch. I think it should, because I reported that this issue
> first occurred on v5.17, then still appeared on v5.17.1.

I can add that during commit.

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2022-03-31 08:49:47

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

Kalle Valo <[email protected]> writes:

> Bagas Sanjaya <[email protected]> writes:
>
>> On 30/03/22 23.44, Toke Høiland-Jørgensen wrote:
>>> The ath9k driver was not properly clearing the status area in the
>>> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>>> it was manually filling in fields, which meant that fields introduced later
>>> were left as-is.
>>>
>>> Conveniently, mac80211 actually provides a helper to zero out the status
>>> area, so use that to make sure we zero everything.
>>>
>>> The last commit touching the driver function writing the status information
>>> seems to have actually been fixing an issue that was also caused by the
>>> area being uninitialised; but it only added clearing of a single field
>>> instead of the whole struct. That is now redundant, though, so revert that
>>> commit and use it as a convenient Fixes tag.
>>>
>>> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>>> Reported-by: Bagas Sanjaya <[email protected]>
>>> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
>>
>> No regressions and UBSAN warning [1] reported on dmesg.
>>
>> Tested-by: Bagas Sanjaya <[email protected]>
>>
>> However, there is something missing. I don't see Cc: [email protected]
>> trailer in this patch. I think it should, because I reported that this issue
>> first occurred on v5.17, then still appeared on v5.17.1.
>
> I can add that during commit.

Thanks! And sorry about that, I have gotten so used to the netdev policy
of not including an explicit stable Cc that I totally forgot that this
doesn't apply to the wireless tree.

In any case I think the stable tree autoselection bit should pick it up
from the Fixes tag, though...

-Toke

2022-03-31 11:59:15

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

Toke Høiland-Jørgensen <[email protected]> writes:

> Kalle Valo <[email protected]> writes:
>
>> Bagas Sanjaya <[email protected]> writes:
>>
>>> On 30/03/22 23.44, Toke Høiland-Jørgensen wrote:
>>>> The ath9k driver was not properly clearing the status area in the
>>>> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>>>> it was manually filling in fields, which meant that fields introduced later
>>>> were left as-is.
>>>>
>>>> Conveniently, mac80211 actually provides a helper to zero out the status
>>>> area, so use that to make sure we zero everything.
>>>>
>>>> The last commit touching the driver function writing the status information
>>>> seems to have actually been fixing an issue that was also caused by the
>>>> area being uninitialised; but it only added clearing of a single field
>>>> instead of the whole struct. That is now redundant, though, so revert that
>>>> commit and use it as a convenient Fixes tag.
>>>>
>>>> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>>>> Reported-by: Bagas Sanjaya <[email protected]>
>>>> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
>>>
>>> No regressions and UBSAN warning [1] reported on dmesg.
>>>
>>> Tested-by: Bagas Sanjaya <[email protected]>
>>>
>>> However, there is something missing. I don't see Cc: [email protected]
>>> trailer in this patch. I think it should, because I reported that this issue
>>> first occurred on v5.17, then still appeared on v5.17.1.
>>
>> I can add that during commit.
>
> Thanks! And sorry about that, I have gotten so used to the netdev policy
> of not including an explicit stable Cc that I totally forgot that this
> doesn't apply to the wireless tree.

No worries!

> In any case I think the stable tree autoselection bit should pick it up
> from the Fixes tag, though...

Yeah, I agree. I also only add the Fixes tag to my patches, and let the
stable team to choose if they want the commit or not. But if people ask
to add cc stable I'll do that, it's easy enough as I have the edit
command in my patchwork script :)

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2022-04-01 14:29:46

by Kalle Valo

[permalink] [raw]
Subject: Re: [v5.18] ath9k: Properly clear TX status area before reporting to mac80211

Toke Høiland-Jørgensen <[email protected]> wrote:

> The ath9k driver was not properly clearing the status area in the
> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
> it was manually filling in fields, which meant that fields introduced later
> were left as-is.
>
> Conveniently, mac80211 actually provides a helper to zero out the status
> area, so use that to make sure we zero everything.
>
> The last commit touching the driver function writing the status information
> seems to have actually been fixing an issue that was also caused by the
> area being uninitialised; but it only added clearing of a single field
> instead of the whole struct. That is now redundant, though, so revert that
> commit and use it as a convenient Fixes tag.
>
> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
> Reported-by: Bagas Sanjaya <[email protected]>
> Cc: <[email protected]>
> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
> Tested-by: Bagas Sanjaya <[email protected]>

Patch applied to wireless.git, thanks.

037250f0a45c ath9k: Properly clear TX status area before reporting to mac80211

--
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2022-04-03 12:27:33

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

Peter Seiderer <[email protected]> writes:

> Hello Toke,
>
> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <[email protected]> wrote:
>
>> The ath9k driver was not properly clearing the status area in the
>> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>> it was manually filling in fields, which meant that fields introduced later
>> were left as-is.
>>
>> Conveniently, mac80211 actually provides a helper to zero out the status
>> area, so use that to make sure we zero everything.
>>
>> The last commit touching the driver function writing the status information
>> seems to have actually been fixing an issue that was also caused by the
>> area being uninitialised; but it only added clearing of a single field
>> instead of the whole struct. That is now redundant, though, so revert that
>> commit and use it as a convenient Fixes tag.
>>
>> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>> Reported-by: Bagas Sanjaya <[email protected]>
>> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
>> ---
>> drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
>> index d0caf1de2bde..cbcf96ac303e 100644
>> --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> struct ath_hw *ah = sc->sc_ah;
>> u8 i, tx_rateindex;
>>
>> + ieee80211_tx_info_clear_status(tx_info);
>> +
>
> As this also clears the status.rates[].count, see include/net/mac80211.h:
>
> 1195 static inline void
> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
> 1197 {
> 1198 int i;
> 1199
> 1200 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> 1201 offsetof(struct ieee80211_tx_info, control.rates));
> 1202 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> 1203 offsetof(struct ieee80211_tx_info, driver_rates));
> 1204 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8) ;
> 1205 /* clear the rate counts */
> 1206 for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> 1207 info->status.rates[i].count = 0;
> 1208 memset_after(&info->status, 0, rates);
> 1209 }
>
> I would have expected some lines added to restore the count (for the
> rates with index < tx_rateindex), e.g. as done in
> drivers/net/wireless/ath/ath5k/base.c:
>
> 1731 ieee80211_tx_info_clear_status(info);
> 1732
> 1733 for (i = 0; i < ts->ts_final_idx; i++) {
> 1734 struct ieee80211_tx_rate *r =
> 1735 &info->status.rates[i];
> 1736
> 1737 r->count = tries[i];
> 1738 }
>
> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
> the tx_rateindex index (which is often zero in case the first suggested rate
> succeeds, but in noisy environment is sometimes > 0)...

Ah, you're right! I looked at that code, and somehow parsed that as "OK,
it's setting all the rate counts, we're fine". But obviously that's not
what that code is doing, so we'll lose some information now. Bugger :(

I'll send a follow-up, thanks for flagging this!

-Toke

2022-04-03 12:45:57

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

Peter Seiderer <[email protected]> writes:

> Hello Toke,
>
> On Fri, 1 Apr 2022 19:26:57 +0200, Peter Seiderer <[email protected]> wrote:
>
>> Hello Toke,
>>
>> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <[email protected]> wrote:
>>
>> > The ath9k driver was not properly clearing the status area in the
>> > ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>> > it was manually filling in fields, which meant that fields introduced later
>> > were left as-is.
>> >
>> > Conveniently, mac80211 actually provides a helper to zero out the status
>> > area, so use that to make sure we zero everything.
>> >
>> > The last commit touching the driver function writing the status information
>> > seems to have actually been fixing an issue that was also caused by the
>> > area being uninitialised; but it only added clearing of a single field
>> > instead of the whole struct. That is now redundant, though, so revert that
>> > commit and use it as a convenient Fixes tag.
>> >
>> > Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>> > Reported-by: Bagas Sanjaya <[email protected]>
>> > Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
>> > ---
>> > drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
>> > 1 file changed, 2 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
>> > index d0caf1de2bde..cbcf96ac303e 100644
>> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> > @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> > struct ath_hw *ah = sc->sc_ah;
>> > u8 i, tx_rateindex;
>> >
>> > + ieee80211_tx_info_clear_status(tx_info);
>> > +
>>
>> As this also clears the status.rates[].count, see include/net/mac80211.h:
>>
>> 1195 static inline void
>> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
>> 1197 {
>> 1198 int i;
>> 1199
>> 1200 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
>> 1201 offsetof(struct ieee80211_tx_info, control.rates));
>> 1202 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
>> 1203 offsetof(struct ieee80211_tx_info, driver_rates));
>> 1204 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8) ;
>> 1205 /* clear the rate counts */
>> 1206 for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
>> 1207 info->status.rates[i].count = 0;
>> 1208 memset_after(&info->status, 0, rates);
>> 1209 }
>>
>> I would have expected some lines added to restore the count (for the
>> rates with index < tx_rateindex), e.g. as done in
>> drivers/net/wireless/ath/ath5k/base.c:
>>
>> 1731 ieee80211_tx_info_clear_status(info);
>> 1732
>> 1733 for (i = 0; i < ts->ts_final_idx; i++) {
>> 1734 struct ieee80211_tx_rate *r =
>> 1735 &info->status.rates[i];
>> 1736
>> 1737 r->count = tries[i];
>> 1738 }
>>
>> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
>> the tx_rateindex index (which is often zero in case the first suggested rate
>> succeeds, but in noisy environment is sometimes > 0)...
>>
>> > if (txok)
>> > tx_info->status.ack_signal = ts->ts_rssi;
>> >
>> > @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> > }
>> >
>> > tx_info->status.rates[)].count = ts->ts_longretry + 1;
>> > -
>> > - /* we report airtime in ath_tx_count_airtime(), don't report twice */
>> > - tx_info->status.tx_time = 0;
>> > }
>> >
>> > static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)
>>
>>
>> And from drivers/net/wireless/ath/ath9k/xmit.c:
>>
>> 2592 for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
>> 2593 tx_info->status.rates[i].count = 0;
>> 2594 tx_info->status.rates[i].idx = -1;
>> 2595 }
>>
>> Line 2593 can be deleted as status.rates[].count is already zeroed through the
>> ieee80211_tx_info_clear_status() call...
>>
>> And it should be sufficient to do:
>>
>> if (tx_rateindex + 1 < hw->max_rates)
>> tx_info->status.rates[tx_rateindex + 1].idx = -1;
>>
>> Regards,
>> Peter
>
> And one additional problem found with your patch applied (to 5.16.18), the log
> get spammed by:
>
> [ 445.489139] ------------[ cut here ]------------
> [ 445.489150] WARNING: CPU: 0 PID: 8630 at drivers/net/wireless/ath/ath9k/xmit.c:174 ath_tx_complete+0x190/0x1a0 [ath9k]
> [ 445.489178] Modules linked in: ath9k ath9k_common ath9k_hw mac80211 ppp_async ppp_generic slhc xt_u32 xt_TCPMSS xt_tcpmss xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter iptable_filter ip6table_mangle ip6_tables cls_u32 sch_prio qmi_wwan cdc_wdm usbnet qcserial usb_wwan gpio_pca953x xt_tcpudp xt_mark iptable_mangle ip_tables x_tables dummy tqmx86_wdt watchdog i2c_machxo2 gpio_tqmx86 libarc4 ath tqmx86 mfd_core ftdi_sio cfg80211 usbserial nct7802 regmap_i2c [last unloaded: mac80211]
> [ 445.489298] CPU: 0 PID: 8630 Comm: modprobe Tainted: G W 5.16.18+ #2
> [ 445.489307] Hardware name: TQ-Group TQMxE38M X64/Type2 - Board Product Name, BIOS TQMxE38M.5.4.48.0028.12 01/31/2017
> [ 445.489311] EIP: ath_tx_complete+0x190/0x1a0 [ath9k]
> [ 445.489328] Code: ff 75 e8 68 d7 a1 2c f8 89 55 ec e8 d5 4f e3 ff 8b 45 f0 8b 55 ec 83 c4 10 8b 80 58 0a 00 00 e9 a8 fe ff ff 8d b6 00 00 00 00 <0f> 0b c7 42 68 00 00 00 00 e9 1d ff ff ff 66 90 55 89 e5 57 56 89
> [ 445.489336] EAX: 00000000 EBX: c2e67900 ECX: c23cc040 EDX: c235a980
> [ 445.489342] ESI: c235a888 EDI: c2359860 EBP: c1089e18 ESP: c1089dfc
> [ 445.489348] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010297
> [ 445.489355] CR0: 80050033 CR2: b7f81000 CR3: 02ec7000 CR4: 001006d0
> [ 445.489362] Call Trace:
> [ 445.489366] <SOFTIRQ>
> [ 445.489372] ath_tx_complete_buf+0x100/0x130 [ath9k]
> [ 445.489390] ath_tx_process_buffer+0x187/0xb40 [ath9k]
> [ 445.489407] ? resched_curr+0x1d/0xc0
> [ 445.489419] ? check_preempt_wakeup+0x115/0x250
> [ 445.489429] ? task_fork_fair+0xc0/0x170
> [ 445.489436] ? put_prev_task_fair+0x40/0x40
> [ 445.489444] ? check_preempt_curr+0x62/0x70
> [ 445.489452] ? __local_bh_enable_ip+0x33/0x80
> [ 445.489460] ? _raw_spin_unlock_bh+0x13/0x20
> [ 445.489468] ? ath_txq_unlock_complete+0x5a/0x100 [ath9k]
> [ 445.489485] ? _raw_spin_unlock_irqrestore+0x16/0x30
> [ 445.489492] ? try_to_wake_up+0x7c/0x550
> [ 445.489499] ath_tx_tasklet+0x1de/0x2e0 [ath9k]
> [ 445.489519] ath9k_tasklet+0x22f/0x330 [ath9k]
> [ 445.489535] tasklet_action_common.constprop.0+0x89/0xb0
> [ 445.489544] tasklet_action+0x21/0x30
> [ 445.489551] __do_softirq+0xc5/0x28b
> [ 445.489559] ? __entry_text_end+0x4/0x4
> [ 445.489567] call_on_stack+0x40/0x50
> [ 445.489576] </SOFTIRQ>
> [ 445.489580] ? irq_exit_rcu+0x92/0x100
> [ 445.489587] ? common_interrupt+0x27/0x40
> [ 445.489595] ? asm_common_interrupt+0x102/0x140
> [ 445.489605] ---[ end trace 5c176b666255bca1 ]---
> [ 445.607206] ------------[ cut here ]------------
>
>
> Could be fixed by the following additional change:
>
> --- a/drivers/net/wireless/ath/ath9k/xmit.c
> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> @@ -2511,9 +2511,12 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> struct ieee80211_hw *hw = sc->hw;
> struct ath_hw *ah = sc->sc_ah;
> u8 i, tx_rateindex;
> + bool is_valid_ack_signal = tx_info->status.is_valid_ack_signal;
>
> ieee80211_tx_info_clear_status(tx_info);
>
> + tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
> +
> if (txok)

That doesn't seem right, zeroing is_valid_ack_signal was the whole point
of the first patch... Something seems off in that backtrace; could you
try running it through ./scripts/decode_stacktrace.sh please?

-Toke

2022-04-04 07:45:49

by Peter Seiderer

[permalink] [raw]
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

Hello Toke,

On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <[email protected]> wrote:

> The ath9k driver was not properly clearing the status area in the
> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
> it was manually filling in fields, which meant that fields introduced later
> were left as-is.
>
> Conveniently, mac80211 actually provides a helper to zero out the status
> area, so use that to make sure we zero everything.
>
> The last commit touching the driver function writing the status information
> seems to have actually been fixing an issue that was also caused by the
> area being uninitialised; but it only added clearing of a single field
> instead of the whole struct. That is now redundant, though, so revert that
> commit and use it as a convenient Fixes tag.
>
> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
> Reported-by: Bagas Sanjaya <[email protected]>
> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
> ---
> drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
> index d0caf1de2bde..cbcf96ac303e 100644
> --- a/drivers/net/wireless/ath/ath9k/xmit.c
> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> struct ath_hw *ah = sc->sc_ah;
> u8 i, tx_rateindex;
>
> + ieee80211_tx_info_clear_status(tx_info);
> +

As this also clears the status.rates[].count, see include/net/mac80211.h:

1195 static inline void
1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
1197 {
1198 int i;
1199
1200 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
1201 offsetof(struct ieee80211_tx_info, control.rates));
1202 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
1203 offsetof(struct ieee80211_tx_info, driver_rates));
1204 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8) ;
1205 /* clear the rate counts */
1206 for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
1207 info->status.rates[i].count = 0;
1208 memset_after(&info->status, 0, rates);
1209 }

I would have expected some lines added to restore the count (for the
rates with index < tx_rateindex), e.g. as done in
drivers/net/wireless/ath/ath5k/base.c:

1731 ieee80211_tx_info_clear_status(info);
1732
1733 for (i = 0; i < ts->ts_final_idx; i++) {
1734 struct ieee80211_tx_rate *r =
1735 &info->status.rates[i];
1736
1737 r->count = tries[i];
1738 }

In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
the tx_rateindex index (which is often zero in case the first suggested rate
succeeds, but in noisy environment is sometimes > 0)...

> if (txok)
> tx_info->status.ack_signal = ts->ts_rssi;
>
> @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> }
>
> tx_info->status.rates[)].count = ts->ts_longretry + 1;
> -
> - /* we report airtime in ath_tx_count_airtime(), don't report twice */
> - tx_info->status.tx_time = 0;
> }
>
> static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)


And from drivers/net/wireless/ath/ath9k/xmit.c:

2592 for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
2593 tx_info->status.rates[i].count = 0;
2594 tx_info->status.rates[i].idx = -1;
2595 }

Line 2593 can be deleted as status.rates[].count is already zeroed through the
ieee80211_tx_info_clear_status() call...

And it should be sufficient to do:

if (tx_rateindex + 1 < hw->max_rates)
tx_info->status.rates[tx_rateindex + 1].idx = -1;

Regards,
Peter

2022-04-04 13:41:05

by Peter Seiderer

[permalink] [raw]
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

Hello Toke,

On Sat, 02 Apr 2022 17:11:54 +0200, Toke Høiland-Jørgensen <[email protected]> wrote:

> Peter Seiderer <[email protected]> writes:
>
> > Hello Toke,
> >
> > On Fri, 1 Apr 2022 19:26:57 +0200, Peter Seiderer <[email protected]> wrote:
> >
> >> Hello Toke,
> >>
> >> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <[email protected]> wrote:
> >>
> >> > The ath9k driver was not properly clearing the status area in the
> >> > ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
> >> > it was manually filling in fields, which meant that fields introduced later
> >> > were left as-is.
> >> >
> >> > Conveniently, mac80211 actually provides a helper to zero out the status
> >> > area, so use that to make sure we zero everything.
> >> >
> >> > The last commit touching the driver function writing the status information
> >> > seems to have actually been fixing an issue that was also caused by the
> >> > area being uninitialised; but it only added clearing of a single field
> >> > instead of the whole struct. That is now redundant, though, so revert that
> >> > commit and use it as a convenient Fixes tag.
> >> >
> >> > Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
> >> > Reported-by: Bagas Sanjaya <[email protected]>
> >> > Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
> >> > ---
> >> > drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
> >> > 1 file changed, 2 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
> >> > index d0caf1de2bde..cbcf96ac303e 100644
> >> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
> >> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> >> > @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> >> > struct ath_hw *ah = sc->sc_ah;
> >> > u8 i, tx_rateindex;
> >> >
> >> > + ieee80211_tx_info_clear_status(tx_info);
> >> > +
> >>
> >> As this also clears the status.rates[].count, see include/net/mac80211.h:
> >>
> >> 1195 static inline void
> >> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
> >> 1197 {
> >> 1198 int i;
> >> 1199
> >> 1200 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> >> 1201 offsetof(struct ieee80211_tx_info, control.rates));
> >> 1202 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> >> 1203 offsetof(struct ieee80211_tx_info, driver_rates));
> >> 1204 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8) ;
> >> 1205 /* clear the rate counts */
> >> 1206 for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> >> 1207 info->status.rates[i].count = 0;
> >> 1208 memset_after(&info->status, 0, rates);
> >> 1209 }
> >>
> >> I would have expected some lines added to restore the count (for the
> >> rates with index < tx_rateindex), e.g. as done in
> >> drivers/net/wireless/ath/ath5k/base.c:
> >>
> >> 1731 ieee80211_tx_info_clear_status(info);
> >> 1732
> >> 1733 for (i = 0; i < ts->ts_final_idx; i++) {
> >> 1734 struct ieee80211_tx_rate *r =
> >> 1735 &info->status.rates[i];
> >> 1736
> >> 1737 r->count = tries[i];
> >> 1738 }
> >>
> >> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
> >> the tx_rateindex index (which is often zero in case the first suggested rate
> >> succeeds, but in noisy environment is sometimes > 0)...
> >>
> >> > if (txok)
> >> > tx_info->status.ack_signal = ts->ts_rssi;
> >> >
> >> > @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> >> > }
> >> >
> >> > tx_info->status.rates[)].count = ts->ts_longretry + 1;
> >> > -
> >> > - /* we report airtime in ath_tx_count_airtime(), don't report twice */
> >> > - tx_info->status.tx_time = 0;
> >> > }
> >> >
> >> > static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)
> >>
> >>
> >> And from drivers/net/wireless/ath/ath9k/xmit.c:
> >>
> >> 2592 for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
> >> 2593 tx_info->status.rates[i].count = 0;
> >> 2594 tx_info->status.rates[i].idx = -1;
> >> 2595 }
> >>
> >> Line 2593 can be deleted as status.rates[].count is already zeroed through the
> >> ieee80211_tx_info_clear_status() call...
> >>
> >> And it should be sufficient to do:
> >>
> >> if (tx_rateindex + 1 < hw->max_rates)
> >> tx_info->status.rates[tx_rateindex + 1].idx = -1;
> >>
> >> Regards,
> >> Peter
> >
> > And one additional problem found with your patch applied (to 5.16.18), the log
> > get spammed by:
> >
> > [ 445.489139] ------------[ cut here ]------------
> > [ 445.489150] WARNING: CPU: 0 PID: 8630 at drivers/net/wireless/ath/ath9k/xmit.c:174 ath_tx_complete+0x190/0x1a0 [ath9k]
> > [ 445.489178] Modules linked in: ath9k ath9k_common ath9k_hw mac80211 ppp_async ppp_generic slhc xt_u32 xt_TCPMSS xt_tcpmss xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter iptable_filter ip6table_mangle ip6_tables cls_u32 sch_prio qmi_wwan cdc_wdm usbnet qcserial usb_wwan gpio_pca953x xt_tcpudp xt_mark iptable_mangle ip_tables x_tables dummy tqmx86_wdt watchdog i2c_machxo2 gpio_tqmx86 libarc4 ath tqmx86 mfd_core ftdi_sio cfg80211 usbserial nct7802 regmap_i2c [last unloaded: mac80211]
> > [ 445.489298] CPU: 0 PID: 8630 Comm: modprobe Tainted: G W 5.16.18+ #2
> > [ 445.489307] Hardware name: TQ-Group TQMxE38M X64/Type2 - Board Product Name, BIOS TQMxE38M.5.4.48.0028.12 01/31/2017
> > [ 445.489311] EIP: ath_tx_complete+0x190/0x1a0 [ath9k]
> > [ 445.489328] Code: ff 75 e8 68 d7 a1 2c f8 89 55 ec e8 d5 4f e3 ff 8b 45 f0 8b 55 ec 83 c4 10 8b 80 58 0a 00 00 e9 a8 fe ff ff 8d b6 00 00 00 00 <0f> 0b c7 42 68 00 00 00 00 e9 1d ff ff ff 66 90 55 89 e5 57 56 89
> > [ 445.489336] EAX: 00000000 EBX: c2e67900 ECX: c23cc040 EDX: c235a980
> > [ 445.489342] ESI: c235a888 EDI: c2359860 EBP: c1089e18 ESP: c1089dfc
> > [ 445.489348] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010297
> > [ 445.489355] CR0: 80050033 CR2: b7f81000 CR3: 02ec7000 CR4: 001006d0
> > [ 445.489362] Call Trace:
> > [ 445.489366] <SOFTIRQ>
> > [ 445.489372] ath_tx_complete_buf+0x100/0x130 [ath9k]
> > [ 445.489390] ath_tx_process_buffer+0x187/0xb40 [ath9k]
> > [ 445.489407] ? resched_curr+0x1d/0xc0
> > [ 445.489419] ? check_preempt_wakeup+0x115/0x250
> > [ 445.489429] ? task_fork_fair+0xc0/0x170
> > [ 445.489436] ? put_prev_task_fair+0x40/0x40
> > [ 445.489444] ? check_preempt_curr+0x62/0x70
> > [ 445.489452] ? __local_bh_enable_ip+0x33/0x80
> > [ 445.489460] ? _raw_spin_unlock_bh+0x13/0x20
> > [ 445.489468] ? ath_txq_unlock_complete+0x5a/0x100 [ath9k]
> > [ 445.489485] ? _raw_spin_unlock_irqrestore+0x16/0x30
> > [ 445.489492] ? try_to_wake_up+0x7c/0x550
> > [ 445.489499] ath_tx_tasklet+0x1de/0x2e0 [ath9k]
> > [ 445.489519] ath9k_tasklet+0x22f/0x330 [ath9k]
> > [ 445.489535] tasklet_action_common.constprop.0+0x89/0xb0
> > [ 445.489544] tasklet_action+0x21/0x30
> > [ 445.489551] __do_softirq+0xc5/0x28b
> > [ 445.489559] ? __entry_text_end+0x4/0x4
> > [ 445.489567] call_on_stack+0x40/0x50
> > [ 445.489576] </SOFTIRQ>
> > [ 445.489580] ? irq_exit_rcu+0x92/0x100
> > [ 445.489587] ? common_interrupt+0x27/0x40
> > [ 445.489595] ? asm_common_interrupt+0x102/0x140
> > [ 445.489605] ---[ end trace 5c176b666255bca1 ]---
> > [ 445.607206] ------------[ cut here ]------------
> >
> >
> > Could be fixed by the following additional change:
> >
> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> > @@ -2511,9 +2511,12 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> > struct ieee80211_hw *hw = sc->hw;
> > struct ath_hw *ah = sc->sc_ah;
> > u8 i, tx_rateindex;
> > + bool is_valid_ack_signal = tx_info->status.is_valid_ack_signal;
> >
> > ieee80211_tx_info_clear_status(tx_info);
> >
> > + tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
> > +
> > if (txok)
>
> That doesn't seem right, zeroing is_valid_ack_signal was the whole point
> of the first patch... Something seems off in that backtrace; could you
> try running it through ./scripts/decode_stacktrace.sh please?

Will do (and re-run the test with an fresh compiled kernel as CONFIG_DEBUG_KERNEL
was not set - but need to wait until next week as I need physical access to
the test system for kernel update)...

Regards,
Peter

>
> -Toke

2022-04-04 21:17:29

by Peter Seiderer

[permalink] [raw]
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

Hello Toke,

On Sat, 2 Apr 2022 18:19:10 +0200, Peter Seiderer <[email protected]> wrote:

> Hello Toke,
>
> On Sat, 02 Apr 2022 17:11:54 +0200, Toke Høiland-Jørgensen <[email protected]> wrote:
>
> > Peter Seiderer <[email protected]> writes:
> >
> > > Hello Toke,
> > >
> > > On Fri, 1 Apr 2022 19:26:57 +0200, Peter Seiderer <[email protected]> wrote:
> > >
> > >> Hello Toke,
> > >>
> > >> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <[email protected]> wrote:
> > >>
> > >> > The ath9k driver was not properly clearing the status area in the
> > >> > ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
> > >> > it was manually filling in fields, which meant that fields introduced later
> > >> > were left as-is.
> > >> >
> > >> > Conveniently, mac80211 actually provides a helper to zero out the status
> > >> > area, so use that to make sure we zero everything.
> > >> >
> > >> > The last commit touching the driver function writing the status information
> > >> > seems to have actually been fixing an issue that was also caused by the
> > >> > area being uninitialised; but it only added clearing of a single field
> > >> > instead of the whole struct. That is now redundant, though, so revert that
> > >> > commit and use it as a convenient Fixes tag.
> > >> >
> > >> > Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
> > >> > Reported-by: Bagas Sanjaya <[email protected]>
> > >> > Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
> > >> > ---
> > >> > drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
> > >> > 1 file changed, 2 insertions(+), 3 deletions(-)
> > >> >
> > >> > diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
> > >> > index d0caf1de2bde..cbcf96ac303e 100644
> > >> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
> > >> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> > >> > @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> > >> > struct ath_hw *ah = sc->sc_ah;
> > >> > u8 i, tx_rateindex;
> > >> >
> > >> > + ieee80211_tx_info_clear_status(tx_info);
> > >> > +
> > >>
> > >> As this also clears the status.rates[].count, see include/net/mac80211.h:
> > >>
> > >> 1195 static inline void
> > >> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
> > >> 1197 {
> > >> 1198 int i;
> > >> 1199
> > >> 1200 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> > >> 1201 offsetof(struct ieee80211_tx_info, control.rates));
> > >> 1202 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> > >> 1203 offsetof(struct ieee80211_tx_info, driver_rates));
> > >> 1204 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8) ;
> > >> 1205 /* clear the rate counts */
> > >> 1206 for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> > >> 1207 info->status.rates[i].count = 0;
> > >> 1208 memset_after(&info->status, 0, rates);
> > >> 1209 }
> > >>
> > >> I would have expected some lines added to restore the count (for the
> > >> rates with index < tx_rateindex), e.g. as done in
> > >> drivers/net/wireless/ath/ath5k/base.c:
> > >>
> > >> 1731 ieee80211_tx_info_clear_status(info);
> > >> 1732
> > >> 1733 for (i = 0; i < ts->ts_final_idx; i++) {
> > >> 1734 struct ieee80211_tx_rate *r =
> > >> 1735 &info->status.rates[i];
> > >> 1736
> > >> 1737 r->count = tries[i];
> > >> 1738 }
> > >>
> > >> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
> > >> the tx_rateindex index (which is often zero in case the first suggested rate
> > >> succeeds, but in noisy environment is sometimes > 0)...
> > >>
> > >> > if (txok)
> > >> > tx_info->status.ack_signal = ts->ts_rssi;
> > >> >
> > >> > @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> > >> > }
> > >> >
> > >> > tx_info->status.rates[)].count = ts->ts_longretry + 1;
> > >> > -
> > >> > - /* we report airtime in ath_tx_count_airtime(), don't report twice */
> > >> > - tx_info->status.tx_time = 0;
> > >> > }
> > >> >
> > >> > static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)
> > >>
> > >>
> > >> And from drivers/net/wireless/ath/ath9k/xmit.c:
> > >>
> > >> 2592 for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
> > >> 2593 tx_info->status.rates[i].count = 0;
> > >> 2594 tx_info->status.rates[i].idx = -1;
> > >> 2595 }
> > >>
> > >> Line 2593 can be deleted as status.rates[].count is already zeroed through the
> > >> ieee80211_tx_info_clear_status() call...
> > >>
> > >> And it should be sufficient to do:
> > >>
> > >> if (tx_rateindex + 1 < hw->max_rates)
> > >> tx_info->status.rates[tx_rateindex + 1].idx = -1;
> > >>
> > >> Regards,
> > >> Peter
> > >
> > > And one additional problem found with your patch applied (to 5.16.18), the log
> > > get spammed by:
> > >
> > > [ 445.489139] ------------[ cut here ]------------
> > > [ 445.489150] WARNING: CPU: 0 PID: 8630 at drivers/net/wireless/ath/ath9k/xmit.c:174 ath_tx_complete+0x190/0x1a0 [ath9k]
> > > [ 445.489178] Modules linked in: ath9k ath9k_common ath9k_hw mac80211 ppp_async ppp_generic slhc xt_u32 xt_TCPMSS xt_tcpmss xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter iptable_filter ip6table_mangle ip6_tables cls_u32 sch_prio qmi_wwan cdc_wdm usbnet qcserial usb_wwan gpio_pca953x xt_tcpudp xt_mark iptable_mangle ip_tables x_tables dummy tqmx86_wdt watchdog i2c_machxo2 gpio_tqmx86 libarc4 ath tqmx86 mfd_core ftdi_sio cfg80211 usbserial nct7802 regmap_i2c [last unloaded: mac80211]
> > > [ 445.489298] CPU: 0 PID: 8630 Comm: modprobe Tainted: G W 5.16.18+ #2
> > > [ 445.489307] Hardware name: TQ-Group TQMxE38M X64/Type2 - Board Product Name, BIOS TQMxE38M.5.4.48.0028.12 01/31/2017
> > > [ 445.489311] EIP: ath_tx_complete+0x190/0x1a0 [ath9k]
> > > [ 445.489328] Code: ff 75 e8 68 d7 a1 2c f8 89 55 ec e8 d5 4f e3 ff 8b 45 f0 8b 55 ec 83 c4 10 8b 80 58 0a 00 00 e9 a8 fe ff ff 8d b6 00 00 00 00 <0f> 0b c7 42 68 00 00 00 00 e9 1d ff ff ff 66 90 55 89 e5 57 56 89
> > > [ 445.489336] EAX: 00000000 EBX: c2e67900 ECX: c23cc040 EDX: c235a980
> > > [ 445.489342] ESI: c235a888 EDI: c2359860 EBP: c1089e18 ESP: c1089dfc
> > > [ 445.489348] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010297
> > > [ 445.489355] CR0: 80050033 CR2: b7f81000 CR3: 02ec7000 CR4: 001006d0
> > > [ 445.489362] Call Trace:
> > > [ 445.489366] <SOFTIRQ>
> > > [ 445.489372] ath_tx_complete_buf+0x100/0x130 [ath9k]
> > > [ 445.489390] ath_tx_process_buffer+0x187/0xb40 [ath9k]
> > > [ 445.489407] ? resched_curr+0x1d/0xc0
> > > [ 445.489419] ? check_preempt_wakeup+0x115/0x250
> > > [ 445.489429] ? task_fork_fair+0xc0/0x170
> > > [ 445.489436] ? put_prev_task_fair+0x40/0x40
> > > [ 445.489444] ? check_preempt_curr+0x62/0x70
> > > [ 445.489452] ? __local_bh_enable_ip+0x33/0x80
> > > [ 445.489460] ? _raw_spin_unlock_bh+0x13/0x20
> > > [ 445.489468] ? ath_txq_unlock_complete+0x5a/0x100 [ath9k]
> > > [ 445.489485] ? _raw_spin_unlock_irqrestore+0x16/0x30
> > > [ 445.489492] ? try_to_wake_up+0x7c/0x550
> > > [ 445.489499] ath_tx_tasklet+0x1de/0x2e0 [ath9k]
> > > [ 445.489519] ath9k_tasklet+0x22f/0x330 [ath9k]
> > > [ 445.489535] tasklet_action_common.constprop.0+0x89/0xb0
> > > [ 445.489544] tasklet_action+0x21/0x30
> > > [ 445.489551] __do_softirq+0xc5/0x28b
> > > [ 445.489559] ? __entry_text_end+0x4/0x4
> > > [ 445.489567] call_on_stack+0x40/0x50
> > > [ 445.489576] </SOFTIRQ>
> > > [ 445.489580] ? irq_exit_rcu+0x92/0x100
> > > [ 445.489587] ? common_interrupt+0x27/0x40
> > > [ 445.489595] ? asm_common_interrupt+0x102/0x140
> > > [ 445.489605] ---[ end trace 5c176b666255bca1 ]---
> > > [ 445.607206] ------------[ cut here ]------------
> > >
> > >
> > > Could be fixed by the following additional change:
> > >
> > > --- a/drivers/net/wireless/ath/ath9k/xmit.c
> > > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> > > @@ -2511,9 +2511,12 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> > > struct ieee80211_hw *hw = sc->hw;
> > > struct ath_hw *ah = sc->sc_ah;
> > > u8 i, tx_rateindex;
> > > + bool is_valid_ack_signal = tx_info->status.is_valid_ack_signal;
> > >
> > > ieee80211_tx_info_clear_status(tx_info);
> > >
> > > + tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
> > > +
> > > if (txok)
> >
> > That doesn't seem right, zeroing is_valid_ack_signal was the whole point
> > of the first patch... Something seems off in that backtrace; could you
> > try running it through ./scripts/decode_stacktrace.sh please?
>
> Will do (and re-run the test with an fresh compiled kernel as CONFIG_DEBUG_KERNEL
> was not set - but need to wait until next week as I need physical access to
> the test system for kernel update)...

Same with an fresh build (and with stacktrace run through decode_stacktrace.sh), but
did some more investigations, added an debug print of tx_info->status.is_valid_ack_signal
in ath_tx_rc_status() and an corresponding one in ath_txq_skb_done() printing out
fi->txq, with the additional

tx_info->status.is_valid_ack_signal = is_valid_ack_signal;

line added showed the following output

[ 839.543175] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
[ 839.543195] XXX - ath_txq_skb_done() txq: 2
[ 839.617898] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
[ 839.617961] XXX - ath_txq_skb_done() txq: 2
[ 840.024845] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
[ 840.024863] XXX - ath_txq_skb_done() txq: 2
[...]

and without the following:

[ 924.614745] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
[ 924.614762] XXX - ath_txq_skb_done() txq: 0
[ 924.614805] ------------[ cut here ]------------
[ 924.614809] WARNING: CPU: 0 PID: 0 at drivers/net/wireless/ath/ath9k/xmit.c:176 ath_txq_skb_done.constprop.0+0x28/0x38 [ath9k]
[...]


Seems ieee80211_tx_info_clear_status() clears/destroys the dual use of the
struct ieee80211_tx_info.status vs. ieee80211_tx_info.rate_driver_data...

See drivers/net/wireless/ath/ath9k/xmit.c

140 static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
141 {
142 struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
143 BUILD_BUG_ON(sizeof(struct ath_frame_info) >
144 sizeof(tx_info->rate_driver_data));
145 return (struct ath_frame_info *) &tx_info->rate_driver_data[0];
146 }


205 static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
206 struct sk_buff *skb)
207 {
208 struct ath_frame_info *fi = get_frame_info(skb);
209 int q = fi->txq;
210
211 if (q < 0)
212 return;
213
214 txq = sc->tx.txq_map[q];
215 if (WARN_ON(--txq->pending_frames < 0))
216 txq->pending_frames = 0;
217
218 }

Seems it only worked by chance (?) before...

Regards,
Peter


>
> Regards,
> Peter
>
> >
> > -Toke
>

2022-04-05 00:20:36

by Peter Seiderer

[permalink] [raw]
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

Hello Toke,

On Fri, 1 Apr 2022 19:26:57 +0200, Peter Seiderer <[email protected]> wrote:

> Hello Toke,
>
> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <[email protected]> wrote:
>
> > The ath9k driver was not properly clearing the status area in the
> > ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
> > it was manually filling in fields, which meant that fields introduced later
> > were left as-is.
> >
> > Conveniently, mac80211 actually provides a helper to zero out the status
> > area, so use that to make sure we zero everything.
> >
> > The last commit touching the driver function writing the status information
> > seems to have actually been fixing an issue that was also caused by the
> > area being uninitialised; but it only added clearing of a single field
> > instead of the whole struct. That is now redundant, though, so revert that
> > commit and use it as a convenient Fixes tag.
> >
> > Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
> > Reported-by: Bagas Sanjaya <[email protected]>
> > Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
> > ---
> > drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
> > 1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
> > index d0caf1de2bde..cbcf96ac303e 100644
> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> > @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> > struct ath_hw *ah = sc->sc_ah;
> > u8 i, tx_rateindex;
> >
> > + ieee80211_tx_info_clear_status(tx_info);
> > +
>
> As this also clears the status.rates[].count, see include/net/mac80211.h:
>
> 1195 static inline void
> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
> 1197 {
> 1198 int i;
> 1199
> 1200 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> 1201 offsetof(struct ieee80211_tx_info, control.rates));
> 1202 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> 1203 offsetof(struct ieee80211_tx_info, driver_rates));
> 1204 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8) ;
> 1205 /* clear the rate counts */
> 1206 for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> 1207 info->status.rates[i].count = 0;
> 1208 memset_after(&info->status, 0, rates);
> 1209 }
>
> I would have expected some lines added to restore the count (for the
> rates with index < tx_rateindex), e.g. as done in
> drivers/net/wireless/ath/ath5k/base.c:
>
> 1731 ieee80211_tx_info_clear_status(info);
> 1732
> 1733 for (i = 0; i < ts->ts_final_idx; i++) {
> 1734 struct ieee80211_tx_rate *r =
> 1735 &info->status.rates[i];
> 1736
> 1737 r->count = tries[i];
> 1738 }
>
> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
> the tx_rateindex index (which is often zero in case the first suggested rate
> succeeds, but in noisy environment is sometimes > 0)...
>
> > if (txok)
> > tx_info->status.ack_signal = ts->ts_rssi;
> >
> > @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> > }
> >
> > tx_info->status.rates[)].count = ts->ts_longretry + 1;
> > -
> > - /* we report airtime in ath_tx_count_airtime(), don't report twice */
> > - tx_info->status.tx_time = 0;
> > }
> >
> > static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)
>
>
> And from drivers/net/wireless/ath/ath9k/xmit.c:
>
> 2592 for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
> 2593 tx_info->status.rates[i].count = 0;
> 2594 tx_info->status.rates[i].idx = -1;
> 2595 }
>
> Line 2593 can be deleted as status.rates[].count is already zeroed through the
> ieee80211_tx_info_clear_status() call...
>
> And it should be sufficient to do:
>
> if (tx_rateindex + 1 < hw->max_rates)
> tx_info->status.rates[tx_rateindex + 1].idx = -1;
>
> Regards,
> Peter

And one additional problem found with your patch applied (to 5.16.18), the log
get spammed by:

[ 445.489139] ------------[ cut here ]------------
[ 445.489150] WARNING: CPU: 0 PID: 8630 at drivers/net/wireless/ath/ath9k/xmit.c:174 ath_tx_complete+0x190/0x1a0 [ath9k]
[ 445.489178] Modules linked in: ath9k ath9k_common ath9k_hw mac80211 ppp_async ppp_generic slhc xt_u32 xt_TCPMSS xt_tcpmss xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter iptable_filter ip6table_mangle ip6_tables cls_u32 sch_prio qmi_wwan cdc_wdm usbnet qcserial usb_wwan gpio_pca953x xt_tcpudp xt_mark iptable_mangle ip_tables x_tables dummy tqmx86_wdt watchdog i2c_machxo2 gpio_tqmx86 libarc4 ath tqmx86 mfd_core ftdi_sio cfg80211 usbserial nct7802 regmap_i2c [last unloaded: mac80211]
[ 445.489298] CPU: 0 PID: 8630 Comm: modprobe Tainted: G W 5.16.18+ #2
[ 445.489307] Hardware name: TQ-Group TQMxE38M X64/Type2 - Board Product Name, BIOS TQMxE38M.5.4.48.0028.12 01/31/2017
[ 445.489311] EIP: ath_tx_complete+0x190/0x1a0 [ath9k]
[ 445.489328] Code: ff 75 e8 68 d7 a1 2c f8 89 55 ec e8 d5 4f e3 ff 8b 45 f0 8b 55 ec 83 c4 10 8b 80 58 0a 00 00 e9 a8 fe ff ff 8d b6 00 00 00 00 <0f> 0b c7 42 68 00 00 00 00 e9 1d ff ff ff 66 90 55 89 e5 57 56 89
[ 445.489336] EAX: 00000000 EBX: c2e67900 ECX: c23cc040 EDX: c235a980
[ 445.489342] ESI: c235a888 EDI: c2359860 EBP: c1089e18 ESP: c1089dfc
[ 445.489348] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010297
[ 445.489355] CR0: 80050033 CR2: b7f81000 CR3: 02ec7000 CR4: 001006d0
[ 445.489362] Call Trace:
[ 445.489366] <SOFTIRQ>
[ 445.489372] ath_tx_complete_buf+0x100/0x130 [ath9k]
[ 445.489390] ath_tx_process_buffer+0x187/0xb40 [ath9k]
[ 445.489407] ? resched_curr+0x1d/0xc0
[ 445.489419] ? check_preempt_wakeup+0x115/0x250
[ 445.489429] ? task_fork_fair+0xc0/0x170
[ 445.489436] ? put_prev_task_fair+0x40/0x40
[ 445.489444] ? check_preempt_curr+0x62/0x70
[ 445.489452] ? __local_bh_enable_ip+0x33/0x80
[ 445.489460] ? _raw_spin_unlock_bh+0x13/0x20
[ 445.489468] ? ath_txq_unlock_complete+0x5a/0x100 [ath9k]
[ 445.489485] ? _raw_spin_unlock_irqrestore+0x16/0x30
[ 445.489492] ? try_to_wake_up+0x7c/0x550
[ 445.489499] ath_tx_tasklet+0x1de/0x2e0 [ath9k]
[ 445.489519] ath9k_tasklet+0x22f/0x330 [ath9k]
[ 445.489535] tasklet_action_common.constprop.0+0x89/0xb0
[ 445.489544] tasklet_action+0x21/0x30
[ 445.489551] __do_softirq+0xc5/0x28b
[ 445.489559] ? __entry_text_end+0x4/0x4
[ 445.489567] call_on_stack+0x40/0x50
[ 445.489576] </SOFTIRQ>
[ 445.489580] ? irq_exit_rcu+0x92/0x100
[ 445.489587] ? common_interrupt+0x27/0x40
[ 445.489595] ? asm_common_interrupt+0x102/0x140
[ 445.489605] ---[ end trace 5c176b666255bca1 ]---
[ 445.607206] ------------[ cut here ]------------


Could be fixed by the following additional change:

--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -2511,9 +2511,12 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
struct ieee80211_hw *hw = sc->hw;
struct ath_hw *ah = sc->sc_ah;
u8 i, tx_rateindex;
+ bool is_valid_ack_signal = tx_info->status.is_valid_ack_signal;

ieee80211_tx_info_clear_status(tx_info);

+ tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
+
if (txok)

Regards,
Peter

2022-04-05 01:00:17

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211

Peter Seiderer <[email protected]> writes:

> Hello Toke,
>
> On Sat, 2 Apr 2022 18:19:10 +0200, Peter Seiderer <[email protected]> wrote:
>
>> Hello Toke,
>>
>> On Sat, 02 Apr 2022 17:11:54 +0200, Toke Høiland-Jørgensen <[email protected]> wrote:
>>
>> > Peter Seiderer <[email protected]> writes:
>> >
>> > > Hello Toke,
>> > >
>> > > On Fri, 1 Apr 2022 19:26:57 +0200, Peter Seiderer <[email protected]> wrote:
>> > >
>> > >> Hello Toke,
>> > >>
>> > >> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <[email protected]> wrote:
>> > >>
>> > >> > The ath9k driver was not properly clearing the status area in the
>> > >> > ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>> > >> > it was manually filling in fields, which meant that fields introduced later
>> > >> > were left as-is.
>> > >> >
>> > >> > Conveniently, mac80211 actually provides a helper to zero out the status
>> > >> > area, so use that to make sure we zero everything.
>> > >> >
>> > >> > The last commit touching the driver function writing the status information
>> > >> > seems to have actually been fixing an issue that was also caused by the
>> > >> > area being uninitialised; but it only added clearing of a single field
>> > >> > instead of the whole struct. That is now redundant, though, so revert that
>> > >> > commit and use it as a convenient Fixes tag.
>> > >> >
>> > >> > Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>> > >> > Reported-by: Bagas Sanjaya <[email protected]>
>> > >> > Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
>> > >> > ---
>> > >> > drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
>> > >> > 1 file changed, 2 insertions(+), 3 deletions(-)
>> > >> >
>> > >> > diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
>> > >> > index d0caf1de2bde..cbcf96ac303e 100644
>> > >> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> > >> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> > >> > @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> > >> > struct ath_hw *ah = sc->sc_ah;
>> > >> > u8 i, tx_rateindex;
>> > >> >
>> > >> > + ieee80211_tx_info_clear_status(tx_info);
>> > >> > +
>> > >>
>> > >> As this also clears the status.rates[].count, see include/net/mac80211.h:
>> > >>
>> > >> 1195 static inline void
>> > >> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
>> > >> 1197 {
>> > >> 1198 int i;
>> > >> 1199
>> > >> 1200 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
>> > >> 1201 offsetof(struct ieee80211_tx_info, control.rates));
>> > >> 1202 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
>> > >> 1203 offsetof(struct ieee80211_tx_info, driver_rates));
>> > >> 1204 BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8) ;
>> > >> 1205 /* clear the rate counts */
>> > >> 1206 for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
>> > >> 1207 info->status.rates[i].count = 0;
>> > >> 1208 memset_after(&info->status, 0, rates);
>> > >> 1209 }
>> > >>
>> > >> I would have expected some lines added to restore the count (for the
>> > >> rates with index < tx_rateindex), e.g. as done in
>> > >> drivers/net/wireless/ath/ath5k/base.c:
>> > >>
>> > >> 1731 ieee80211_tx_info_clear_status(info);
>> > >> 1732
>> > >> 1733 for (i = 0; i < ts->ts_final_idx; i++) {
>> > >> 1734 struct ieee80211_tx_rate *r =
>> > >> 1735 &info->status.rates[i];
>> > >> 1736
>> > >> 1737 r->count = tries[i];
>> > >> 1738 }
>> > >>
>> > >> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
>> > >> the tx_rateindex index (which is often zero in case the first suggested rate
>> > >> succeeds, but in noisy environment is sometimes > 0)...
>> > >>
>> > >> > if (txok)
>> > >> > tx_info->status.ack_signal = ts->ts_rssi;
>> > >> >
>> > >> > @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> > >> > }
>> > >> >
>> > >> > tx_info->status.rates[)].count = ts->ts_longretry + 1;
>> > >> > -
>> > >> > - /* we report airtime in ath_tx_count_airtime(), don't report twice */
>> > >> > - tx_info->status.tx_time = 0;
>> > >> > }
>> > >> >
>> > >> > static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)
>> > >>
>> > >>
>> > >> And from drivers/net/wireless/ath/ath9k/xmit.c:
>> > >>
>> > >> 2592 for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
>> > >> 2593 tx_info->status.rates[i].count = 0;
>> > >> 2594 tx_info->status.rates[i].idx = -1;
>> > >> 2595 }
>> > >>
>> > >> Line 2593 can be deleted as status.rates[].count is already zeroed through the
>> > >> ieee80211_tx_info_clear_status() call...
>> > >>
>> > >> And it should be sufficient to do:
>> > >>
>> > >> if (tx_rateindex + 1 < hw->max_rates)
>> > >> tx_info->status.rates[tx_rateindex + 1].idx = -1;
>> > >>
>> > >> Regards,
>> > >> Peter
>> > >
>> > > And one additional problem found with your patch applied (to 5.16.18), the log
>> > > get spammed by:
>> > >
>> > > [ 445.489139] ------------[ cut here ]------------
>> > > [ 445.489150] WARNING: CPU: 0 PID: 8630 at drivers/net/wireless/ath/ath9k/xmit.c:174 ath_tx_complete+0x190/0x1a0 [ath9k]
>> > > [ 445.489178] Modules linked in: ath9k ath9k_common ath9k_hw mac80211 ppp_async ppp_generic slhc xt_u32 xt_TCPMSS xt_tcpmss xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter iptable_filter ip6table_mangle ip6_tables cls_u32 sch_prio qmi_wwan cdc_wdm usbnet qcserial usb_wwan gpio_pca953x xt_tcpudp xt_mark iptable_mangle ip_tables x_tables dummy tqmx86_wdt watchdog i2c_machxo2 gpio_tqmx86 libarc4 ath tqmx86 mfd_core ftdi_sio cfg80211 usbserial nct7802 regmap_i2c [last unloaded: mac80211]
>> > > [ 445.489298] CPU: 0 PID: 8630 Comm: modprobe Tainted: G W 5.16.18+ #2
>> > > [ 445.489307] Hardware name: TQ-Group TQMxE38M X64/Type2 - Board Product Name, BIOS TQMxE38M.5.4.48.0028.12 01/31/2017
>> > > [ 445.489311] EIP: ath_tx_complete+0x190/0x1a0 [ath9k]
>> > > [ 445.489328] Code: ff 75 e8 68 d7 a1 2c f8 89 55 ec e8 d5 4f e3 ff 8b 45 f0 8b 55 ec 83 c4 10 8b 80 58 0a 00 00 e9 a8 fe ff ff 8d b6 00 00 00 00 <0f> 0b c7 42 68 00 00 00 00 e9 1d ff ff ff 66 90 55 89 e5 57 56 89
>> > > [ 445.489336] EAX: 00000000 EBX: c2e67900 ECX: c23cc040 EDX: c235a980
>> > > [ 445.489342] ESI: c235a888 EDI: c2359860 EBP: c1089e18 ESP: c1089dfc
>> > > [ 445.489348] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010297
>> > > [ 445.489355] CR0: 80050033 CR2: b7f81000 CR3: 02ec7000 CR4: 001006d0
>> > > [ 445.489362] Call Trace:
>> > > [ 445.489366] <SOFTIRQ>
>> > > [ 445.489372] ath_tx_complete_buf+0x100/0x130 [ath9k]
>> > > [ 445.489390] ath_tx_process_buffer+0x187/0xb40 [ath9k]
>> > > [ 445.489407] ? resched_curr+0x1d/0xc0
>> > > [ 445.489419] ? check_preempt_wakeup+0x115/0x250
>> > > [ 445.489429] ? task_fork_fair+0xc0/0x170
>> > > [ 445.489436] ? put_prev_task_fair+0x40/0x40
>> > > [ 445.489444] ? check_preempt_curr+0x62/0x70
>> > > [ 445.489452] ? __local_bh_enable_ip+0x33/0x80
>> > > [ 445.489460] ? _raw_spin_unlock_bh+0x13/0x20
>> > > [ 445.489468] ? ath_txq_unlock_complete+0x5a/0x100 [ath9k]
>> > > [ 445.489485] ? _raw_spin_unlock_irqrestore+0x16/0x30
>> > > [ 445.489492] ? try_to_wake_up+0x7c/0x550
>> > > [ 445.489499] ath_tx_tasklet+0x1de/0x2e0 [ath9k]
>> > > [ 445.489519] ath9k_tasklet+0x22f/0x330 [ath9k]
>> > > [ 445.489535] tasklet_action_common.constprop.0+0x89/0xb0
>> > > [ 445.489544] tasklet_action+0x21/0x30
>> > > [ 445.489551] __do_softirq+0xc5/0x28b
>> > > [ 445.489559] ? __entry_text_end+0x4/0x4
>> > > [ 445.489567] call_on_stack+0x40/0x50
>> > > [ 445.489576] </SOFTIRQ>
>> > > [ 445.489580] ? irq_exit_rcu+0x92/0x100
>> > > [ 445.489587] ? common_interrupt+0x27/0x40
>> > > [ 445.489595] ? asm_common_interrupt+0x102/0x140
>> > > [ 445.489605] ---[ end trace 5c176b666255bca1 ]---
>> > > [ 445.607206] ------------[ cut here ]------------
>> > >
>> > >
>> > > Could be fixed by the following additional change:
>> > >
>> > > --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> > > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> > > @@ -2511,9 +2511,12 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> > > struct ieee80211_hw *hw = sc->hw;
>> > > struct ath_hw *ah = sc->sc_ah;
>> > > u8 i, tx_rateindex;
>> > > + bool is_valid_ack_signal = tx_info->status.is_valid_ack_signal;
>> > >
>> > > ieee80211_tx_info_clear_status(tx_info);
>> > >
>> > > + tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
>> > > +
>> > > if (txok)
>> >
>> > That doesn't seem right, zeroing is_valid_ack_signal was the whole point
>> > of the first patch... Something seems off in that backtrace; could you
>> > try running it through ./scripts/decode_stacktrace.sh please?
>>
>> Will do (and re-run the test with an fresh compiled kernel as CONFIG_DEBUG_KERNEL
>> was not set - but need to wait until next week as I need physical access to
>> the test system for kernel update)...
>
> Same with an fresh build (and with stacktrace run through decode_stacktrace.sh), but
> did some more investigations, added an debug print of tx_info->status.is_valid_ack_signal
> in ath_tx_rc_status() and an corresponding one in ath_txq_skb_done() printing out
> fi->txq, with the additional
>
> tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
>
> line added showed the following output
>
> [ 839.543175] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
> [ 839.543195] XXX - ath_txq_skb_done() txq: 2
> [ 839.617898] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
> [ 839.617961] XXX - ath_txq_skb_done() txq: 2
> [ 840.024845] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
> [ 840.024863] XXX - ath_txq_skb_done() txq: 2
> [...]
>
> and without the following:
>
> [ 924.614745] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
> [ 924.614762] XXX - ath_txq_skb_done() txq: 0
> [ 924.614805] ------------[ cut here ]------------
> [ 924.614809] WARNING: CPU: 0 PID: 0 at drivers/net/wireless/ath/ath9k/xmit.c:176 ath_txq_skb_done.constprop.0+0x28/0x38 [ath9k]
> [...]
>
>
> Seems ieee80211_tx_info_clear_status() clears/destroys the dual use of the
> struct ieee80211_tx_info.status vs. ieee80211_tx_info.rate_driver_data...
>
> See drivers/net/wireless/ath/ath9k/xmit.c
>
> 140 static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
> 141 {
> 142 struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
> 143 BUILD_BUG_ON(sizeof(struct ath_frame_info) >
> 144 sizeof(tx_info->rate_driver_data));
> 145 return (struct ath_frame_info *) &tx_info->rate_driver_data[0];
> 146 }
>
>
> 205 static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
> 206 struct sk_buff *skb)
> 207 {
> 208 struct ath_frame_info *fi = get_frame_info(skb);
> 209 int q = fi->txq;
> 210
> 211 if (q < 0)
> 212 return;
> 213
> 214 txq = sc->tx.txq_map[q];
> 215 if (WARN_ON(--txq->pending_frames < 0))
> 216 txq->pending_frames = 0;
> 217
> 218 }
>
> Seems it only worked by chance (?) before...

Yikes, yeah, that's no good! Also, this comment above
ieee80211_tx_info_clear_status() is completely misleading:

* NOTE: You can only use this function if you do NOT use
* info->driver_data! Use info->rate_driver_data
* instead if you need only the less space that allows.

(I read that, and figured that the current use was OK, but nope, guess
not!)

The ath_frame_info struct is small enough to fit into the
status_driver_data field, though, so guess we could just move it there.
However, ieee80211_tx_info_clear_status() also clears that AFAICT, so
we'll need to save & restore it? That seems a bit odd, is this
intentional Johannes?

-Toke