Subject: [PATCH] iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (roaming)

Issues like this, in scenarios with continuous roaming, have been reported:

...
[149909.128985] wlp3s0: disconnect from AP 34:13:e8:b1:db:9a for new auth to 34:13:e8:3c:fb:db
[149909.307717] wlp3s0: authenticate with 34:13:e8:3c:fb:db
[149909.313266] wlp3s0: 80 MHz not supported, disabling VHT
[149909.430578] wlp3s0: send auth to 34:13:e8:3c:fb:db (try 1/3)
[149909.482194] wlp3s0: authenticated
[149909.487314] wlp3s0: associate with 34:13:e8:3c:fb:db (try 1/3)
[149909.507995] wlp3s0: RX ReassocResp from 34:13:e8:3c:fb:db (capab=0x411 status=0 aid=1)
[149909.523045] wlp3s0: associated
[149911.155963] wlp3s0: disconnect from AP 34:13:e8:3c:fb:db for new auth to 34:13:e8:b1:db:9a
[149911.338758] wlp3s0: authenticate with 34:13:e8:b1:db:9a
[149911.344214] wlp3s0: 80 MHz not supported, disabling VHT
[149911.461627] wlp3s0: send auth to 34:13:e8:b1:db:9a (try 1/3)
[149911.527693] wlp3s0: authenticated
[149911.531890] wlp3s0: associate with 34:13:e8:b1:db:9a (try 1/3)
[149911.555003] wlp3s0: RX ReassocResp from 34:13:e8:b1:db:9a (capab=0x411 status=0 aid=1)
[149911.570254] wlp3s0: associated
[149913.302972] wlp3s0: disconnect from AP 34:13:e8:b1:db:9a for new auth to 34:13:e8:3c:fb:db
[149913.512736] wlp3s0: authenticate with 34:13:e8:3c:fb:db
[149913.518183] wlp3s0: 80 MHz not supported, disabling VHT
[149913.635546] wlp3s0: send auth to 34:13:e8:3c:fb:db (try 1/3)
[149913.672891] list_add double add: new=ffff8e66943f1300, prev=ffff8e66943f1300, next=ffff8e6692a72300.
[149913.688659] ------------[ cut here ]------------
[149913.699785] kernel BUG at lib/list_debug.c:33!
[149913.704403] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[149913.709540] CPU: 1 PID: 626 Comm: wpa_supplicant Not tainted 6.2.0-rc6+ #6
[149913.716602] Hardware name: Dell Inc. Inspiron 660s/0478VN , BIOS A07 08/24/2012
[149913.724725] RIP: 0010:__list_add_valid.cold+0x23/0x5b
[149913.729953] Code: 01 00 e9 f0 be 90 ff 48 c7 c7 e8 38 7b 90 e8 4b 6d fd ff 0f 0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 e8 39 7b 90 e8 34 6d fd ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 90 39 7b 90 e8 1d 6d fd
[149913.749125] RSP: 0018:ffff9c0a009b7708 EFLAGS: 00010286
[149913.754437] RAX: 0000000000000058 RBX: ffff8e66943f1300 RCX: 0000000000000000
[149913.761853] RDX: 0000000000000201 RSI: ffffffff90799b20 RDI: 00000000ffffffff
[149913.769192] RBP: ffff8e66943f12e8 R08: 77656e203a646461 R09: 20656c62756f6420
[149913.776532] R10: 20656c62756f6420 R11: 6464615f7473696c R12: ffff8e6692a72080
[149913.783873] R13: ffff8e66943f1300 R14: ffff8e6692a72300 R15: ffff8e6692a708e0
[149913.791214] FS: 00007f2e55a027c0(0000) GS:ffff8e669a300000(0000) knlGS:0000000000000000
[149913.799438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[149913.805353] CR2: 00007f101a652ce0 CR3: 0000000108404006 CR4: 00000000000606e0
[149913.812681] Call Trace:
[149913.815332] <TASK>
[149913.817552] iwl_mvm_mac_wake_tx_queue+0xb5/0x114 [iwlmvm]
[149913.823256] ieee80211_queue_skb+0x2ca/0x730 [mac80211]
[149913.828752] ieee80211_tx+0x9a/0x110 [mac80211]
...

This can be reproduced with a single script from the station:
while true; do
wpa_cli -i wlp3s0 roam 34:13:E8:B1:DB:9A
sleep 2
wpa_cli -i wlp3s0 roam 34:13:E8:3C:FB:DB
sleep 2
done
And flooding with tx traffic.

The reason is the race condition produced in iwl_mvm_mac_wake_tx_queue when
add_stream_txqs is added to que list but only if the list is empty, because
iwl_mvm_mac_wake_tx_queue can be called concurrently. In the previous example,
it has been seen that two simultaneous executions of iwl_mvm_mac_wake_tx_queue
have progressed by adding to the list because the queue was empty at that
moment for both and the second one tried to add the same element to the list.

So protection for this operation is needed. After applying the solution, the
problem does not reproduce.

Fixes: cfbc6c4c5b91c ("iwlwifi: mvm: support mac80211 TXQs model")
Reported-by: Petr Stourac <[email protected]>
Signed-off-by: Jose Ignacio Tornos Martinez <[email protected]>
---
drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 9 ++++++++-
drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 1 +
drivers/net/wireless/intel/iwlwifi/mvm/sta.c | 1 +
3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
index 5273ade71117..6d591b48f1bf 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
@@ -787,11 +787,18 @@ static void iwl_mvm_mac_wake_tx_queue(struct ieee80211_hw *hw,
return;
}

+ spin_lock_bh(&mvmtxq->list_lock);
+
/* The list is being deleted only after the queue is fully allocated. */
- if (!list_empty(&mvmtxq->list))
+ if (!list_empty(&mvmtxq->list)) {
+ spin_unlock_bh(&mvmtxq->list_lock);
return;
+ }

list_add_tail(&mvmtxq->list, &mvm->add_stream_txqs);
+
+ spin_unlock_bh(&mvmtxq->list_lock);
+
schedule_work(&mvm->add_stream_wk);
}

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
index ce6b701f3f4c..f51bdb4d41fb 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
@@ -727,6 +727,7 @@ enum iwl_mvm_queue_status {

struct iwl_mvm_txq {
struct list_head list;
+ spinlock_t list_lock;
u16 txq_id;
atomic_t tx_request;
bool stopped;
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
index 69634fb82a9b..252a652be79f 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
@@ -1705,6 +1705,7 @@ int iwl_mvm_add_sta(struct iwl_mvm *mvm,

mvmtxq->txq_id = IWL_MVM_INVALID_QUEUE;
INIT_LIST_HEAD(&mvmtxq->list);
+ spin_lock_init(&mvmtxq->list_lock);
atomic_set(&mvmtxq->tx_request, 0);
}

--
2.38.1



2023-03-14 11:26:51

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (roaming)

Hi,

> Issues like this, in scenarios with continuous roaming, have been reported:

Can you say where? Any public reports?

> This can be reproduced with a single script from the station:
> while true; do
> wpa_cli -i wlp3s0 roam 34:13:E8:B1:DB:9A
> sleep 2
> wpa_cli -i wlp3s0 roam 34:13:E8:3C:FB:DB
> sleep 2
> done
> And flooding with tx traffic.

Oh, nice to have a reproducer.


Funny thing is, I was _just_ looking at this exact bug, because we were
discussing all this concurrency over in

https://lore.kernel.org/r/[email protected]

> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
> @@ -787,11 +787,18 @@ static void iwl_mvm_mac_wake_tx_queue(struct ieee80211_hw *hw,
> return;
> }
>
> + spin_lock_bh(&mvmtxq->list_lock);
> +
> /* The list is being deleted only after the queue is fully allocated. */
> - if (!list_empty(&mvmtxq->list))
> + if (!list_empty(&mvmtxq->list)) {
> + spin_unlock_bh(&mvmtxq->list_lock);
> return;
> + }
>
> list_add_tail(&mvmtxq->list, &mvm->add_stream_txqs);
> +
> + spin_unlock_bh(&mvmtxq->list_lock);


While this might fix the issue as far as you could observe, that is
clearly not sufficient, since you don't protect the list on the other
side, where the items are removed from it again.

Below are the two patches that I've come up with so far, if anyone wants
to try them. Please ignore all the extra metadata, I exported this
directly from our internal code base.

johannes


From 191299b19eeb95a92a78152c1b15bec3d77b7a5b Mon Sep 17 00:00:00 2001
From: Johannes Berg <[email protected]>
Date: Mon, 13 Mar 2023 22:23:19 +0100
Subject: [PATCH 1/2] [BUGFIX] wifi: iwlwifi: mvm: fix mvmtxq->stopped handling

This could race if the queue is redirected while full, then
the flushing internally would start it while it's not yet
usable again. Fix it by using two state bits instead of just
one.

type=bugfix
ticket=none
fixes=I37dfb07e617ba0ba6fad55eaf7bd7b28bd24b82e

Change-Id: I68fc8d1a16a7c6927ac92fe855ada741acb9fe14
Signed-off-by: Johannes Berg <[email protected]>
---
drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 5 ++++-
drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 4 +++-
drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 5 ++++-
drivers/net/wireless/intel/iwlwifi/mvm/sta.c | 4 ++--
4 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
index b068639f516b..cc51469cccdd 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
@@ -961,7 +961,10 @@ void iwl_mvm_mac_itxq_xmit(struct ieee80211_hw *hw, struct ieee80211_txq *txq)

rcu_read_lock();
do {
- while (likely(!mvmtxq->stopped &&
+ while (likely(!test_bit(IWL_MVM_TXQ_STATE_STOP_FULL,
+ &mvmtxq->state) &&
+ !test_bit(IWL_MVM_TXQ_STATE_STOP_REDIRECT,
+ &mvmtxq->state) &&
!test_bit(IWL_MVM_STATUS_IN_D3, &mvm->status))) {
skb = ieee80211_tx_dequeue(hw, txq);

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
index f598f0c7d145..8040ad4c13b0 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
@@ -789,7 +789,9 @@ struct iwl_mvm_txq {
struct list_head list;
u16 txq_id;
atomic_t tx_request;
- bool stopped;
+#define IWL_MVM_TXQ_STATE_STOP_FULL 0
+#define IWL_MVM_TXQ_STATE_STOP_REDIRECT 1
+ unsigned long state;
};

static inline struct iwl_mvm_txq *
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
index f4df101fac58..1a925ea9f822 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
@@ -2000,7 +2000,10 @@ static void iwl_mvm_queue_state_change(struct iwl_op_mode *op_mode,

txq = sta->txq[tid];
mvmtxq = iwl_mvm_txq_from_mac80211(txq);
- mvmtxq->stopped = !start;
+ if (start)
+ clear_bit(IWL_MVM_TXQ_STATE_STOP_FULL, &mvmtxq->state);
+ else
+ set_bit(IWL_MVM_TXQ_STATE_STOP_FULL, &mvmtxq->state);

if (start && mvmsta->sta_state != IEEE80211_STA_NOTEXIST) {
local_bh_disable();
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
index e8f38db02dc2..5b39da622f37 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
@@ -730,7 +730,7 @@ static int iwl_mvm_redirect_queue(struct iwl_mvm *mvm, int queue, int tid,
queue, iwl_mvm_ac_to_tx_fifo[ac]);

/* Stop the queue and wait for it to empty */
- txq->stopped = true;
+ set_bit(IWL_MVM_TXQ_STATE_STOP_REDIRECT, &txq->state);

ret = iwl_trans_wait_tx_queues_empty(mvm->trans, BIT(queue));
if (ret) {
@@ -773,7 +773,7 @@ static int iwl_mvm_redirect_queue(struct iwl_mvm *mvm, int queue, int tid,

out:
/* Continue using the queue */
- txq->stopped = false;
+ clear_bit(IWL_MVM_TXQ_STATE_STOP_REDIRECT, &txq->state);

return ret;
}
--
2.39.2


From 7001088d33bf1ad9c9c5f24a59502b2a0e89a25b Mon Sep 17 00:00:00 2001
From: Johannes Berg <[email protected]>
Date: Mon, 13 Mar 2023 22:53:19 +0100
Subject: [PATCH 2/2] [BUGFIX] wifi: iwlwifi: mvm: protect TXQ list
manipulation

Some recent upstream debugging uncovered the fact that in
iwlwifi, the TXQ list manipulation is racy.

Introduce a new state bit for when the TXQ is completely
ready and can be used without locking, and if that's not
set yet acquire the lock to check everything correctly.

type=bugfix
ticket=none
fixes=I37dfb07e617ba0ba6fad55eaf7bd7b28bd24b82e

Change-Id: Ife6276980e699d6d8dc0307bfd3a87b36c3e02d0
Signed-off-by: Johannes Berg <[email protected]>
---
.../net/wireless/intel/iwlwifi/mvm/mac80211.c | 41 +++++--------------
drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 2 +
drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 1 +
drivers/net/wireless/intel/iwlwifi/mvm/sta.c | 25 +++++++++--
4 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
index cc51469cccdd..fb3c22fb1ae4 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
@@ -989,42 +989,21 @@ void iwl_mvm_mac_wake_tx_queue(struct ieee80211_hw *hw,
struct iwl_mvm *mvm = IWL_MAC80211_GET_MVM(hw);
struct iwl_mvm_txq *mvmtxq = iwl_mvm_txq_from_mac80211(txq);

- /*
- * Please note that racing is handled very carefully here:
- * mvmtxq->txq_id is updated during allocation, and mvmtxq->list is
- * deleted afterwards.
- * This means that if:
- * mvmtxq->txq_id != INVALID_QUEUE && list_empty(&mvmtxq->list):
- * queue is allocated and we can TX.
- * mvmtxq->txq_id != INVALID_QUEUE && !list_empty(&mvmtxq->list):
- * a race, should defer the frame.
- * mvmtxq->txq_id == INVALID_QUEUE && list_empty(&mvmtxq->list):
- * need to allocate the queue and defer the frame.
- * mvmtxq->txq_id == INVALID_QUEUE && !list_empty(&mvmtxq->list):
- * queue is already scheduled for allocation, no need to allocate,
- * should defer the frame.
- */
-
- /* If the queue is allocated TX and return. */
- if (!txq->sta || mvmtxq->txq_id != IWL_MVM_INVALID_QUEUE) {
- /*
- * Check that list is empty to avoid a race where txq_id is
- * already updated, but the queue allocation work wasn't
- * finished
- */
- if (unlikely(txq->sta && !list_empty(&mvmtxq->list)))
- return;
-
+ if (likely(test_bit(IWL_MVM_TXQ_STATE_READY, &mvmtxq->state)) ||
+ !txq->sta) {
iwl_mvm_mac_itxq_xmit(hw, txq);
return;
}

+ spin_lock_bh(&mvm->add_stream_lock);
/* The list is being deleted only after the queue is fully allocated. */
- if (!list_empty(&mvmtxq->list))
- return;
-
- list_add_tail(&mvmtxq->list, &mvm->add_stream_txqs);
- schedule_work(&mvm->add_stream_wk);
+ if (list_empty(&mvmtxq->list) &&
+ /* recheck under lock */
+ !test_bit(IWL_MVM_TXQ_STATE_READY, &mvmtxq->state)) {
+ list_add_tail(&mvmtxq->list, &mvm->add_stream_txqs);
+ schedule_work(&mvm->add_stream_wk);
+ }
+ spin_unlock_bh(&mvm->add_stream_lock);
}

#define CHECK_BA_TRIGGER(_mvm, _trig, _tid_bm, _tid, _fmt...) \
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
index 8040ad4c13b0..4652c90147c3 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
@@ -791,6 +791,7 @@ struct iwl_mvm_txq {
atomic_t tx_request;
#define IWL_MVM_TXQ_STATE_STOP_FULL 0
#define IWL_MVM_TXQ_STATE_STOP_REDIRECT 1
+#define IWL_MVM_TXQ_STATE_READY 2
unsigned long state;
};

@@ -938,6 +939,7 @@ struct iwl_mvm {
struct iwl_mvm_tvqm_txq_info tvqm_info[IWL_MAX_TVQM_QUEUES];
};
struct work_struct add_stream_wk; /* To add streams to queues */
+ spinlock_t add_stream_lock;

const char *nvm_file_name;
struct iwl_nvm_data *nvm_data;
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
index 1a925ea9f822..aa6eb9986641 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
@@ -1396,6 +1396,7 @@ iwl_op_mode_mvm_start(struct iwl_trans *trans, const struct iwl_cfg *cfg,
INIT_DELAYED_WORK(&mvm->scan_timeout_dwork, iwl_mvm_scan_timeout_wk);
INIT_WORK(&mvm->add_stream_wk, iwl_mvm_add_new_dqa_stream_wk);
INIT_LIST_HEAD(&mvm->add_stream_txqs);
+ spin_lock_init(&mvm->add_stream_lock);

init_waitqueue_head(&mvm->rx_sync_waitq);

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
index 5b39da622f37..bf2f3b1bc888 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
@@ -421,8 +421,11 @@ static int iwl_mvm_disable_txq(struct iwl_mvm *mvm, struct ieee80211_sta *sta,
struct iwl_mvm_txq *mvmtxq =
iwl_mvm_txq_from_tid(sta, tid);

- mvmtxq->txq_id = IWL_MVM_INVALID_QUEUE;
+ spin_lock_bh(&mvm->add_stream_lock);
list_del_init(&mvmtxq->list);
+ clear_bit(IWL_MVM_TXQ_STATE_READY, &mvmtxq->state);
+ mvmtxq->txq_id = IWL_MVM_INVALID_QUEUE;
+ spin_unlock_bh(&mvm->add_stream_lock);
}

/* Regardless if this is a reserved TXQ for a STA - mark it as false */
@@ -516,8 +519,11 @@ static int iwl_mvm_remove_sta_queue_marking(struct iwl_mvm *mvm, int queue)
disable_agg_tids |= BIT(tid);
mvmsta->tid_data[tid].txq_id = IWL_MVM_INVALID_QUEUE;

- mvmtxq->txq_id = IWL_MVM_INVALID_QUEUE;
+ spin_lock_bh(&mvm->add_stream_lock);
list_del_init(&mvmtxq->list);
+ clear_bit(IWL_MVM_TXQ_STATE_READY, &mvmtxq->state);
+ mvmtxq->txq_id = IWL_MVM_INVALID_QUEUE;
+ spin_unlock_bh(&mvm->add_stream_lock);
}

mvmsta->tfd_queue_msk &= ~BIT(queue); /* Don't use this queue anymore */
@@ -1523,12 +1529,22 @@ void iwl_mvm_add_new_dqa_stream_wk(struct work_struct *wk)
* a queue in the function itself.
*/
if (iwl_mvm_sta_alloc_queue(mvm, txq->sta, txq->ac, tid)) {
+ spin_lock_bh(&mvm->add_stream_lock);
list_del_init(&mvmtxq->list);
+ spin_unlock_bh(&mvm->add_stream_lock);
continue;
}

- list_del_init(&mvmtxq->list);
+ /* now we're ready, any remaining races/concurrency will be
+ * handled in iwl_mvm_mac_itxq_xmit()
+ */
+ set_bit(IWL_MVM_TXQ_STATE_READY, &mvmtxq->state);
+
local_bh_disable();
+ spin_lock(&mvm->add_stream_lock);
+ list_del_init(&mvmtxq->list);
+ spin_unlock(&mvm->add_stream_lock);
+
iwl_mvm_mac_itxq_xmit(mvm->hw, txq);
local_bh_enable();
}
@@ -1965,8 +1981,11 @@ static void iwl_mvm_disable_sta_queues(struct iwl_mvm *mvm,
struct iwl_mvm_txq *mvmtxq =
iwl_mvm_txq_from_mac80211(sta->txq[i]);

+ spin_lock_bh(&mvm->add_stream_lock);
mvmtxq->txq_id = IWL_MVM_INVALID_QUEUE;
list_del_init(&mvmtxq->list);
+ clear_bit(IWL_MVM_TXQ_STATE_READY, &mvmtxq->state);
+ spin_unlock_bh(&mvm->add_stream_lock);
}
}

--
2.39.2



Subject: [PATCH] iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (roaming)

Hi,

>> Issues like this, in scenarios with continuous roaming, have been reported:
> Can you say where? Any public reports?
We have some Bugzilla (i.e.
https://bugzilla.redhat.com/show_bug.cgi?id=2152168).
You only have to create a user for this tool.

>> This can be reproduced with a single script from the station:
>> while true; do
>> wpa_cli -i wlp3s0 roam 34:13:E8:B1:DB:9A
>> sleep 2
>> wpa_cli -i wlp3s0 roam 34:13:E8:3C:FB:DB
>> sleep 2
>> done
>> And flooding with tx traffic.
>Oh, nice to have a reproducer.
It is not immediate but I can reproduce here like this.

> Funny thing is, I was _just_ looking at this exact bug, because we were
> discussing all this concurrency over in
So more people is struggling this this, good to get the best solution.

> https://lore.kernel.org/r/[email protected]

>> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
>> @@ -787,11 +787,18 @@ static void iwl_mvm_mac_wake_tx_queue(struct ieee80211_hw *hw,
>> return;
>> }
>>
>> + spin_lock_bh(&mvmtxq->list_lock);
>> +
>> /* The list is being deleted only after the queue is fully allocated. */
>> - if (!list_empty(&mvmtxq->list))
>> + if (!list_empty(&mvmtxq->list)) {
>> + spin_unlock_bh(&mvmtxq->list_lock);
>> return;
>> + }
>>
>> list_add_tail(&mvmtxq->list, &mvm->add_stream_txqs);
>> +
>> + spin_unlock_bh(&mvmtxq->list_lock);


> While this might fix the issue as far as you could observe, that is
> clearly not sufficient, since you don't protect the list on the other
> side, where the items are removed from it again.
Ok, I thought about that as well but I was not able to find any problem with
the other side. Anyway, the better the solution is made the better.

> Below are the two patches that I've come up with so far, if anyone wants
> to try them. Please ignore all the extra metadata, I exported this
> directly from our internal code base.
Of course, I can test the soutions here in order to be sure.
Do you prefer I reply with the result here or in the other thread that you have
commented me before?

Thanks

Best regards
Jose Ignacio


2023-03-14 14:57:15

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (roaming)

On Tue, 2023-03-14 at 15:52 +0100, Jose Ignacio Tornos Martinez wrote:
>
> We have some Bugzilla (i.e.
> https://bugzilla.redhat.com/show_bug.cgi?id=2152168).
> You only have to create a user for this tool.

Looks like it's not public, and anyway I was mostly asking for a commit
log record, so that's not useful even if I could get access:

"You are not authorized to access bug #2152168."

> > > This can be reproduced with a single script from the station:
> > > while true; do
> > > wpa_cli -i wlp3s0 roam 34:13:E8:B1:DB:9A
> > > sleep 2
> > > wpa_cli -i wlp3s0 roam 34:13:E8:3C:FB:DB
> > > sleep 2
> > > done
> > > And flooding with tx traffic.
> > Oh, nice to have a reproducer.
> It is not immediate but I can reproduce here like this.

Right.

> > Funny thing is, I was _just_ looking at this exact bug, because we were
> > discussing all this concurrency over in
> So more people is struggling this this, good to get the best solution.

Well that was a different issue in different drivers, but then we looked
at it ...

> > While this might fix the issue as far as you could observe, that is
> > clearly not sufficient, since you don't protect the list on the other
> > side, where the items are removed from it again.

> Ok, I thought about that as well but I was not able to find any problem with
> the other side. Anyway, the better the solution is made the better.

Well it doesn't _look_ like it accesses the list, but the
list_del_init(&mvmtxq->list) on the other side will access the list too.

> > Below are the two patches that I've come up with so far, if anyone wants
> > to try them. Please ignore all the extra metadata, I exported this
> > directly from our internal code base.
> Of course, I can test the soutions here in order to be sure.
> Do you prefer I reply with the result here or in the other thread that you have
> commented me before?

Here is fine, thanks!

johannes

Subject: Re: [PATCH] iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (roaming)

My previous solution was working with my test during one week with no problem
and now your more complete solution has been working during 2 days with no
problem as well. I was able to reproduce in less time with my test, so your
solution is successfully verified and I can say:

Tested-by: Jose Ignacio Tornos Martinez <[email protected]>

Thanks