V2:
- Functionally decomposes the DXE reset in an additional patch.
Since we call this logic more than once, it should be in a function.
- Leaves as-is the DXE reset write.
Johannes Berg asked me if we are sure by the time the write to the reset
register completes that DXE transactions will be suitably quiesced.
The answer is:
1. I believe these writes are non-posted writes
2. Downstream doesn't poll for DXE reset completion
So on #2 I have no real data for or against a polling operation, my tests
indicate the reset indication in the register is atomic and as far as I
can discern that also means DMA transactions are terminated.
V1:
Digging around through some bugs reported from an extensive testing cycle
we've found that wcn36xx has a number of unexplained RX related oopses.
In at least one case we appear to have DMA'd data to an unmapped region.
The written data appears to be a correctly formed DMA buffer descriptor - a
DXE BD in WCNSS parlance, with an AP beacon inside of it.
Reasoning about how such a situation might come about and reviewing the
run-time code, there was no obvious path where we might free a BD or an
skbuff pointed to by a BD, which DXE might not be aware of.
However looking at the ieee80211_ops.start and ieee80211_ops.stop callbacks
in wcn36xx we can see a number of bugs associated with BD allocation, error
handling and leaving the DMA engine active, despite freeing SKBs on the MSM
side.
This last mention - failure to quiesce potential DMA from the downstream
agent - WCNSS DXE despite freeing the memory @ the skbuffs is a decent
candidate for our unexplained upstream DMA transaction to unmapped memory.
Since wcn36xx_stop and wcn36xx_start can be called a number of times by the
controlling upper layers it means there is a potential gap between
wcn36xx_stop and wcn36xx_start which could leave WCNSS in a state where it
will try to DMA to memory we have freed.
This series addresses the obvious bugs that jump out on the start()/stop()
path.
Patch #1
In order to make it easier to read the DXE code, I've moved all of the
lock taking and freeing for DXE into dxe.c
Patch #2
Fixes a very obviously broken channel enable/disable cycle
Patch #3
Fixes a very obvious memory leak with dma_alloc_coherent()
Patch #4
Makes sure before we release skbuffs which we assigned to the RX channels
that we ensure the DXE block is put into reset
Bryan O'Donoghue (5):
wcn36xx: Fix dxe lock layering violation
wcn36xx: Fix DMA channel enable/disable cycle
wcn36xx: Release DMA channel descriptor allocations
wcn36xx: Functionally decompose DXE reset
wcn36xx: Put DXE block into reset before freeing memory
drivers/net/wireless/ath/wcn36xx/dxe.c | 83 +++++++++++++++++++++----
drivers/net/wireless/ath/wcn36xx/dxe.h | 2 +
drivers/net/wireless/ath/wcn36xx/txrx.c | 15 +----
3 files changed, 74 insertions(+), 26 deletions(-)
--
2.33.0
When unloading the driver we are not releasing the DMA descriptors which we
previously allocated.
Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Bryan O'Donoghue <[email protected]>
---
drivers/net/wireless/ath/wcn36xx/dxe.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
index b832afedde76a..30f4484b336c3 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -1084,4 +1084,9 @@ void wcn36xx_dxe_deinit(struct wcn36xx *wcn)
wcn36xx_dxe_ch_free_skbs(wcn, &wcn->dxe_rx_l_ch);
wcn36xx_dxe_ch_free_skbs(wcn, &wcn->dxe_rx_h_ch);
+
+ wcn36xx_dxe_deinit_descs(wcn->dev, &wcn->dxe_tx_l_ch);
+ wcn36xx_dxe_deinit_descs(wcn->dev, &wcn->dxe_tx_h_ch);
+ wcn36xx_dxe_deinit_descs(wcn->dev, &wcn->dxe_rx_l_ch);
+ wcn36xx_dxe_deinit_descs(wcn->dev, &wcn->dxe_rx_h_ch);
}
--
2.33.0
We are looking at some interesting crashes with wcn36xx in the wild, with
some of the data appearing to indicate multiple instances of "WARNING
Spurious TX complete indication".
Looking at the code here we see that txrx.c is taking the dxe.c lock to set
and unset the current TX skbuff pointer.
There is no obvious logical bug however, it is a layering violation to
share locks like this.
Lets tidy up the code a bit by making access functions to set and unset the
TX sbuff. This makes it easier to reason about this code without having to
switch between multiple files.
Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Bryan O'Donoghue <[email protected]>
---
drivers/net/wireless/ath/wcn36xx/dxe.c | 26 +++++++++++++++++++++++++
drivers/net/wireless/ath/wcn36xx/dxe.h | 2 ++
drivers/net/wireless/ath/wcn36xx/txrx.c | 15 ++------------
3 files changed, 30 insertions(+), 13 deletions(-)
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
index c4e9e939d7d6d..6c43df4bc92c3 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -878,6 +878,32 @@ int wcn36xx_dxe_tx_flush(struct wcn36xx *wcn)
return -EBUSY;
}
+int wcn36xx_dxe_set_tx_ack_skb(struct wcn36xx *wcn, struct sk_buff *skb)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&wcn->dxe_lock, flags);
+ if (wcn->tx_ack_skb) {
+ spin_unlock_irqrestore(&wcn->dxe_lock, flags);
+ wcn36xx_warn("tx_ack_skb already set\n");
+ return -EINVAL;
+ }
+
+ wcn->tx_ack_skb = skb;
+ spin_unlock_irqrestore(&wcn->dxe_lock, flags);
+
+ return 0;
+}
+
+void wcn36xx_dxe_unset_tx_ack_skb(struct wcn36xx *wcn)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&wcn->dxe_lock, flags);
+ wcn->tx_ack_skb = NULL;
+ spin_unlock_irqrestore(&wcn->dxe_lock, flags);
+}
+
int wcn36xx_dxe_init(struct wcn36xx *wcn)
{
int reg_data = 0, ret;
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.h b/drivers/net/wireless/ath/wcn36xx/dxe.h
index 26a31edf52e99..9a7655d6af982 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.h
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.h
@@ -468,4 +468,6 @@ int wcn36xx_dxe_tx_frame(struct wcn36xx *wcn,
bool is_low);
int wcn36xx_dxe_tx_flush(struct wcn36xx *wcn);
void wcn36xx_dxe_tx_ack_ind(struct wcn36xx *wcn, u32 status);
+int wcn36xx_dxe_set_tx_ack_skb(struct wcn36xx *wcn, struct sk_buff *skb);
+void wcn36xx_dxe_unset_tx_ack_skb(struct wcn36xx *wcn);
#endif /* _DXE_H_ */
diff --git a/drivers/net/wireless/ath/wcn36xx/txrx.c b/drivers/net/wireless/ath/wcn36xx/txrx.c
index d727b0dd98de5..1218bd85de3ba 100644
--- a/drivers/net/wireless/ath/wcn36xx/txrx.c
+++ b/drivers/net/wireless/ath/wcn36xx/txrx.c
@@ -584,7 +584,6 @@ int wcn36xx_start_tx(struct wcn36xx *wcn,
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data;
struct wcn36xx_vif *vif_priv = NULL;
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
- unsigned long flags;
bool is_low = ieee80211_is_data(hdr->frame_control);
bool bcast = is_broadcast_ether_addr(hdr->addr1) ||
is_multicast_ether_addr(hdr->addr1);
@@ -606,15 +605,8 @@ int wcn36xx_start_tx(struct wcn36xx *wcn,
if (info->flags & IEEE80211_TX_CTL_REQ_TX_STATUS) {
wcn36xx_dbg(WCN36XX_DBG_DXE, "TX_ACK status requested\n");
- spin_lock_irqsave(&wcn->dxe_lock, flags);
- if (wcn->tx_ack_skb) {
- spin_unlock_irqrestore(&wcn->dxe_lock, flags);
- wcn36xx_warn("tx_ack_skb already set\n");
+ if (wcn36xx_dxe_set_tx_ack_skb(wcn, skb))
return -EINVAL;
- }
-
- wcn->tx_ack_skb = skb;
- spin_unlock_irqrestore(&wcn->dxe_lock, flags);
/* Only one at a time is supported by fw. Stop the TX queues
* until the ack status gets back.
@@ -644,10 +636,7 @@ int wcn36xx_start_tx(struct wcn36xx *wcn,
/* If the skb has not been transmitted,
* don't keep a reference to it.
*/
- spin_lock_irqsave(&wcn->dxe_lock, flags);
- wcn->tx_ack_skb = NULL;
- spin_unlock_irqrestore(&wcn->dxe_lock, flags);
-
+ wcn36xx_dxe_unset_tx_ack_skb(wcn);
ieee80211_wake_queues(wcn->hw);
}
--
2.33.0
A follow-on patch will reset the DXE block in dxe_deinit. Prepare the way
by first functionally decomposing the reset.
Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Bryan O'Donoghue <[email protected]>
---
drivers/net/wireless/ath/wcn36xx/dxe.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
index 30f4484b336c3..d4f5746d9b10a 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -919,12 +919,19 @@ void wcn36xx_dxe_unset_tx_ack_skb(struct wcn36xx *wcn)
spin_unlock_irqrestore(&wcn->dxe_lock, flags);
}
-int wcn36xx_dxe_init(struct wcn36xx *wcn)
+static void wcn36xx_dxe_reset(struct wcn36xx *wcn)
{
- int reg_data = 0, ret;
+ int reg_data = 0;
reg_data = WCN36XX_DXE_REG_RESET;
wcn36xx_dxe_write_register(wcn, WCN36XX_DXE_REG_CSR_RESET, reg_data);
+}
+
+int wcn36xx_dxe_init(struct wcn36xx *wcn)
+{
+ int reg_data = 0, ret;
+
+ wcn36xx_dxe_reset(wcn);
/* Select channels for rx avail and xfer done interrupts... */
reg_data = (WCN36XX_DXE_INT_CH3_MASK | WCN36XX_DXE_INT_CH1_MASK) << 16 |
--
2.33.0
When deiniting the DXE hardware we should reset the block to ensure there
is no spurious DMA write transaction from the downstream WCNSS to upstream
MSM at a skbuff address we will have released.
Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Bryan O'Donoghue <[email protected]>
---
drivers/net/wireless/ath/wcn36xx/dxe.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
index d4f5746d9b10a..cf85b0cd11be4 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -1089,6 +1089,9 @@ void wcn36xx_dxe_deinit(struct wcn36xx *wcn)
wcn->tx_ack_skb = NULL;
}
+ /* Put the DXE block into reset before freeing memory */
+ wcn36xx_dxe_reset(wcn);
+
wcn36xx_dxe_ch_free_skbs(wcn, &wcn->dxe_rx_l_ch);
wcn36xx_dxe_ch_free_skbs(wcn, &wcn->dxe_rx_h_ch);
--
2.33.0
Right now we have a broken sequence where we enable DMA channel interrupts
which can be left enabled and never disabled if we hit an error path.
Worse still when we unload the driver, the DMA channel interrupt bits are
left intact. About the only saving grace here is that we do remember to
disable the wcnss interrupt when unload the driver.
Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Bryan O'Donoghue <[email protected]>
---
drivers/net/wireless/ath/wcn36xx/dxe.c | 38 ++++++++++++++++++--------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
index 6c43df4bc92c3..b832afedde76a 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -272,6 +272,21 @@ static int wcn36xx_dxe_enable_ch_int(struct wcn36xx *wcn, u16 wcn_ch)
return 0;
}
+static void wcn36xx_dxe_disable_ch_int(struct wcn36xx *wcn, u16 wcn_ch)
+{
+ int reg_data = 0;
+
+ wcn36xx_dxe_read_register(wcn,
+ WCN36XX_DXE_INT_MASK_REG,
+ ®_data);
+
+ reg_data &= ~wcn_ch;
+
+ wcn36xx_dxe_write_register(wcn,
+ WCN36XX_DXE_INT_MASK_REG,
+ (int)reg_data);
+}
+
static int wcn36xx_dxe_fill_skb(struct device *dev,
struct wcn36xx_dxe_ctl *ctl,
gfp_t gfp)
@@ -939,7 +954,6 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
WCN36XX_DXE_WQ_TX_L);
wcn36xx_dxe_read_register(wcn, WCN36XX_DXE_REG_CH_EN, ®_data);
- wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_TX_L);
/***************************************/
/* Init descriptors for TX HIGH channel */
@@ -963,9 +977,6 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
wcn36xx_dxe_read_register(wcn, WCN36XX_DXE_REG_CH_EN, ®_data);
- /* Enable channel interrupts */
- wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_TX_H);
-
/***************************************/
/* Init descriptors for RX LOW channel */
/***************************************/
@@ -975,7 +986,6 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
goto out_err_rxl_ch;
}
-
/* For RX we need to preallocated buffers */
wcn36xx_dxe_ch_alloc_skb(wcn, &wcn->dxe_rx_l_ch);
@@ -998,9 +1008,6 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
WCN36XX_DXE_REG_CTL_RX_L,
WCN36XX_DXE_CH_DEFAULT_CTL_RX_L);
- /* Enable channel interrupts */
- wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_L);
-
/***************************************/
/* Init descriptors for RX HIGH channel */
/***************************************/
@@ -1032,15 +1039,18 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
WCN36XX_DXE_REG_CTL_RX_H,
WCN36XX_DXE_CH_DEFAULT_CTL_RX_H);
- /* Enable channel interrupts */
- wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_H);
-
ret = wcn36xx_dxe_request_irqs(wcn);
if (ret < 0)
goto out_err_irq;
timer_setup(&wcn->tx_ack_timer, wcn36xx_dxe_tx_timer, 0);
+ /* Enable channel interrupts */
+ wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_TX_L);
+ wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_TX_H);
+ wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_L);
+ wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_H);
+
return 0;
out_err_irq:
@@ -1057,6 +1067,12 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
void wcn36xx_dxe_deinit(struct wcn36xx *wcn)
{
+ /* Disable channel interrupts */
+ wcn36xx_dxe_disable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_H);
+ wcn36xx_dxe_disable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_L);
+ wcn36xx_dxe_disable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_TX_H);
+ wcn36xx_dxe_disable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_TX_L);
+
free_irq(wcn->tx_irq, wcn);
free_irq(wcn->rx_irq, wcn);
del_timer(&wcn->tx_ack_timer);
--
2.33.0
On 10/18/2021 4:17 PM, Bryan O'Donoghue wrote:
> V2:
> - Functionally decomposes the DXE reset in an additional patch.
> Since we call this logic more than once, it should be in a function.
>
> - Leaves as-is the DXE reset write.
>
> Johannes Berg asked me if we are sure by the time the write to the reset
> register completes that DXE transactions will be suitably quiesced.
>
> The answer is:
> 1. I believe these writes are non-posted writes
> 2. Downstream doesn't poll for DXE reset completion
>
> So on #2 I have no real data for or against a polling operation, my tests
> indicate the reset indication in the register is atomic and as far as I
> can discern that also means DMA transactions are terminated.
>
> V1:
> Digging around through some bugs reported from an extensive testing cycle
> we've found that wcn36xx has a number of unexplained RX related oopses.
>
> In at least one case we appear to have DMA'd data to an unmapped region.
> The written data appears to be a correctly formed DMA buffer descriptor - a
> DXE BD in WCNSS parlance, with an AP beacon inside of it.
>
> Reasoning about how such a situation might come about and reviewing the
> run-time code, there was no obvious path where we might free a BD or an
> skbuff pointed to by a BD, which DXE might not be aware of.
>
> However looking at the ieee80211_ops.start and ieee80211_ops.stop callbacks
> in wcn36xx we can see a number of bugs associated with BD allocation, error
> handling and leaving the DMA engine active, despite freeing SKBs on the MSM
> side.
>
> This last mention - failure to quiesce potential DMA from the downstream
> agent - WCNSS DXE despite freeing the memory @ the skbuffs is a decent
> candidate for our unexplained upstream DMA transaction to unmapped memory.
>
> Since wcn36xx_stop and wcn36xx_start can be called a number of times by the
> controlling upper layers it means there is a potential gap between
> wcn36xx_stop and wcn36xx_start which could leave WCNSS in a state where it
> will try to DMA to memory we have freed.
>
> This series addresses the obvious bugs that jump out on the start()/stop()
> path.
>
> Patch #1
> In order to make it easier to read the DXE code, I've moved all of the
> lock taking and freeing for DXE into dxe.c
>
> Patch #2
> Fixes a very obviously broken channel enable/disable cycle
>
> Patch #3
> Fixes a very obvious memory leak with dma_alloc_coherent()
>
> Patch #4
> Makes sure before we release skbuffs which we assigned to the RX channels
> that we ensure the DXE block is put into reset
>
> Bryan O'Donoghue (5):
> wcn36xx: Fix dxe lock layering violation
> wcn36xx: Fix DMA channel enable/disable cycle
> wcn36xx: Release DMA channel descriptor allocations
> wcn36xx: Functionally decompose DXE reset
> wcn36xx: Put DXE block into reset before freeing memory
>
> drivers/net/wireless/ath/wcn36xx/dxe.c | 83 +++++++++++++++++++++----
> drivers/net/wireless/ath/wcn36xx/dxe.h | 2 +
> drivers/net/wireless/ath/wcn36xx/txrx.c | 15 +----
> 3 files changed, 74 insertions(+), 26 deletions(-)
>
Reviewed-by: Jeff Johnson <[email protected]>
Bryan O'Donoghue <[email protected]> wrote:
> We are looking at some interesting crashes with wcn36xx in the wild, with
> some of the data appearing to indicate multiple instances of "WARNING
> Spurious TX complete indication".
>
> Looking at the code here we see that txrx.c is taking the dxe.c lock to set
> and unset the current TX skbuff pointer.
>
> There is no obvious logical bug however, it is a layering violation to
> share locks like this.
>
> Lets tidy up the code a bit by making access functions to set and unset the
> TX sbuff. This makes it easier to reason about this code without having to
> switch between multiple files.
>
> Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
> Signed-off-by: Bryan O'Donoghue <[email protected]>
Failed to apply, please rebase on top of ath.git master branch. But
please wait few days as there are quite a few wcn36xx patches pending.
error: patch failed: drivers/net/wireless/ath/wcn36xx/dxe.c:878
error: drivers/net/wireless/ath/wcn36xx/dxe.c: patch does not apply
error: patch failed: drivers/net/wireless/ath/wcn36xx/dxe.h:468
error: drivers/net/wireless/ath/wcn36xx/dxe.h: patch does not apply
error: patch failed: drivers/net/wireless/ath/wcn36xx/txrx.c:584
error: drivers/net/wireless/ath/wcn36xx/txrx.c: patch does not apply
stg import: Diff does not apply cleanly
5 patches set to Changes Requested.
12568271 [v2,1/5] wcn36xx: Fix dxe lock layering violation
12568273 [v2,2/5] wcn36xx: Fix DMA channel enable/disable cycle
12568275 [v2,3/5] wcn36xx: Release DMA channel descriptor allocations
12568277 [v2,4/5] wcn36xx: Functionally decompose DXE reset
12568279 [v2,5/5] wcn36xx: Put DXE block into reset before freeing memory
--
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches