2017-05-25 00:11:36

by Brian Norris

[permalink] [raw]
Subject: [PATCH 01/14] mwifiex: pcie: properly synchronize, disable interrupts in driver callbacks

It seems that the implicit assumption of the mwifiex
{enable,disable}_int() callbacks is that after ->disable_int(), all
interrupt handling should be complete (synchronized) and not fire again
until after ->enable_int(). Also, interrupts should not be serviced
until after the first ->enable_int().

However, the PCIe driver does none of this. First, the existing
interrupt mask programming appears to only have an effect for legacy
interrupts. It doesn't actually prevent MSI/MSI-X interrupts. Second,
even when it might mask interrupts, we're doing nothing to ensure that
pending IRQs have finished processing; they could be already in-flight
when a CPU masks them.

Another quirk of this driver's design is the use of a racy
"surprise_removed" check in mwifiex_pcie_interrupt(). This appears to
act like a racy poor-man's version of masking our interrupts -- it
allows us to short-circuit the ISR if it fires when we're not prepared
to handle more work.

We can resolve this all by:
(a) disabling our IRQs after requesting them
(b) call {enable,disable}_irq() in the {enable,disable}_int() callbacks
(c) remove the racy '->surprise_removed' hack from
mwifiex_pcie_interrupt()
(d) document the effect (or lack thereof) of PCIE_HOST_INT_MASK, to
clarify and possibly prevent future misuse

Along the way, I decided to use underscores to prefix the driver-private
forms of "disabling interrupts" (instead of the awkward "_noerr" suffix
used already), partly to discourage their use.

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 70 ++++++++++++++++++++---------
1 file changed, 49 insertions(+), 21 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index 394224d6c219..ea75315bf19d 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -505,12 +505,10 @@ static int mwifiex_pm_wakeup_card_complete(struct mwifiex_adapter *adapter)
}

/*
- * This function disables the host interrupt.
- *
- * The host interrupt mask is read, the disable bit is reset and
- * written back to the card host interrupt mask register.
+ * This function masks the host interrupt. Effective only for legacy PCI
+ * interrupts.
*/
-static int mwifiex_pcie_disable_host_int(struct mwifiex_adapter *adapter)
+static int __mwifiex_pcie_disable_host_int(struct mwifiex_adapter *adapter)
{
if (mwifiex_pcie_ok_to_access_hw(adapter)) {
if (mwifiex_write_reg(adapter, PCIE_HOST_INT_MASK,
@@ -525,18 +523,30 @@ static int mwifiex_pcie_disable_host_int(struct mwifiex_adapter *adapter)
return 0;
}

-static void mwifiex_pcie_disable_host_int_noerr(struct mwifiex_adapter *adapter)
+/*
+ * Disable interrupts, synchronizing with any outstanding interrupts.
+ */
+static void mwifiex_pcie_disable_host_int(struct mwifiex_adapter *adapter)
{
- WARN_ON(mwifiex_pcie_disable_host_int(adapter));
+ struct pcie_service_card *card = adapter->card;
+ int i;
+
+ WARN_ON(__mwifiex_pcie_disable_host_int(adapter));
+
+ if (card->msix_enable) {
+ for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++) {
+ disable_irq(card->msix_entries[i].vector);
+ }
+ } else {
+ disable_irq(card->dev->irq);
+ }
}

/*
- * This function enables the host interrupt.
- *
- * The host interrupt enable mask is written to the card
- * host interrupt mask register.
+ * This function unmasks the host interrupt. Effective only for legacy PCI
+ * interrupts.
*/
-static int mwifiex_pcie_enable_host_int(struct mwifiex_adapter *adapter)
+static int __mwifiex_pcie_enable_host_int(struct mwifiex_adapter *adapter)
{
if (mwifiex_pcie_ok_to_access_hw(adapter)) {
/* Simply write the mask to the register */
@@ -551,6 +561,26 @@ static int mwifiex_pcie_enable_host_int(struct mwifiex_adapter *adapter)
return 0;
}

+static int mwifiex_pcie_enable_host_int(struct mwifiex_adapter *adapter)
+{
+ struct pcie_service_card *card = adapter->card;
+ int i, ret;
+
+ ret = __mwifiex_pcie_enable_host_int(adapter);
+ if (ret)
+ return ret;
+
+ if (card->msix_enable) {
+ for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++) {
+ enable_irq(card->msix_entries[i].vector);
+ }
+ } else {
+ enable_irq(card->dev->irq);
+ }
+
+ return 0;
+}
+
/*
* This function initializes TX buffer ring descriptors
*/
@@ -1738,7 +1768,7 @@ static int mwifiex_pcie_process_cmd_complete(struct mwifiex_adapter *adapter)
while (reg->sleep_cookie && (count++ < 10) &&
mwifiex_pcie_ok_to_access_hw(adapter))
usleep_range(50, 60);
- mwifiex_pcie_enable_host_int(adapter);
+ __mwifiex_pcie_enable_host_int(adapter);
mwifiex_process_sleep_confirm_resp(adapter, skb->data,
skb->len);
} else {
@@ -2081,7 +2111,7 @@ static int mwifiex_prog_fw_w_helper(struct mwifiex_adapter *adapter,
"info: Downloading FW image (%d bytes)\n",
firmware_len);

- if (mwifiex_pcie_disable_host_int(adapter)) {
+ if (__mwifiex_pcie_disable_host_int(adapter)) {
mwifiex_dbg(adapter, ERROR,
"%s: Disabling interrupts failed.\n", __func__);
return -1;
@@ -2335,8 +2365,7 @@ static void mwifiex_interrupt_status(struct mwifiex_adapter *adapter,
if ((pcie_ireg == 0xFFFFFFFF) || !pcie_ireg)
return;

-
- mwifiex_pcie_disable_host_int(adapter);
+ __mwifiex_pcie_disable_host_int(adapter);

/* Clear the pending interrupts */
if (mwifiex_write_reg(adapter, PCIE_HOST_INT_STATUS,
@@ -2387,9 +2416,6 @@ static irqreturn_t mwifiex_pcie_interrupt(int irq, void *context)
}
adapter = card->adapter;

- if (adapter->surprise_removed)
- goto exit;
-
if (card->msix_enable)
mwifiex_interrupt_status(adapter, ctx->msg_id);
else
@@ -2494,7 +2520,7 @@ static int mwifiex_process_pcie_int(struct mwifiex_adapter *adapter)
"info: cmd_sent=%d data_sent=%d\n",
adapter->cmd_sent, adapter->data_sent);
if (!card->msi_enable && adapter->ps_state != PS_STATE_SLEEP)
- mwifiex_pcie_enable_host_int(adapter);
+ __mwifiex_pcie_enable_host_int(adapter);

return 0;
}
@@ -3055,6 +3081,7 @@ static int mwifiex_pcie_request_irq(struct mwifiex_adapter *adapter)
&card->msix_ctx[i]);
if (ret)
break;
+ disable_irq(card->msix_entries[i].vector);
}

if (ret) {
@@ -3087,6 +3114,7 @@ static int mwifiex_pcie_request_irq(struct mwifiex_adapter *adapter)
pr_err("request_irq failed: ret=%d\n", ret);
return -1;
}
+ disable_irq(pdev->irq);

return 0;
}
@@ -3244,7 +3272,7 @@ static struct mwifiex_if_ops pcie_ops = {
.register_dev = mwifiex_register_dev,
.unregister_dev = mwifiex_unregister_dev,
.enable_int = mwifiex_pcie_enable_host_int,
- .disable_int = mwifiex_pcie_disable_host_int_noerr,
+ .disable_int = mwifiex_pcie_disable_host_int,
.process_int_status = mwifiex_process_int_status,
.host_to_card = mwifiex_pcie_host_to_card,
.wakeup = mwifiex_pm_wakeup_card,
--
2.13.0.219.gdb65acc882-goog


2017-05-25 00:11:42

by Brian Norris

[permalink] [raw]
Subject: [PATCH 10/14] mwifiex: pcie: stop masking interrupts in FW downloader

The upper layers already have disabled our interrupts (haven't called
->enable_int() yet for probe(), and we call ->disable_int() before
trying to reset), so this is redundant.

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 6 ------
1 file changed, 6 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index 7d5a4f2a9a22..c86119b05f52 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -2113,12 +2113,6 @@ static int mwifiex_prog_fw_w_helper(struct mwifiex_adapter *adapter,
"info: Downloading FW image (%d bytes)\n",
firmware_len);

- if (__mwifiex_pcie_disable_host_int(adapter)) {
- mwifiex_dbg(adapter, ERROR,
- "%s: Disabling interrupts failed.\n", __func__);
- return -1;
- }
-
skb = dev_alloc_skb(MWIFIEX_UPLD_SIZE);
if (!skb) {
ret = -ENOMEM;
--
2.13.0.219.gdb65acc882-goog

2017-05-25 00:11:36

by Brian Norris

[permalink] [raw]
Subject: [PATCH 02/14] mwifiex: reunite copy-and-pasted remove/reset code

When PCIe FLR code was added, it explicitly copy-and-pasted much of
mwifiex_remove_card() into mwifiex_shutdown_sw(). This is unnecessary,
as almost all of the code should be reused.

Let's reunite what we can for now.

The only functional changes for now:

* call netif_device_detach() in the remove() code path -- this wasn't
done before, but it really should be a no-op, when the device is
getting totally unregistered soon anyway

* call the ->down_dev() driver callback only after we've finished all
SW teardown -- this should have no significant effect, since the only
user (pcie.c) does very minimal work there, and it doesn't matter
that we reorder this

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/main.c | 104 ++++++++--------------------
1 file changed, 28 insertions(+), 76 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index dd87b9ff64c3..a1e98e36c1ce 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1348,26 +1348,12 @@ static void mwifiex_main_work_queue(struct work_struct *work)
mwifiex_main_process(adapter);
}

-/*
- * This function gets called during PCIe function level reset. Required
- * code is extracted from mwifiex_remove_card()
- */
-int
-mwifiex_shutdown_sw(struct mwifiex_adapter *adapter)
+/* Common teardown code used for both device removal and reset */
+static void mwifiex_uninit_sw(struct mwifiex_adapter *adapter)
{
struct mwifiex_private *priv;
int i;

- if (!adapter)
- goto exit_return;
-
- wait_for_completion(adapter->fw_done);
- /* Caller should ensure we aren't suspending while this happens */
- reinit_completion(adapter->fw_done);
-
- priv = mwifiex_get_priv(adapter, MWIFIEX_BSS_ROLE_ANY);
- mwifiex_deauthenticate(priv, NULL);
-
/* We can no longer handle interrupts once we start doing the teardown
* below.
*/
@@ -1389,12 +1375,9 @@ mwifiex_shutdown_sw(struct mwifiex_adapter *adapter)
}

mwifiex_dbg(adapter, CMD, "cmd: calling mwifiex_shutdown_drv...\n");
-
mwifiex_shutdown_drv(adapter);
- if (adapter->if_ops.down_dev)
- adapter->if_ops.down_dev(adapter);
-
mwifiex_dbg(adapter, CMD, "cmd: mwifiex_shutdown_drv done\n");
+
if (atomic_read(&adapter->rx_pending) ||
atomic_read(&adapter->tx_pending) ||
atomic_read(&adapter->cmd_pending)) {
@@ -1417,9 +1400,30 @@ mwifiex_shutdown_sw(struct mwifiex_adapter *adapter)
rtnl_unlock();
}
vfree(adapter->chan_stats);
+}
+
+/*
+ * This function gets called during PCIe function level reset.
+ */
+int mwifiex_shutdown_sw(struct mwifiex_adapter *adapter)
+{
+ struct mwifiex_private *priv;
+
+ if (!adapter)
+ return 0;
+
+ wait_for_completion(adapter->fw_done);
+ /* Caller should ensure we aren't suspending while this happens */
+ reinit_completion(adapter->fw_done);
+
+ priv = mwifiex_get_priv(adapter, MWIFIEX_BSS_ROLE_ANY);
+ mwifiex_deauthenticate(priv, NULL);
+
+ mwifiex_uninit_sw(adapter);
+
+ if (adapter->if_ops.down_dev)
+ adapter->if_ops.down_dev(adapter);

- mwifiex_dbg(adapter, INFO, "%s, successful\n", __func__);
-exit_return:
return 0;
}
EXPORT_SYMBOL_GPL(mwifiex_shutdown_sw);
@@ -1672,61 +1676,10 @@ EXPORT_SYMBOL_GPL(mwifiex_add_card);
*/
int mwifiex_remove_card(struct mwifiex_adapter *adapter)
{
- struct mwifiex_private *priv = NULL;
- int i;
-
if (!adapter)
- goto exit_remove;
-
- /* We can no longer handle interrupts once we start doing the teardown
- * below. */
- if (adapter->if_ops.disable_int)
- adapter->if_ops.disable_int(adapter);
-
- adapter->surprise_removed = true;
-
- mwifiex_terminate_workqueue(adapter);
-
- /* Stop data */
- for (i = 0; i < adapter->priv_num; i++) {
- priv = adapter->priv[i];
- if (priv && priv->netdev) {
- mwifiex_stop_net_dev_queue(priv->netdev, adapter);
- if (netif_carrier_ok(priv->netdev))
- netif_carrier_off(priv->netdev);
- }
- }
-
- mwifiex_dbg(adapter, CMD,
- "cmd: calling mwifiex_shutdown_drv...\n");
-
- mwifiex_shutdown_drv(adapter);
- mwifiex_dbg(adapter, CMD,
- "cmd: mwifiex_shutdown_drv done\n");
- if (atomic_read(&adapter->rx_pending) ||
- atomic_read(&adapter->tx_pending) ||
- atomic_read(&adapter->cmd_pending)) {
- mwifiex_dbg(adapter, ERROR,
- "rx_pending=%d, tx_pending=%d,\t"
- "cmd_pending=%d\n",
- atomic_read(&adapter->rx_pending),
- atomic_read(&adapter->tx_pending),
- atomic_read(&adapter->cmd_pending));
- }
-
- for (i = 0; i < adapter->priv_num; i++) {
- priv = adapter->priv[i];
-
- if (!priv)
- continue;
+ return 0;

- rtnl_lock();
- if (priv->netdev &&
- priv->wdev.iftype != NL80211_IFTYPE_UNSPECIFIED)
- mwifiex_del_virtual_intf(adapter->wiphy, &priv->wdev);
- rtnl_unlock();
- }
- vfree(adapter->chan_stats);
+ mwifiex_uninit_sw(adapter);

wiphy_unregister(adapter->wiphy);
wiphy_free(adapter->wiphy);
@@ -1744,7 +1697,6 @@ int mwifiex_remove_card(struct mwifiex_adapter *adapter)
"info: free adapter\n");
mwifiex_free_adapter(adapter);

-exit_remove:
return 0;
}
EXPORT_SYMBOL_GPL(mwifiex_remove_card);
--
2.13.0.219.gdb65acc882-goog

2017-05-25 00:11:42

by Brian Norris

[permalink] [raw]
Subject: [PATCH 03/14] mwifiex: reset interrupt status across device reset

When resetting the device, we might have queued up interrupts that
didn't get a chance to finish processing. We really don't need to handle
them at this point; we just want to make sure they don't cause us to try
to process old commands from before the device was reset.

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/main.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index a1e98e36c1ce..3b7316b537ea 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1362,6 +1362,7 @@ static void mwifiex_uninit_sw(struct mwifiex_adapter *adapter)

adapter->surprise_removed = true;
mwifiex_terminate_workqueue(adapter);
+ adapter->int_status = 0;

/* Stop data */
for (i = 0; i < adapter->priv_num; i++) {
--
2.13.0.219.gdb65acc882-goog

2017-05-25 00:11:38

by Brian Norris

[permalink] [raw]
Subject: [PATCH 04/14] mwifiex: pcie: don't allow cmd buffer reuse after reset

In rogue cases (due to other bugs) it's possible we try to process an
old command response *after* resetting the device. This could trigger a
double-free (or the SKB can get reallocated elsewhere...causing other
memory corruptions) in mwifiex_pcie_process_cmd_complete().

For safety (and symmetry) let's always NULL out the command buffer as we
free it up. We're already doing this for the command response buffer.

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index ea75315bf19d..3d5c29d79609 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -1060,12 +1060,14 @@ static int mwifiex_pcie_delete_cmdrsp_buf(struct mwifiex_adapter *adapter)
mwifiex_unmap_pci_memory(adapter, card->cmdrsp_buf,
PCI_DMA_FROMDEVICE);
dev_kfree_skb_any(card->cmdrsp_buf);
+ card->cmdrsp_buf = NULL;
}

if (card && card->cmd_buf) {
mwifiex_unmap_pci_memory(adapter, card->cmd_buf,
PCI_DMA_TODEVICE);
dev_kfree_skb_any(card->cmd_buf);
+ card->cmd_buf = NULL;
}
return 0;
}
@@ -2946,7 +2948,6 @@ static void mwifiex_pcie_free_buffers(struct mwifiex_adapter *adapter)
mwifiex_pcie_delete_evtbd_ring(adapter);
mwifiex_pcie_delete_rxbd_ring(adapter);
mwifiex_pcie_delete_txbd_ring(adapter);
- card->cmdrsp_buf = NULL;
}

/*
--
2.13.0.219.gdb65acc882-goog

2017-05-25 00:11:40

by Brian Norris

[permalink] [raw]
Subject: [PATCH 07/14] mwifiex: fixup init_channel_scan_gap error case

In reading through _mwifiex_fw_dpc(), I noticed that after we've
registered our wiphy, we still have error paths that don't free it back
up. Let's do that.

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index be3badba028a..c6cdbc311471 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -584,7 +584,7 @@ static int _mwifiex_fw_dpc(const struct firmware *firmware, void *context)
if (mwifiex_init_channel_scan_gap(adapter)) {
mwifiex_dbg(adapter, ERROR,
"could not init channel stats table\n");
- goto err_init_fw;
+ goto err_init_chan_scan;
}

if (driver_mode) {
@@ -632,6 +632,7 @@ static int _mwifiex_fw_dpc(const struct firmware *firmware, void *context)

err_add_intf:
vfree(adapter->chan_stats);
+err_init_chan_scan:
wiphy_unregister(adapter->wiphy);
wiphy_free(adapter->wiphy);
err_init_fw:
--
2.13.0.219.gdb65acc882-goog

2017-05-25 00:11:42

by Brian Norris

[permalink] [raw]
Subject: [PATCH 09/14] mwifiex: pcie: remove redundant synchronize_irq()

free_irq() already calls synchronize_irq() in a non-racy manner. Calling
synchronize_irq() here is redundant.

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index 3d5c29d79609..7d5a4f2a9a22 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -3209,9 +3209,6 @@ static void mwifiex_unregister_dev(struct mwifiex_adapter *adapter)

if (card->msix_enable) {
for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++)
- synchronize_irq(card->msix_entries[i].vector);
-
- for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++)
free_irq(card->msix_entries[i].vector,
&card->msix_ctx[i]);

--
2.13.0.219.gdb65acc882-goog

2017-05-25 00:11:41

by Brian Norris

[permalink] [raw]
Subject: [PATCH 08/14] mwifiex: ensure "disable auto DS" struct is initialized

The .idle_time field *should* be unused, but technically, we're allowing
unitialized stack garbage to pass all the way through to the firmware
host command. Let's zero it out instead.

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/sta_ioctl.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c b/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
index 42997e05d90f..43ecd621d1ef 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
@@ -654,9 +654,9 @@ int mwifiex_get_bss_info(struct mwifiex_private *priv,
*/
int mwifiex_disable_auto_ds(struct mwifiex_private *priv)
{
- struct mwifiex_ds_auto_ds auto_ds;
-
- auto_ds.auto_ds = DEEP_SLEEP_OFF;
+ struct mwifiex_ds_auto_ds auto_ds = {
+ .auto_ds = DEEP_SLEEP_OFF,
+ };

return mwifiex_send_cmd(priv, HostCmd_CMD_802_11_PS_MODE_ENH,
DIS_AUTO_PS, BITMAP_AUTO_DS, &auto_ds, true);
--
2.13.0.219.gdb65acc882-goog

2017-05-25 00:11:53

by Brian Norris

[permalink] [raw]
Subject: [PATCH 06/14] mwifiex: don't short-circuit netdev notifiers on interface deletion

When we leave the delete interface function, there are still netdev
hooks that might try to process the device. We're short-circuiting some
of that by changing the interface type and clearing ieee80211_ptr. This
means we skip NETDEV_UNREGISTER_FINAL in cfg80211. Fortunately, that is
currently a no-op.

We don't need most of the cleanup here anyway:

* the connection state will get (un)set as part of the disconnect
process (which cfg80211 already initiates for us)
* the interface type doesn't actually need to be cleared at all (it'll
trigger a WARN_ON() in cfg80211 if we do)
* the iee80211_ptr isn't really "ours" to clear anyway

So stop resetting those 3 things.

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/cfg80211.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index 025bc06a19d6..c3a25b7f2b48 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -3129,11 +3129,7 @@ int mwifiex_del_virtual_intf(struct wiphy *wiphy, struct wireless_dev *wdev)
priv->dfs_chan_sw_workqueue = NULL;
}
/* Clear the priv in adapter */
- priv->netdev->ieee80211_ptr = NULL;
priv->netdev = NULL;
- priv->wdev.iftype = NL80211_IFTYPE_UNSPECIFIED;
-
- priv->media_connected = false;

switch (priv->bss_mode) {
case NL80211_IFTYPE_UNSPECIFIED:
--
2.13.0.219.gdb65acc882-goog

2017-05-25 00:11:44

by Brian Norris

[permalink] [raw]
Subject: [PATCH 12/14] mwifiex: don't open-code ARRAY_SIZE()

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/cfp.c | 4 +---
drivers/net/wireless/marvell/mwifiex/sta_cmd.c | 8 ++------
drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c | 5 ++---
3 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/cfp.c b/drivers/net/wireless/marvell/mwifiex/cfp.c
index 1ff22055e54f..cb5027a0195b 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfp.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfp.c
@@ -180,11 +180,9 @@ static struct region_code_mapping region_code_mapping_t[] = {
u8 *mwifiex_11d_code_2_region(u8 code)
{
u8 i;
- u8 size = sizeof(region_code_mapping_t)/
- sizeof(struct region_code_mapping);

/* Look for code in mapping table */
- for (i = 0; i < size; i++)
+ for (i = 0; i < ARRAY_SIZE(region_code_mapping_t); i++)
if (region_code_mapping_t[i].code == code)
return region_code_mapping_t[i].region;

diff --git a/drivers/net/wireless/marvell/mwifiex/sta_cmd.c b/drivers/net/wireless/marvell/mwifiex/sta_cmd.c
index 83916c1439af..21333e66ffdc 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_cmd.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_cmd.c
@@ -189,9 +189,7 @@ static int mwifiex_cmd_tx_rate_cfg(struct mwifiex_private *priv,
if (pbitmap_rates != NULL) {
rate_scope->hr_dsss_rate_bitmap = cpu_to_le16(pbitmap_rates[0]);
rate_scope->ofdm_rate_bitmap = cpu_to_le16(pbitmap_rates[1]);
- for (i = 0;
- i < sizeof(rate_scope->ht_mcs_rate_bitmap) / sizeof(u16);
- i++)
+ for (i = 0; i < ARRAY_SIZE(rate_scope->ht_mcs_rate_bitmap); i++)
rate_scope->ht_mcs_rate_bitmap[i] =
cpu_to_le16(pbitmap_rates[2 + i]);
if (priv->adapter->fw_api_ver == MWIFIEX_FW_V15) {
@@ -206,9 +204,7 @@ static int mwifiex_cmd_tx_rate_cfg(struct mwifiex_private *priv,
cpu_to_le16(priv->bitmap_rates[0]);
rate_scope->ofdm_rate_bitmap =
cpu_to_le16(priv->bitmap_rates[1]);
- for (i = 0;
- i < sizeof(rate_scope->ht_mcs_rate_bitmap) / sizeof(u16);
- i++)
+ for (i = 0; i < ARRAY_SIZE(rate_scope->ht_mcs_rate_bitmap); i++)
rate_scope->ht_mcs_rate_bitmap[i] =
cpu_to_le16(priv->bitmap_rates[2 + i]);
if (priv->adapter->fw_api_ver == MWIFIEX_FW_V15) {
diff --git a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
index f1d1f56fc23f..97ac10ea4242 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
@@ -298,9 +298,8 @@ static int mwifiex_ret_tx_rate_cfg(struct mwifiex_private *priv,
priv->bitmap_rates[1] =
le16_to_cpu(rate_scope->ofdm_rate_bitmap);
for (i = 0;
- i <
- sizeof(rate_scope->ht_mcs_rate_bitmap) /
- sizeof(u16); i++)
+ i < ARRAY_SIZE(rate_scope->ht_mcs_rate_bitmap);
+ i++)
priv->bitmap_rates[2 + i] =
le16_to_cpu(rate_scope->
ht_mcs_rate_bitmap[i]);
--
2.13.0.219.gdb65acc882-goog

2017-05-31 17:11:03

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 01/14] mwifiex: pcie: properly synchronize, disable interrupts in driver callbacks

By the way, this had a few review comments elsewhere, which I'll
summarize here, since I plan to resubmit a new version sometime.

On Wed, May 24, 2017 at 05:11:06PM -0700, Brian Norris wrote:
> It seems that the implicit assumption of the mwifiex
> {enable,disable}_int() callbacks is that after ->disable_int(), all
> interrupt handling should be complete (synchronized) and not fire again
> until after ->enable_int(). Also, interrupts should not be serviced
> until after the first ->enable_int().
>
> However, the PCIe driver does none of this. First, the existing
> interrupt mask programming appears to only have an effect for legacy
> interrupts. It doesn't actually prevent MSI/MSI-X interrupts. Second,
> even when it might mask interrupts, we're doing nothing to ensure that
> pending IRQs have finished processing; they could be already in-flight
> when a CPU masks them.
>
> Another quirk of this driver's design is the use of a racy
> "surprise_removed" check in mwifiex_pcie_interrupt(). This appears to
> act like a racy poor-man's version of masking our interrupts -- it
> allows us to short-circuit the ISR if it fires when we're not prepared
> to handle more work.
>
> We can resolve this all by:
> (a) disabling our IRQs after requesting them
> (b) call {enable,disable}_irq() in the {enable,disable}_int() callbacks
> (c) remove the racy '->surprise_removed' hack from
> mwifiex_pcie_interrupt()
> (d) document the effect (or lack thereof) of PCIE_HOST_INT_MASK, to
> clarify and possibly prevent future misuse
>
> Along the way, I decided to use underscores to prefix the driver-private
> forms of "disabling interrupts" (instead of the awkward "_noerr" suffix
> used already), partly to discourage their use.
>
> Signed-off-by: Brian Norris <[email protected]>
> ---
> drivers/net/wireless/marvell/mwifiex/pcie.c | 70 ++++++++++++++++++++---------
> 1 file changed, 49 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index 394224d6c219..ea75315bf19d 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -505,12 +505,10 @@ static int mwifiex_pm_wakeup_card_complete(struct mwifiex_adapter *adapter)
> }
>
> /*
> - * This function disables the host interrupt.
> - *
> - * The host interrupt mask is read, the disable bit is reset and
> - * written back to the card host interrupt mask register.
> + * This function masks the host interrupt. Effective only for legacy PCI
> + * interrupts.
> */
> -static int mwifiex_pcie_disable_host_int(struct mwifiex_adapter *adapter)
> +static int __mwifiex_pcie_disable_host_int(struct mwifiex_adapter *adapter)
> {
> if (mwifiex_pcie_ok_to_access_hw(adapter)) {
> if (mwifiex_write_reg(adapter, PCIE_HOST_INT_MASK,
> @@ -525,18 +523,30 @@ static int mwifiex_pcie_disable_host_int(struct mwifiex_adapter *adapter)
> return 0;
> }
>
> -static void mwifiex_pcie_disable_host_int_noerr(struct mwifiex_adapter *adapter)
> +/*
> + * Disable interrupts, synchronizing with any outstanding interrupts.
> + */
> +static void mwifiex_pcie_disable_host_int(struct mwifiex_adapter *adapter)
> {
> - WARN_ON(mwifiex_pcie_disable_host_int(adapter));
> + struct pcie_service_card *card = adapter->card;
> + int i;
> +
> + WARN_ON(__mwifiex_pcie_disable_host_int(adapter));
> +
> + if (card->msix_enable) {
> + for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++) {
> + disable_irq(card->msix_entries[i].vector);
> + }
> + } else {
> + disable_irq(card->dev->irq);

This approach is not safe for the non-MSI-X case, since we actually
requested this IRQ with IRQF_SHARED. That's likely mostly for the legacy
PCI interrupt case (where we *have* to support shared interrupts) and
could probably be modified, but at any rate, this is unsafe as written.

Also, I've fielded objections to using the host-level IRQ masking for
disabling MSI interrupts here. I'm still not completely sure *why* the
objection, but I'm investigating whether there's any device-level
mechanism for disabling MSI interrupts on the Wif card. (Marvell folks,
feel free to speak up here.)

> + }
> }
>
> /*
> - * This function enables the host interrupt.
> - *
> - * The host interrupt enable mask is written to the card
> - * host interrupt mask register.
> + * This function unmasks the host interrupt. Effective only for legacy PCI
> + * interrupts.
> */
> -static int mwifiex_pcie_enable_host_int(struct mwifiex_adapter *adapter)
> +static int __mwifiex_pcie_enable_host_int(struct mwifiex_adapter *adapter)
> {
> if (mwifiex_pcie_ok_to_access_hw(adapter)) {
> /* Simply write the mask to the register */
> @@ -551,6 +561,26 @@ static int mwifiex_pcie_enable_host_int(struct mwifiex_adapter *adapter)
> return 0;
> }
>
> +static int mwifiex_pcie_enable_host_int(struct mwifiex_adapter *adapter)
> +{
> + struct pcie_service_card *card = adapter->card;
> + int i, ret;
> +
> + ret = __mwifiex_pcie_enable_host_int(adapter);
> + if (ret)
> + return ret;
> +
> + if (card->msix_enable) {
> + for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++) {
> + enable_irq(card->msix_entries[i].vector);
> + }
> + } else {
> + enable_irq(card->dev->irq);
> + }
> +
> + return 0;
> +}
> +
> /*
> * This function initializes TX buffer ring descriptors
> */
> @@ -1738,7 +1768,7 @@ static int mwifiex_pcie_process_cmd_complete(struct mwifiex_adapter *adapter)
> while (reg->sleep_cookie && (count++ < 10) &&
> mwifiex_pcie_ok_to_access_hw(adapter))
> usleep_range(50, 60);
> - mwifiex_pcie_enable_host_int(adapter);
> + __mwifiex_pcie_enable_host_int(adapter);
> mwifiex_process_sleep_confirm_resp(adapter, skb->data,
> skb->len);
> } else {
> @@ -2081,7 +2111,7 @@ static int mwifiex_prog_fw_w_helper(struct mwifiex_adapter *adapter,
> "info: Downloading FW image (%d bytes)\n",
> firmware_len);
>
> - if (mwifiex_pcie_disable_host_int(adapter)) {
> + if (__mwifiex_pcie_disable_host_int(adapter)) {
> mwifiex_dbg(adapter, ERROR,
> "%s: Disabling interrupts failed.\n", __func__);
> return -1;
> @@ -2335,8 +2365,7 @@ static void mwifiex_interrupt_status(struct mwifiex_adapter *adapter,
> if ((pcie_ireg == 0xFFFFFFFF) || !pcie_ireg)
> return;
>
> -
> - mwifiex_pcie_disable_host_int(adapter);
> + __mwifiex_pcie_disable_host_int(adapter);
>
> /* Clear the pending interrupts */
> if (mwifiex_write_reg(adapter, PCIE_HOST_INT_STATUS,
> @@ -2387,9 +2416,6 @@ static irqreturn_t mwifiex_pcie_interrupt(int irq, void *context)
> }
> adapter = card->adapter;
>
> - if (adapter->surprise_removed)
> - goto exit;
> -
> if (card->msix_enable)
> mwifiex_interrupt_status(adapter, ctx->msg_id);
> else
> @@ -2494,7 +2520,7 @@ static int mwifiex_process_pcie_int(struct mwifiex_adapter *adapter)
> "info: cmd_sent=%d data_sent=%d\n",
> adapter->cmd_sent, adapter->data_sent);
> if (!card->msi_enable && adapter->ps_state != PS_STATE_SLEEP)
> - mwifiex_pcie_enable_host_int(adapter);
> + __mwifiex_pcie_enable_host_int(adapter);
>
> return 0;
> }
> @@ -3055,6 +3081,7 @@ static int mwifiex_pcie_request_irq(struct mwifiex_adapter *adapter)
> &card->msix_ctx[i]);
> if (ret)
> break;
> + disable_irq(card->msix_entries[i].vector);

Also, if we're really dealing with spurious interrupts at init time,
then this leaves a window of time in between request_irq() and this
disable_irq() in which we could still receive a bad IRQ. So this should
be reworked to do one of:
(a) move the request_irq() later, until we're really able to handle
interrupts
(b) set the IRQ_NOAUTOEN flag (for the non-shared case), to avoid
enabling IRQs initially
(c) use some sort of (yet-unknown) device-level mask for MSI interrupts.

I'm looking to address these problems in a v2. Many of the other patches
are likely independent. I'll plan to resubmit them in the next series
(if they aren't applied before then), to avoid conflicts with those that
aren't independent, and because I intentionally put bugfixes (like this
patch) first in the series.

Brian

> }
>
> if (ret) {
> @@ -3087,6 +3114,7 @@ static int mwifiex_pcie_request_irq(struct mwifiex_adapter *adapter)
> pr_err("request_irq failed: ret=%d\n", ret);
> return -1;
> }
> + disable_irq(pdev->irq);
>
> return 0;
> }
> @@ -3244,7 +3272,7 @@ static struct mwifiex_if_ops pcie_ops = {
> .register_dev = mwifiex_register_dev,
> .unregister_dev = mwifiex_unregister_dev,
> .enable_int = mwifiex_pcie_enable_host_int,
> - .disable_int = mwifiex_pcie_disable_host_int_noerr,
> + .disable_int = mwifiex_pcie_disable_host_int,
> .process_int_status = mwifiex_process_int_status,
> .host_to_card = mwifiex_pcie_host_to_card,
> .wakeup = mwifiex_pm_wakeup_card,
> --
> 2.13.0.219.gdb65acc882-goog
>

2017-05-25 00:11:43

by Brian Norris

[permalink] [raw]
Subject: [PATCH 11/14] mwifiex: utilize netif_tx_{wake,stop}_all_queues()

We're open-coding these. Just use the helpers.

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/init.c | 20 ++------------------
1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/init.c b/drivers/net/wireless/marvell/mwifiex/init.c
index 80bdf1c5f77f..d3b8ca402d08 100644
--- a/drivers/net/wireless/marvell/mwifiex/init.c
+++ b/drivers/net/wireless/marvell/mwifiex/init.c
@@ -332,17 +332,9 @@ void mwifiex_wake_up_net_dev_queue(struct net_device *netdev,
struct mwifiex_adapter *adapter)
{
unsigned long dev_queue_flags;
- unsigned int i;

spin_lock_irqsave(&adapter->queue_lock, dev_queue_flags);
-
- for (i = 0; i < netdev->num_tx_queues; i++) {
- struct netdev_queue *txq = netdev_get_tx_queue(netdev, i);
-
- if (netif_tx_queue_stopped(txq))
- netif_tx_wake_queue(txq);
- }
-
+ netif_tx_wake_all_queues(netdev);
spin_unlock_irqrestore(&adapter->queue_lock, dev_queue_flags);
}

@@ -353,17 +345,9 @@ void mwifiex_stop_net_dev_queue(struct net_device *netdev,
struct mwifiex_adapter *adapter)
{
unsigned long dev_queue_flags;
- unsigned int i;

spin_lock_irqsave(&adapter->queue_lock, dev_queue_flags);
-
- for (i = 0; i < netdev->num_tx_queues; i++) {
- struct netdev_queue *txq = netdev_get_tx_queue(netdev, i);
-
- if (!netif_tx_queue_stopped(txq))
- netif_tx_stop_queue(txq);
- }
-
+ netif_tx_stop_all_queues(netdev);
spin_unlock_irqrestore(&adapter->queue_lock, dev_queue_flags);
}

--
2.13.0.219.gdb65acc882-goog

2017-05-25 00:11:39

by Brian Norris

[permalink] [raw]
Subject: [PATCH 05/14] mwifiex: re-register wiphy across reset

In general, it's helpful to use the same code for device removal as for
device reset, as this tends to have fewer bugs. Let's move the wiphy
unregistration code into the common reset and removal code.

In particular, it's very hard to properly handle the reset sequence when
something fails. Currently, if mwifiex_reinit_sw() fails, we've failed
to unregister the associated wiphy, and so running something as simple
as "iw phy" can trigger an OOPS, as the wiphy still has hooks back into
freed mwifiex data structures. For example, KASAN complained:

[... see reset fail for other reasons ...]
[ 1184.821158] mwifiex_pcie 0000:01:00.0: info: dnld wifi firmware from 174948 bytes
[ 1186.870914] mwifiex_pcie 0000:01:00.0: info: FW download over, size 608396 bytes
[ 1187.685990] mwifiex_pcie 0000:01:00.0: WLAN FW is active
[ 1187.692673] mwifiex_pcie 0000:01:00.0: cmd_wait_q terminated: -512
[ 1187.699075] mwifiex_pcie 0000:01:00.0: info: _mwifiex_fw_dpc: unregister device
[ 1187.713476] mwifiex: Failed to bring up adapter: -5
[ 1187.718644] mwifiex_pcie 0000:01:00.0: reinit failed: -5

[... run `iw phy` ...]
[ 1212.902419] ==================================================================
[ 1212.909806] BUG: KASAN: use-after-free in mwifiex_cfg80211_get_antenna+0x54/0xfc [mwifiex] at addr ffffffc0ad1a8028
[ 1212.920246] Read of size 1 by task iw/3127
[...]
[ 1212.934946] page dumped because: kasan: bad access detected
[...]
[ 1212.950665] Call trace:
[ 1212.953148] [<ffffffc00020a69c>] dump_backtrace+0x0/0x190
[ 1212.958572] [<ffffffc00020a96c>] show_stack+0x20/0x28
[ 1212.963648] [<ffffffc0005ce18c>] dump_stack+0xa4/0xcc
[ 1212.968723] [<ffffffc0003c4430>] kasan_report+0x378/0x500
[ 1212.974140] [<ffffffc0003c3358>] __asan_load1+0x44/0x4c
[ 1212.979462] [<ffffffbffc2e8360>] mwifiex_cfg80211_get_antenna+0x54/0xfc [mwifiex]
[ 1212.987131] [<ffffffbffc084fc4>] nl80211_send_wiphy+0x75c/0x2de0 [cfg80211]
[ 1212.994246] [<ffffffbffc094f60>] nl80211_dump_wiphy+0x32c/0x438 [cfg80211]
[ 1213.001149] [<ffffffc000ab6404>] genl_lock_dumpit+0x48/0x64
[ 1213.006746] [<ffffffc000ab3474>] netlink_dump+0x178/0x398
[ 1213.012171] [<ffffffc000ab3d18>] __netlink_dump_start+0x1bc/0x260
[...]

This all goes away if we just tear down the wiphy on the way down, and
set it back up if/when we bring the device back up.

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/main.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index 3b7316b537ea..be3badba028a 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1401,6 +1401,10 @@ static void mwifiex_uninit_sw(struct mwifiex_adapter *adapter)
rtnl_unlock();
}
vfree(adapter->chan_stats);
+
+ wiphy_unregister(adapter->wiphy);
+ wiphy_free(adapter->wiphy);
+ adapter->wiphy = NULL;
}

/*
@@ -1682,9 +1686,6 @@ int mwifiex_remove_card(struct mwifiex_adapter *adapter)

mwifiex_uninit_sw(adapter);

- wiphy_unregister(adapter->wiphy);
- wiphy_free(adapter->wiphy);
-
if (adapter->irq_wakeup >= 0)
device_init_wakeup(adapter->dev, false);

--
2.13.0.219.gdb65acc882-goog

2017-05-25 00:11:45

by Brian Norris

[permalink] [raw]
Subject: [PATCH 13/14] mwifiex: drop 'add_tail' param from mwifiex_insert_cmd_to_pending_q()

It's always called with 'true' -- we only determine it 'false' locally
within this function. So drop the parameter.

Also, this should be 'bool' (since we use true/false), not 'u32'.

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/cmdevt.c | 5 +++--
drivers/net/wireless/marvell/mwifiex/main.h | 3 +--
drivers/net/wireless/marvell/mwifiex/scan.c | 5 ++---
3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/cmdevt.c b/drivers/net/wireless/marvell/mwifiex/cmdevt.c
index 95221306a4e5..2d94104fe7ea 100644
--- a/drivers/net/wireless/marvell/mwifiex/cmdevt.c
+++ b/drivers/net/wireless/marvell/mwifiex/cmdevt.c
@@ -667,7 +667,7 @@ int mwifiex_send_cmd(struct mwifiex_private *priv, u16 cmd_no,
cmd_no == HostCmd_CMD_802_11_SCAN_EXT) {
mwifiex_queue_scan_cmd(priv, cmd_node);
} else {
- mwifiex_insert_cmd_to_pending_q(adapter, cmd_node, true);
+ mwifiex_insert_cmd_to_pending_q(adapter, cmd_node);
queue_work(adapter->workqueue, &adapter->main_work);
if (cmd_node->wait_q_enabled)
ret = mwifiex_wait_queue_complete(adapter, cmd_node);
@@ -685,11 +685,12 @@ int mwifiex_send_cmd(struct mwifiex_private *priv, u16 cmd_no,
*/
void
mwifiex_insert_cmd_to_pending_q(struct mwifiex_adapter *adapter,
- struct cmd_ctrl_node *cmd_node, u32 add_tail)
+ struct cmd_ctrl_node *cmd_node)
{
struct host_cmd_ds_command *host_cmd = NULL;
u16 command;
unsigned long flags;
+ bool add_tail = true;

host_cmd = (struct host_cmd_ds_command *) (cmd_node->cmd_skb->data);
if (!host_cmd) {
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index c1d96c64af74..96d68ca9394f 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -1071,8 +1071,7 @@ void mwifiex_recycle_cmd_node(struct mwifiex_adapter *adapter,
struct cmd_ctrl_node *cmd_node);

void mwifiex_insert_cmd_to_pending_q(struct mwifiex_adapter *adapter,
- struct cmd_ctrl_node *cmd_node,
- u32 addtail);
+ struct cmd_ctrl_node *cmd_node);

int mwifiex_exec_next_cmd(struct mwifiex_adapter *adapter);
int mwifiex_process_cmdresp(struct mwifiex_adapter *adapter);
diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c b/drivers/net/wireless/marvell/mwifiex/scan.c
index ce6936d0c5c0..2ed6a22d5247 100644
--- a/drivers/net/wireless/marvell/mwifiex/scan.c
+++ b/drivers/net/wireless/marvell/mwifiex/scan.c
@@ -1534,8 +1534,7 @@ int mwifiex_scan_networks(struct mwifiex_private *priv,
list_del(&cmd_node->list);
spin_unlock_irqrestore(&adapter->scan_pending_q_lock,
flags);
- mwifiex_insert_cmd_to_pending_q(adapter, cmd_node,
- true);
+ mwifiex_insert_cmd_to_pending_q(adapter, cmd_node);
queue_work(adapter->workqueue, &adapter->main_work);

/* Perform internal scan synchronously */
@@ -2033,7 +2032,7 @@ static void mwifiex_check_next_scan_command(struct mwifiex_private *priv)
struct cmd_ctrl_node, list);
list_del(&cmd_node->list);
spin_unlock_irqrestore(&adapter->scan_pending_q_lock, flags);
- mwifiex_insert_cmd_to_pending_q(adapter, cmd_node, true);
+ mwifiex_insert_cmd_to_pending_q(adapter, cmd_node);
}

return;
--
2.13.0.219.gdb65acc882-goog

2017-05-25 00:11:51

by Brian Norris

[permalink] [raw]
Subject: [PATCH 14/14] mwifiex: pcie: fix whitespace

This keeps annoying me.

Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index c86119b05f52..4f1d946ea460 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -3211,7 +3211,7 @@ static void mwifiex_unregister_dev(struct mwifiex_adapter *adapter)
} else {
mwifiex_dbg(adapter, INFO,
"%s(): calling free_irq()\n", __func__);
- free_irq(card->dev->irq, &card->share_irq_ctx);
+ free_irq(card->dev->irq, &card->share_irq_ctx);

if (card->msi_enable)
pci_disable_msi(pdev);
--
2.13.0.219.gdb65acc882-goog

2017-06-27 20:49:03

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

On Thu, Jun 22, 2017 at 03:02:34PM +0200, Johannes Berg wrote:
> On Wed, 2017-06-21 at 11:27 -0700, Brian Norris wrote:

> > > Without checking the code now, it seems entirely plausible that
> > > this is
> > > holding some lock that would lock out the control path entirely,
> > > for
> > > the duration until the wiphy is actually unregistered?
> > >
> > > Actually, you can't unregister with the relevant locks held
> > > (without
> > > causing deadlocks), so perhaps it's marking the wiphy as
> > > unavailable so
> > > that all operations fail?
> >
> > One of the above two sounds along the right line. But it's something
> > I couldn't really figure out how to do quite right.
> >
> > Dumb question: how would I mark the wiphy as unavailable? Is there
> > something I can do at the cfg80211 level? Or would I really have to
> > guard all the cfg80211 entry points into mwifiex with a flag or lock?
>
> There isn't really a good way to do this. You can, of course, call
> wiphy_unregister(), but if you could do that you'd already have the
> problem solved, I think?

That's probably along the right track. There are still some things we'd
need to do properly before that though, and this is where all the
problems are so far. (Also, this is what Kalle was already objecting to;
he didn't think we should be unregistering/recreating the wiphy, but I
think he ended up softening on that a bit.)

For one, I still expect I should be removing the wireless dev's before
unregistering the wihpy, no? Otherwise, there will be existing wdevs
backed by an unregistered wiphy?

And that gets to the heart of another bug: deleting interfaces (e.g.,
"iw dev foo del") races with a lot of stuff -- like see

mwifiex_process_sta_event() ->
EVENT_EXT_SCAN_REPORT ->
netif_running(priv->netdev)

Because mwifiex_del_virtual_intf() doesn't stop any outstanding
commands, we can be both deleting the netdev and processing scans for
it.

> I'm not really familiar enough with the context this happens in - can't
> you let all the operations that try to talk to the firmware fail
> (because the firmware is dead, or whatever) and then call
> wiphy_unregister()?

Yes, something like that, barring some of the other bugs mentioned.

> > Also, IIUC, we need to wait for all control paths to complete (or
> > cancel) before we can free up the associated resources; so just
> > marking "unavailable" isn't enough.
>
> Yeah, I suppose so. Though if you just do all the freeing after
> wiphy_unregister() it'll do that for you?

Yes, I think so. Then part of the problem is probably that some of the
current "cancel command" logic is tied up with the "free command
structures" logic. So we're freeing some stuff too early.

Anyway, those sorts of bugs aside, IIUC the full sequence for teardown
should probably be something like:

1. Stop TX queues
2. Cancel outstanding commands (let them fail or finish, etc.) -- but
DON'T free their backing resources yet
3. Remove wdevs
4. wiphy_unregister()
5. Free up resources

Current problems are at least:

* we don't do step 4 in the right place (if at all; see this patch)
* step 2 mixes in "free"ing resources too early

Brian

2017-06-28 07:28:54

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

On Tue, 2017-06-27 at 13:48 -0700, Brian Norris wrote:

> > There isn't really a good way to do this. You can, of course, call
> > wiphy_unregister(), but if you could do that you'd already have the
> > problem solved, I think?
>
> That's probably along the right track. There are still some things
> we'd need to do properly before that though, and this is where all
> the problems are so far. (Also, this is what Kalle was already
> objecting to; he didn't think we should be unregistering/recreating
> the wiphy, but I think he ended up softening on that a bit.)
>
> For one, I still expect I should be removing the wireless dev's
> before unregistering the wihpy, no? Otherwise, there will be existing
> wdevs backed by an unregistered wiphy?

Yeah, that's true - though once you get rid of those they can't be
accessed any more.

> And that gets to the heart of another bug: deleting interfaces (e.g.,
> "iw dev foo del") races with a lot of stuff -- like see
>
> mwifiex_process_sta_event() ->
>   EVENT_EXT_SCAN_REPORT ->
>     netif_running(priv->netdev)
>
> Because mwifiex_del_virtual_intf() doesn't stop any outstanding
> commands, we can be both deleting the netdev and processing scans for
> it.

Huh, well, I guess you need some kind of locking here anyway, since the
user can always do things like deleting the interface while a scan is
running?

> > > Also, IIUC, we need to wait for all control paths to complete (or
> > > cancel) before we can free up the associated resources; so just
> > > marking "unavailable" isn't enough.
> >
> > Yeah, I suppose so. Though if you just do all the freeing after
> > wiphy_unregister() it'll do that for you?
>
> Yes, I think so. Then part of the problem is probably that some of
> the current "cancel command" logic is tied up with the "free command
> structures" logic. So we're freeing some stuff too early.
>
> Anyway, those sorts of bugs aside, IIUC the full sequence for
> teardown should probably be something like:
>
> 1. Stop TX queues
> 2. Cancel outstanding commands (let them fail or finish, etc.) -- but
>    DON'T free their backing resources yet
> 3. Remove wdevs
> 4. wiphy_unregister()
> 5. Free up resources
>
> Current problems are at least:
>
> * we don't do step 4 in the right place (if at all; see this patch)
> * step 2 mixes in "free"ing resources too early

So I'm not sure what you mean by splitting in 2/5 - this seems
reasonable, but I don't understand why something like a scan request
wouldn't be freed while you cancel it in 2? In fact, you really have to
free it before you remove the corresponding wdev, or cfg80211 will
complain?

johannes

2017-06-22 13:02:49

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

On Wed, 2017-06-21 at 11:27 -0700, Brian Norris wrote:
>
> > I'm not sure what you mean by "we need to atually stop all the
> > virtual interfaces ([...]) first".
>
> Judging by your following comments, I may have been completely
> mistaken.
> (But that's why I asked you folks!)

:)

> > There are essentially only two/three ways to reach this - data
> > path,
> > which is getting stopped here, and control path (both nl80211 and
> > perhaps ndo ops like start/stop).
>
> I think I was conflating virtual interfaces with control path (e.g.,
> nl80211 scans, set freq, etc.). The idea is that control operations
> may still get *started* after the above, and it's just plain
> impossible to resolve the races with driver queue teardown if we're
> queueing up new control ops at the same time.

Agree.

> But even if we kill off the wireless_dev's, I suppose there are still
> control interfaces that can talk directly to the wiphy.

Yeah, only a few.

> > Without checking the code now, it seems entirely plausible that
> > this is
> > holding some lock that would lock out the control path entirely,
> > for
> > the duration until the wiphy is actually unregistered?
> >
> > Actually, you can't unregister with the relevant locks held
> > (without
> > causing deadlocks), so perhaps it's marking the wiphy as
> > unavailable so
> > that all operations fail?
>
> One of the above two sounds along the right line. But it's something
> I couldn't really figure out how to do quite right.
>
> Dumb question: how would I mark the wiphy as unavailable? Is there
> something I can do at the cfg80211 level? Or would I really have to
> guard all the cfg80211 entry points into mwifiex with a flag or lock?

There isn't really a good way to do this. You can, of course, call
wiphy_unregister(), but if you could do that you'd already have the
problem solved, I think?

I'm not really familiar enough with the context this happens in - can't
you let all the operations that try to talk to the firmware fail
(because the firmware is dead, or whatever) and then call
wiphy_unregister()?

> Also, IIUC, we need to wait for all control paths to complete (or
> cancel) before we can free up the associated resources; so just
> marking "unavailable" isn't enough.

Yeah, I suppose so. Though if you just do all the freeing after
wiphy_unregister() it'll do that for you?

johannes

2017-06-05 11:49:55

by Xinming Hu

[permalink] [raw]
Subject: Re: [PATCH 01/14] mwifiex: pcie: properly synchronize, disable interrupts in driver callbacks

Hi Brian,

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Brian Norris
> Sent: 2017$BG/(B6$B7n(B1$BF|(B 1:11
> To: Ganapathi Bhat; Nishant Sarmukadam
> Cc: [email protected]; Dmitry Torokhov; Amitkumar Karwar; Kalle
> Valo; [email protected]
> Subject: Re: [PATCH 01/14] mwifiex: pcie: properly synchronize, disable
> interrupts in driver callbacks
>
> By the way, this had a few review comments elsewhere, which I'll summarize
> here, since I plan to resubmit a new version sometime.
>
> On Wed, May 24, 2017 at 05:11:06PM -0700, Brian Norris wrote:
> > It seems that the implicit assumption of the mwifiex
> > {enable,disable}_int() callbacks is that after ->disable_int(), all
> > interrupt handling should be complete (synchronized) and not fire
> > again until after ->enable_int(). Also, interrupts should not be
> > serviced until after the first ->enable_int().
> >
> > However, the PCIe driver does none of this. First, the existing
> > interrupt mask programming appears to only have an effect for legacy
> > interrupts. It doesn't actually prevent MSI/MSI-X interrupts. Second,
> > even when it might mask interrupts, we're doing nothing to ensure that
> > pending IRQs have finished processing; they could be already in-flight
> > when a CPU masks them.
> >
> > Another quirk of this driver's design is the use of a racy
> > "surprise_removed" check in mwifiex_pcie_interrupt(). This appears to
> > act like a racy poor-man's version of masking our interrupts -- it
> > allows us to short-circuit the ISR if it fires when we're not prepared
> > to handle more work.
> >
> > We can resolve this all by:
> > (a) disabling our IRQs after requesting them
> > (b) call {enable,disable}_irq() in the {enable,disable}_int()
> > callbacks
> > (c) remove the racy '->surprise_removed' hack from
> > mwifiex_pcie_interrupt()
> > (d) document the effect (or lack thereof) of PCIE_HOST_INT_MASK, to
> > clarify and possibly prevent future misuse
> >
> > Along the way, I decided to use underscores to prefix the
> > driver-private forms of "disabling interrupts" (instead of the awkward
> > "_noerr" suffix used already), partly to discourage their use.
> >
> > Signed-off-by: Brian Norris <[email protected]>
> > ---
> > drivers/net/wireless/marvell/mwifiex/pcie.c | 70
> > ++++++++++++++++++++---------
> > 1 file changed, 49 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > index 394224d6c219..ea75315bf19d 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > @@ -505,12 +505,10 @@ static int
> > mwifiex_pm_wakeup_card_complete(struct mwifiex_adapter *adapter) }
> >
> > /*
> > - * This function disables the host interrupt.
> > - *
> > - * The host interrupt mask is read, the disable bit is reset and
> > - * written back to the card host interrupt mask register.
> > + * This function masks the host interrupt. Effective only for legacy
> > + PCI
> > + * interrupts.
> > */
> > -static int mwifiex_pcie_disable_host_int(struct mwifiex_adapter
> > *adapter)
> > +static int __mwifiex_pcie_disable_host_int(struct mwifiex_adapter
> > +*adapter)
> > {
> > if (mwifiex_pcie_ok_to_access_hw(adapter)) {
> > if (mwifiex_write_reg(adapter, PCIE_HOST_INT_MASK, @@ -525,18
> > +523,30 @@ static int mwifiex_pcie_disable_host_int(struct mwifiex_adapter
> *adapter)
> > return 0;
> > }
> >
> > -static void mwifiex_pcie_disable_host_int_noerr(struct
> > mwifiex_adapter *adapter)
> > +/*
> > + * Disable interrupts, synchronizing with any outstanding interrupts.
> > + */
> > +static void mwifiex_pcie_disable_host_int(struct mwifiex_adapter
> > +*adapter)
> > {
> > - WARN_ON(mwifiex_pcie_disable_host_int(adapter));
> > + struct pcie_service_card *card = adapter->card;
> > + int i;
> > +
> > + WARN_ON(__mwifiex_pcie_disable_host_int(adapter));
> > +
> > + if (card->msix_enable) {
> > + for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++) {
> > + disable_irq(card->msix_entries[i].vector);
> > + }
> > + } else {
> > + disable_irq(card->dev->irq);
>
> This approach is not safe for the non-MSI-X case, since we actually requested
> this IRQ with IRQF_SHARED. That's likely mostly for the legacy PCI interrupt
> case (where we *have* to support shared interrupts) and could probably be
> modified, but at any rate, this is unsafe as written.
>
> Also, I've fielded objections to using the host-level IRQ masking for disabling
> MSI interrupts here. I'm still not completely sure *why* the objection, but I'm
> investigating whether there's any device-level mechanism for disabling MSI
> interrupts on the Wif card. (Marvell folks, feel free to speak up here.)
>

per our investigate:
the msi interrupt could be latch in device function temporary(pci_msi_mask_irq), but it is not able to disable interrupt generating from device, these pending interrupts will arrive after pci_msi_unmask_irq..

this is expected , as implied by the spec $B!H(BWhile a vector is masked, the function is prohibited from sending the associated message, and the function must set the associated Pending bit whenever the function would otherwise send the message. When software unmasks a vector whose associated Pending bit is set, the function must schedule sending the associated message, and clear the Pending bit as soon as the message has been sent.$B!I(B

Apart from these two API, the only way we can find is pci_disable_msi/msix, obviously not suitable..(quite strange, there is no lightweight host API for disable MSI interrupt,, maybe something related with MSI spec..)

And from device side, in current design, there is no similar $B!H(BHost Interrupt Status Mask$B!I(B registers/logics to prevent PCIe MSI logic to latch the interrupt request it received.



Regards,
Simon

> > + }
> > }
> >
> > /*
> > - * This function enables the host interrupt.
> > - *
> > - * The host interrupt enable mask is written to the card
> > - * host interrupt mask register.
> > + * This function unmasks the host interrupt. Effective only for
> > + legacy PCI
> > + * interrupts.
> > */
> > -static int mwifiex_pcie_enable_host_int(struct mwifiex_adapter
> > *adapter)
> > +static int __mwifiex_pcie_enable_host_int(struct mwifiex_adapter
> > +*adapter)
> > {
> > if (mwifiex_pcie_ok_to_access_hw(adapter)) {
> > /* Simply write the mask to the register */ @@ -551,6 +561,26 @@
> > static int mwifiex_pcie_enable_host_int(struct mwifiex_adapter *adapter)
> > return 0;
> > }
> >
> > +static int mwifiex_pcie_enable_host_int(struct mwifiex_adapter
> > +*adapter) {
> > + struct pcie_service_card *card = adapter->card;
> > + int i, ret;
> > +
> > + ret = __mwifiex_pcie_enable_host_int(adapter);
> > + if (ret)
> > + return ret;
> > +
> > + if (card->msix_enable) {
> > + for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++) {
> > + enable_irq(card->msix_entries[i].vector);
> > + }
> > + } else {
> > + enable_irq(card->dev->irq);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > /*
> > * This function initializes TX buffer ring descriptors
> > */
> > @@ -1738,7 +1768,7 @@ static int
> mwifiex_pcie_process_cmd_complete(struct mwifiex_adapter *adapter)
> > while (reg->sleep_cookie && (count++ < 10) &&
> > mwifiex_pcie_ok_to_access_hw(adapter))
> > usleep_range(50, 60);
> > - mwifiex_pcie_enable_host_int(adapter);
> > + __mwifiex_pcie_enable_host_int(adapter);
> > mwifiex_process_sleep_confirm_resp(adapter, skb->data,
> > skb->len);
> > } else {
> > @@ -2081,7 +2111,7 @@ static int mwifiex_prog_fw_w_helper(struct
> mwifiex_adapter *adapter,
> > "info: Downloading FW image (%d bytes)\n",
> > firmware_len);
> >
> > - if (mwifiex_pcie_disable_host_int(adapter)) {
> > + if (__mwifiex_pcie_disable_host_int(adapter)) {
> > mwifiex_dbg(adapter, ERROR,
> > "%s: Disabling interrupts failed.\n", __func__);
> > return -1;
> > @@ -2335,8 +2365,7 @@ static void mwifiex_interrupt_status(struct
> mwifiex_adapter *adapter,
> > if ((pcie_ireg == 0xFFFFFFFF) || !pcie_ireg)
> > return;
> >
> > -
> > - mwifiex_pcie_disable_host_int(adapter);
> > + __mwifiex_pcie_disable_host_int(adapter);
> >
> > /* Clear the pending interrupts */
> > if (mwifiex_write_reg(adapter, PCIE_HOST_INT_STATUS, @@ -2387,9
> > +2416,6 @@ static irqreturn_t mwifiex_pcie_interrupt(int irq, void *context)
> > }
> > adapter = card->adapter;
> >
> > - if (adapter->surprise_removed)
> > - goto exit;
> > -
> > if (card->msix_enable)
> > mwifiex_interrupt_status(adapter, ctx->msg_id);
> > else
> > @@ -2494,7 +2520,7 @@ static int mwifiex_process_pcie_int(struct
> mwifiex_adapter *adapter)
> > "info: cmd_sent=%d data_sent=%d\n",
> > adapter->cmd_sent, adapter->data_sent);
> > if (!card->msi_enable && adapter->ps_state != PS_STATE_SLEEP)
> > - mwifiex_pcie_enable_host_int(adapter);
> > + __mwifiex_pcie_enable_host_int(adapter);
> >
> > return 0;
> > }
> > @@ -3055,6 +3081,7 @@ static int mwifiex_pcie_request_irq(struct
> mwifiex_adapter *adapter)
> > &card->msix_ctx[i]);
> > if (ret)
> > break;
> > + disable_irq(card->msix_entries[i].vector);
>
> Also, if we're really dealing with spurious interrupts at init time, then this
> leaves a window of time in between request_irq() and this
> disable_irq() in which we could still receive a bad IRQ. So this should be
> reworked to do one of:
> (a) move the request_irq() later, until we're really able to handle interrupts
> (b) set the IRQ_NOAUTOEN flag (for the non-shared case), to avoid enabling
> IRQs initially
> (c) use some sort of (yet-unknown) device-level mask for MSI interrupts.
>
> I'm looking to address these problems in a v2. Many of the other patches are
> likely independent. I'll plan to resubmit them in the next series (if they aren't
> applied before then), to avoid conflicts with those that aren't independent,
> and because I intentionally put bugfixes (like this
> patch) first in the series.
>
> Brian
>
> > }
> >
> > if (ret) {
> > @@ -3087,6 +3114,7 @@ static int mwifiex_pcie_request_irq(struct
> mwifiex_adapter *adapter)
> > pr_err("request_irq failed: ret=%d\n", ret);
> > return -1;
> > }
> > + disable_irq(pdev->irq);
> >
> > return 0;
> > }
> > @@ -3244,7 +3272,7 @@ static struct mwifiex_if_ops pcie_ops = {
> > .register_dev = mwifiex_register_dev,
> > .unregister_dev = mwifiex_unregister_dev,
> > .enable_int = mwifiex_pcie_enable_host_int,
> > - .disable_int = mwifiex_pcie_disable_host_int_noerr,
> > + .disable_int = mwifiex_pcie_disable_host_int,
> > .process_int_status = mwifiex_process_int_status,
> > .host_to_card = mwifiex_pcie_host_to_card,
> > .wakeup = mwifiex_pm_wakeup_card,
> > --
> > 2.13.0.219.gdb65acc882-goog
> >

2017-06-22 13:00:05

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

On Wed, 2017-06-21 at 10:48 -0700, Brian Norris wrote:
>
> Yes, that all sounds nice. But for my sake, can you describe better
> what's actually going on there (e.g., can you point me at which code
> does this)?

It's much easier with mac80211, it has all the state. Basically the
reconfig is in ieee80211_reconfig() :)

> I'm really not familiar with mac80211 (though I was aware of
> the above general behavior). But to my knowledge, mac80211 drivers
> keep a lot more state managed in the kernel, so it's a little easier
> and more natural to get the driver/FW back to "the same state" than
> it is with a full-MAC driver.

Indeed.

johannes

2017-06-01 09:15:50

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

Brian Norris <[email protected]> writes:

> In general, it's helpful to use the same code for device removal as for
> device reset, as this tends to have fewer bugs. Let's move the wiphy
> unregistration code into the common reset and removal code.
>
> In particular, it's very hard to properly handle the reset sequence when
> something fails. Currently, if mwifiex_reinit_sw() fails, we've failed
> to unregister the associated wiphy, and so running something as simple
> as "iw phy" can trigger an OOPS, as the wiphy still has hooks back into
> freed mwifiex data structures. For example, KASAN complained:
>
> [... see reset fail for other reasons ...]
> [ 1184.821158] mwifiex_pcie 0000:01:00.0: info: dnld wifi firmware from 174948 bytes
> [ 1186.870914] mwifiex_pcie 0000:01:00.0: info: FW download over, size 608396 bytes
> [ 1187.685990] mwifiex_pcie 0000:01:00.0: WLAN FW is active
> [ 1187.692673] mwifiex_pcie 0000:01:00.0: cmd_wait_q terminated: -512
> [ 1187.699075] mwifiex_pcie 0000:01:00.0: info: _mwifiex_fw_dpc: unregister device
> [ 1187.713476] mwifiex: Failed to bring up adapter: -5
> [ 1187.718644] mwifiex_pcie 0000:01:00.0: reinit failed: -5
>
> [... run `iw phy` ...]
> [ 1212.902419] ==================================================================
> [ 1212.909806] BUG: KASAN: use-after-free in mwifiex_cfg80211_get_antenna+0x54/0xfc [mwifiex] at addr ffffffc0ad1a8028
> [ 1212.920246] Read of size 1 by task iw/3127
> [...]
> [ 1212.934946] page dumped because: kasan: bad access detected
> [...]
> [ 1212.950665] Call trace:
> [ 1212.953148] [<ffffffc00020a69c>] dump_backtrace+0x0/0x190
> [ 1212.958572] [<ffffffc00020a96c>] show_stack+0x20/0x28
> [ 1212.963648] [<ffffffc0005ce18c>] dump_stack+0xa4/0xcc
> [ 1212.968723] [<ffffffc0003c4430>] kasan_report+0x378/0x500
> [ 1212.974140] [<ffffffc0003c3358>] __asan_load1+0x44/0x4c
> [ 1212.979462] [<ffffffbffc2e8360>] mwifiex_cfg80211_get_antenna+0x54/0xfc [mwifiex]
> [ 1212.987131] [<ffffffbffc084fc4>] nl80211_send_wiphy+0x75c/0x2de0 [cfg80211]
> [ 1212.994246] [<ffffffbffc094f60>] nl80211_dump_wiphy+0x32c/0x438 [cfg80211]
> [ 1213.001149] [<ffffffc000ab6404>] genl_lock_dumpit+0x48/0x64
> [ 1213.006746] [<ffffffc000ab3474>] netlink_dump+0x178/0x398
> [ 1213.012171] [<ffffffc000ab3d18>] __netlink_dump_start+0x1bc/0x260
> [...]
>
> This all goes away if we just tear down the wiphy on the way down, and
> set it back up if/when we bring the device back up.

I don't know exactly what kind of reset this is about, but isn't this a
quite intrusive solution? For example, phy name changes because of this?

--
Kalle Valo

2017-06-21 17:48:33

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

Hi Kalle (and Johannes; I'll reply to Johannes response separately too),

On Mon, Jun 05, 2017 at 06:54:18PM +0300, Kalle Valo wrote:
> Brian Norris <[email protected]> writes:
> > That's not to say that there aren't such bugs out there. I'd still be
> > willing to bet there are. And IMO, it seems wise to just do the same
> > teardown/setup as one would do for (e.g.) 'rmmod', to prevent exposing
> > *too* many new permutations of "wiphy is available but rest of the
> > driver is torn down".
>
> This feels like a sledge hammer approach causing all sort of problems

Yes, it is a sledge hammer. But I'm working with what we have here. With
this approach, it's also easier to tell that things aren't out-of-sync,
since I'm never quite sure how much state was held in the firmware (and
now won't match what user space thinks). A full removal / re-init makes
this clear -- user space should expect *everything* to be reset.

I'm open to learning better approaches if possible, but this also might
be difficult if I don't get any support from Marvell on this. (They seem
quite happy to let sleeping dogs lie.)

> for user space and I really like the mac80211 approach more. For
> example, if an ath10k firmware crash happens user only sees a few second
> pause in data traffic and a warning in kernel log, otherwise everything
> happens behind the scenes. Of course there are very likely races
> somewhere but at least I haven't seen that many reports related to
> firmware restart functionality.

Yes, that all sounds nice. But for my sake, can you describe better
what's actually going on there (e.g., can you point me at which code
does this)? I'm really not familiar with mac80211 (though I was aware of
the above general behavior). But to my knowledge, mac80211 drivers keep
a lot more state managed in the kernel, so it's a little easier and
more natural to get the driver/FW back to "the same state" than it is
with a full-MAC driver.

> > But if none of this is convincing to you, I can take a stab at a
> > different solution.
>
> I don't have any problem applying this patch but more about being
> curious why doing it like this. And hopefully finding a less intrusive
> solution in the future.

OK, sure. I'll see what I can do, but I don't see an easy path at the
moment toward fixing (i.e., completely rewriting) this long-standing
driver behavior.

[trim]

Thanks,
Brian

2017-06-01 17:40:02

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

On Thu, Jun 01, 2017 at 12:15:45PM +0300, Kalle Valo wrote:
> Brian Norris <[email protected]> writes:
>
> > In general, it's helpful to use the same code for device removal as for
> > device reset, as this tends to have fewer bugs. Let's move the wiphy
> > unregistration code into the common reset and removal code.
> >
> > In particular, it's very hard to properly handle the reset sequence when
> > something fails. Currently, if mwifiex_reinit_sw() fails, we've failed
> > to unregister the associated wiphy, and so running something as simple
> > as "iw phy" can trigger an OOPS, as the wiphy still has hooks back into
> > freed mwifiex data structures. For example, KASAN complained:
> >
> > [... see reset fail for other reasons ...]
> > [ 1184.821158] mwifiex_pcie 0000:01:00.0: info: dnld wifi firmware from 174948 bytes
> > [ 1186.870914] mwifiex_pcie 0000:01:00.0: info: FW download over, size 608396 bytes
> > [ 1187.685990] mwifiex_pcie 0000:01:00.0: WLAN FW is active
> > [ 1187.692673] mwifiex_pcie 0000:01:00.0: cmd_wait_q terminated: -512
> > [ 1187.699075] mwifiex_pcie 0000:01:00.0: info: _mwifiex_fw_dpc: unregister device
> > [ 1187.713476] mwifiex: Failed to bring up adapter: -5
> > [ 1187.718644] mwifiex_pcie 0000:01:00.0: reinit failed: -5
> >
> > [... run `iw phy` ...]
> > [ 1212.902419] ==================================================================
> > [ 1212.909806] BUG: KASAN: use-after-free in mwifiex_cfg80211_get_antenna+0x54/0xfc [mwifiex] at addr ffffffc0ad1a8028
> > [ 1212.920246] Read of size 1 by task iw/3127
> > [...]
> > [ 1212.934946] page dumped because: kasan: bad access detected
> > [...]
> > [ 1212.950665] Call trace:
> > [ 1212.953148] [<ffffffc00020a69c>] dump_backtrace+0x0/0x190
> > [ 1212.958572] [<ffffffc00020a96c>] show_stack+0x20/0x28
> > [ 1212.963648] [<ffffffc0005ce18c>] dump_stack+0xa4/0xcc
> > [ 1212.968723] [<ffffffc0003c4430>] kasan_report+0x378/0x500
> > [ 1212.974140] [<ffffffc0003c3358>] __asan_load1+0x44/0x4c
> > [ 1212.979462] [<ffffffbffc2e8360>] mwifiex_cfg80211_get_antenna+0x54/0xfc [mwifiex]
> > [ 1212.987131] [<ffffffbffc084fc4>] nl80211_send_wiphy+0x75c/0x2de0 [cfg80211]
> > [ 1212.994246] [<ffffffbffc094f60>] nl80211_dump_wiphy+0x32c/0x438 [cfg80211]
> > [ 1213.001149] [<ffffffc000ab6404>] genl_lock_dumpit+0x48/0x64
> > [ 1213.006746] [<ffffffc000ab3474>] netlink_dump+0x178/0x398
> > [ 1213.012171] [<ffffffc000ab3d18>] __netlink_dump_start+0x1bc/0x260
> > [...]
> >
> > This all goes away if we just tear down the wiphy on the way down, and
> > set it back up if/when we bring the device back up.
>
> I don't know exactly what kind of reset this is about,

Marvell firmwares are known to be quite buggy, and there are plenty of
situations in which they crash (often resulting in a command timeout).
The current best workaround for these is to essentially unwind the whole
driver, reset the card, and reprobe the whole thing. See anywhere that
the ->card_reset() callback is called.

This has been around for a long time on SDIO, and you recently merged my
changes to enable this for PCIe:

6d7d579a8243 mwifiex: pcie: add card_reset() support

> but isn't this a
> quite intrusive solution? For example, phy name changes because of this?

Yes, it is a bit intrusive. But the whole process is intrusive, as it
deletes all the virtual interfaces and loses your settings. This all
relies on user space being prepared to clean up and reinitialize
everything afterward.

And yes, this causes a phy name change.

In favor: this is what the SDIO reset code *used* to do, before this
commit:

c742e623e941 mwifiex: sdio card reset enhancement

where the SDIO driver started using the half-baked reset solution
written for PCIe.

Lastly, I still need to analyze a few more things in this driver, but
AFAICT, if we *don't* unregister the wiphy, we are exposed to quite a
few more race conditions -- not just the easy-to-notice condition
described above. What happens if the wiphy still processes cfg80211
operations while we're still resetting the firmware? Much of the driver
may not be prepared for this. At the moment, I can't find anything
terribly wrong; if I slow down the reset (e.g., with msleep()s) I can
just trigger complaints about "cmd node not available" or "card is
removed", but I haven't yet found a true bug.

That's not to say that there aren't such bugs out there. I'd still be
willing to bet there are. And IMO, it seems wise to just do the same
teardown/setup as one would do for (e.g.) 'rmmod', to prevent exposing
*too* many new permutations of "wiphy is available but rest of the
driver is torn down".

But if none of this is convincing to you, I can take a stab at a
different solution.

BTW, since you're taking an interest in this code now, can I trouble you
with a question? Looking at mwifiex_uninit_sw() (after this patchset),
you can see a loop like this:

/* Stop data */
for (i = 0; i < adapter->priv_num; i++) {
priv = adapter->priv[i];
if (priv && priv->netdev) {
mwifiex_stop_net_dev_queue(priv->netdev, adapter);
if (netif_carrier_ok(priv->netdev))
netif_carrier_off(priv->netdev);
netif_device_detach(priv->netdev);
}
}

That seems to be the only attempt to prevent user space from talking to
the device while we proceed to shut down (mwifiex_shutdown_drv()). AIUI,
that's wholly insufficient, and we need to actually stop all the virtual
interfaces (and possibly the wiphy as well) first. I'm looking at trying
to move the mwifiex_del_virtual_intf() loop up much further in this
function (but there are other bugs preventing me from doing that yet).

Does that sound like the right approach to you? I'm kinda figuring this
should better mimic the mac80211 ieee80211_remove_interfaces()
structure.

Brian

2017-06-27 19:50:09

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

(A little slow on follow-up here)

On Thu, Jun 22, 2017 at 02:59:49PM +0200, Johannes Berg wrote:
> On Wed, 2017-06-21 at 10:48 -0700, Brian Norris wrote:
> >
> > Yes, that all sounds nice. But for my sake, can you describe better
> > what's actually going on there (e.g., can you point me at which code
> > does this)?
>
> It's much easier with mac80211, it has all the state. Basically the
> reconfig is in ieee80211_reconfig() :)

Wow, that's not exactly simple code; I expect it could be pretty
difficult to get that right today on mwifiex. The current approach
actually should be *easier* (for the kernel side) to avoid bugs, as it
should be basically the same thing as 'rmmod'. Nonetheless, there are
plenty of bugs.

Thanks for the pointer though.

> > I'm really not familiar with mac80211 (though I was aware of
> > the above general behavior). But to my knowledge, mac80211 drivers
> > keep a lot more state managed in the kernel, so it's a little easier
> > and more natural to get the driver/FW back to "the same state" than
> > it is with a full-MAC driver.
>
> Indeed.

Brian

2017-06-21 18:27:11

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

Hi Johannes,

On Fri, Jun 09, 2017 at 11:03:38AM +0200, Johannes Berg wrote:
> On Mon, 2017-06-05 at 18:54 +0300, Kalle Valo wrote:
>
> > > BTW, since you're taking an interest in this code now, can I
> > > trouble you with a question? Looking at mwifiex_uninit_sw() (after
> > > this patchset), you can see a loop like this:
> > >
> > > ????????/* Stop data */
> > > ????????for (i = 0; i < adapter->priv_num; i++) {
> > > ????????????????priv = adapter->priv[i];
> > > ????????????????if (priv && priv->netdev) {
> > > ????????????????????????mwifiex_stop_net_dev_queue(priv->netdev,
> > > adapter);
> > > ????????????????????????if (netif_carrier_ok(priv->netdev))
> > > ????????????????????????????????netif_carrier_off(priv->netdev);
> > > ????????????????????????netif_device_detach(priv->netdev);
> > > ????????????????}
> > > ????????}
> > >
> > > That seems to be the only attempt to prevent user space from
> > > talking to the device while we proceed to shut down
> > > (mwifiex_shutdown_drv()). AIUI, that's wholly insufficient, and we
> > > need to actually stop all the virtual interfaces (and possibly the
> > > wiphy as well) first. I'm looking at trying to move the
> > > mwifiex_del_virtual_intf() loop up much further in this function
> > > (but there are other bugs preventing me from doing that yet).
> > >
> > > Does that sound like the right approach to you? I'm kinda figuring
> > > this should better mimic the mac80211 ieee80211_remove_interfaces()
> > > structure.
> >
> > Johannes is much better person to answer this (CCed).
>
> Wait, what? You're throwing me into pretty deep water ;-)

Regardless, thanks for the help :)

> I'm not sure what you mean by "we need to atually stop all the virtual
> interfaces ([...]) first".

Judging by your following comments, I may have been completely mistaken.
(But that's why I asked you folks!)

> There are essentially only two/three ways to reach this - data path,
> which is getting stopped here, and control path (both nl80211 and
> perhaps ndo ops like start/stop).

I think I was conflating virtual interfaces with control path (e.g.,
nl80211 scans, set freq, etc.). The idea is that control operations may
still get *started* after the above, and it's just plain impossible to
resolve the races with driver queue teardown if we're queueing up new
control ops at the same time.

But even if we kill off the wireless_dev's, I suppose there are still
control interfaces that can talk directly to the wiphy.

> Without checking the code now, it seems entirely plausible that this is
> holding some lock that would lock out the control path entirely, for
> the duration until the wiphy is actually unregistered?
>
> Actually, you can't unregister with the relevant locks held (without
> causing deadlocks), so perhaps it's marking the wiphy as unavailable so
> that all operations fail?

One of the above two sounds along the right line. But it's something I
couldn't really figure out how to do quite right.

Dumb question: how would I mark the wiphy as unavailable? Is there
something I can do at the cfg80211 level? Or would I really have to
guard all the cfg80211 entry points into mwifiex with a flag or lock?

Also, IIUC, we need to wait for all control paths to complete (or
cancel) before we can free up the associated resources; so just marking
"unavailable" isn't enough.

Thanks,
Brian

2017-06-09 09:03:44

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

On Mon, 2017-06-05 at 18:54 +0300, Kalle Valo wrote:

> > BTW, since you're taking an interest in this code now, can I
> > trouble you with a question? Looking at mwifiex_uninit_sw() (after
> > this patchset), you can see a loop like this:
> >
> >         /* Stop data */
> >         for (i = 0; i < adapter->priv_num; i++) {
> >                 priv = adapter->priv[i];
> >                 if (priv && priv->netdev) {
> >                         mwifiex_stop_net_dev_queue(priv->netdev,
> > adapter);
> >                         if (netif_carrier_ok(priv->netdev))
> >                                 netif_carrier_off(priv->netdev);
> >                         netif_device_detach(priv->netdev);
> >                 }
> >         }
> >
> > That seems to be the only attempt to prevent user space from
> > talking to the device while we proceed to shut down
> > (mwifiex_shutdown_drv()). AIUI, that's wholly insufficient, and we
> > need to actually stop all the virtual interfaces (and possibly the
> > wiphy as well) first. I'm looking at trying to move the
> > mwifiex_del_virtual_intf() loop up much further in this function
> > (but there are other bugs preventing me from doing that yet).
> >
> > Does that sound like the right approach to you? I'm kinda figuring
> > this should better mimic the mac80211 ieee80211_remove_interfaces()
> > structure.
>
> Johannes is much better person to answer this (CCed).


Wait, what? You're throwing me into pretty deep water ;-)

I'm not sure what you mean by "we need to atually stop all the virtual
interfaces ([...]) first".

There are essentially only two/three ways to reach this - data path,
which is getting stopped here, and control path (both nl80211 and
perhaps ndo ops like start/stop).

Without checking the code now, it seems entirely plausible that this is
holding some lock that would lock out the control path entirely, for
the duration until the wiphy is actually unregistered?

Actually, you can't unregister with the relevant locks held (without
causing deadlocks), so perhaps it's marking the wiphy as unavailable so
that all operations fail?

johannes

2017-06-05 15:54:25

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

Brian Norris <[email protected]> writes:

> On Thu, Jun 01, 2017 at 12:15:45PM +0300, Kalle Valo wrote:
>> Brian Norris <[email protected]> writes:
>>
>> > In general, it's helpful to use the same code for device removal as for
>> > device reset, as this tends to have fewer bugs. Let's move the wiphy
>> > unregistration code into the common reset and removal code.
>> >
>> > In particular, it's very hard to properly handle the reset sequence when
>> > something fails. Currently, if mwifiex_reinit_sw() fails, we've failed
>> > to unregister the associated wiphy, and so running something as simple
>> > as "iw phy" can trigger an OOPS, as the wiphy still has hooks back into
>> > freed mwifiex data structures. For example, KASAN complained:
>> >
>> > [... see reset fail for other reasons ...]
>> > [ 1184.821158] mwifiex_pcie 0000:01:00.0: info: dnld wifi firmware from 174948 bytes
>> > [ 1186.870914] mwifiex_pcie 0000:01:00.0: info: FW download over, size 608396 bytes
>> > [ 1187.685990] mwifiex_pcie 0000:01:00.0: WLAN FW is active
>> > [ 1187.692673] mwifiex_pcie 0000:01:00.0: cmd_wait_q terminated: -512
>> > [ 1187.699075] mwifiex_pcie 0000:01:00.0: info: _mwifiex_fw_dpc: unregister device
>> > [ 1187.713476] mwifiex: Failed to bring up adapter: -5
>> > [ 1187.718644] mwifiex_pcie 0000:01:00.0: reinit failed: -5
>> >
>> > [... run `iw phy` ...]
>> > [ 1212.902419] ==================================================================
>> > [ 1212.909806] BUG: KASAN: use-after-free in mwifiex_cfg80211_get_antenna+0x54/0xfc [mwifiex] at addr ffffffc0ad1a8028
>> > [ 1212.920246] Read of size 1 by task iw/3127
>> > [...]
>> > [ 1212.934946] page dumped because: kasan: bad access detected
>> > [...]
>> > [ 1212.950665] Call trace:
>> > [ 1212.953148] [<ffffffc00020a69c>] dump_backtrace+0x0/0x190
>> > [ 1212.958572] [<ffffffc00020a96c>] show_stack+0x20/0x28
>> > [ 1212.963648] [<ffffffc0005ce18c>] dump_stack+0xa4/0xcc
>> > [ 1212.968723] [<ffffffc0003c4430>] kasan_report+0x378/0x500
>> > [ 1212.974140] [<ffffffc0003c3358>] __asan_load1+0x44/0x4c
>> > [ 1212.979462] [<ffffffbffc2e8360>] mwifiex_cfg80211_get_antenna+0x54/0xfc [mwifiex]
>> > [ 1212.987131] [<ffffffbffc084fc4>] nl80211_send_wiphy+0x75c/0x2de0 [cfg80211]
>> > [ 1212.994246] [<ffffffbffc094f60>] nl80211_dump_wiphy+0x32c/0x438 [cfg80211]
>> > [ 1213.001149] [<ffffffc000ab6404>] genl_lock_dumpit+0x48/0x64
>> > [ 1213.006746] [<ffffffc000ab3474>] netlink_dump+0x178/0x398
>> > [ 1213.012171] [<ffffffc000ab3d18>] __netlink_dump_start+0x1bc/0x260
>> > [...]
>> >
>> > This all goes away if we just tear down the wiphy on the way down, and
>> > set it back up if/when we bring the device back up.
>>
>> I don't know exactly what kind of reset this is about,
>
> Marvell firmwares are known to be quite buggy, and there are plenty of
> situations in which they crash (often resulting in a command timeout).

That is a common problem with firmwares :/

> The current best workaround for these is to essentially unwind the
> whole driver, reset the card, and reprobe the whole thing. See
> anywhere that the ->card_reset() callback is called.
>
> This has been around for a long time on SDIO, and you recently merged my
> changes to enable this for PCIe:
>
> 6d7d579a8243 mwifiex: pcie: add card_reset() support
>
>> but isn't this a
>> quite intrusive solution? For example, phy name changes because of this?
>
> Yes, it is a bit intrusive. But the whole process is intrusive, as it
> deletes all the virtual interfaces and loses your settings. This all
> relies on user space being prepared to clean up and reinitialize
> everything afterward.
>
> And yes, this causes a phy name change.
>
> In favor: this is what the SDIO reset code *used* to do, before this
> commit:
>
> c742e623e941 mwifiex: sdio card reset enhancement
>
> where the SDIO driver started using the half-baked reset solution
> written for PCIe.
>
> Lastly, I still need to analyze a few more things in this driver, but
> AFAICT, if we *don't* unregister the wiphy, we are exposed to quite a
> few more race conditions -- not just the easy-to-notice condition
> described above. What happens if the wiphy still processes cfg80211
> operations while we're still resetting the firmware? Much of the driver
> may not be prepared for this. At the moment, I can't find anything
> terribly wrong; if I slow down the reset (e.g., with msleep()s) I can
> just trigger complaints about "cmd node not available" or "card is
> removed", but I haven't yet found a true bug.
>
> That's not to say that there aren't such bugs out there. I'd still be
> willing to bet there are. And IMO, it seems wise to just do the same
> teardown/setup as one would do for (e.g.) 'rmmod', to prevent exposing
> *too* many new permutations of "wiphy is available but rest of the
> driver is torn down".

This feels like a sledge hammer approach causing all sort of problems
for user space and I really like the mac80211 approach more. For
example, if an ath10k firmware crash happens user only sees a few second
pause in data traffic and a warning in kernel log, otherwise everything
happens behind the scenes. Of course there are very likely races
somewhere but at least I haven't seen that many reports related to
firmware restart functionality.

> But if none of this is convincing to you, I can take a stab at a
> different solution.

I don't have any problem applying this patch but more about being
curious why doing it like this. And hopefully finding a less intrusive
solution in the future.

> BTW, since you're taking an interest in this code now, can I trouble you
> with a question? Looking at mwifiex_uninit_sw() (after this patchset),
> you can see a loop like this:
>
> /* Stop data */
> for (i = 0; i < adapter->priv_num; i++) {
> priv = adapter->priv[i];
> if (priv && priv->netdev) {
> mwifiex_stop_net_dev_queue(priv->netdev, adapter);
> if (netif_carrier_ok(priv->netdev))
> netif_carrier_off(priv->netdev);
> netif_device_detach(priv->netdev);
> }
> }
>
> That seems to be the only attempt to prevent user space from talking to
> the device while we proceed to shut down (mwifiex_shutdown_drv()). AIUI,
> that's wholly insufficient, and we need to actually stop all the virtual
> interfaces (and possibly the wiphy as well) first. I'm looking at trying
> to move the mwifiex_del_virtual_intf() loop up much further in this
> function (but there are other bugs preventing me from doing that yet).
>
> Does that sound like the right approach to you? I'm kinda figuring this
> should better mimic the mac80211 ieee80211_remove_interfaces()
> structure.

Johannes is much better person to answer this (CCed).

--
Kalle Valo

2017-06-28 07:21:58

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

Hi Brian,

> Wow, that's not exactly simple code; I expect it could be pretty
> difficult to get that right today on mwifiex.

Yeah, I have no doubt. You'd probably have to track a lot of state that
you just pass down to the firmware too, and possibly can't even track
some state that the firmware derives itself (like for example PNs for
keys)

> The current approach
> actually should be *easier* (for the kernel side) to avoid bugs, as
> it should be basically the same thing as 'rmmod'. Nonetheless, there
> are plenty of bugs.

:-)

johannes

2017-06-29 18:46:01

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

Hi Johannes,

On Wed, Jun 28, 2017 at 09:28:49AM +0200, Johannes Berg wrote:
> On Tue, 2017-06-27 at 13:48 -0700, Brian Norris wrote:
>
> > > There isn't really a good way to do this. You can, of course, call
> > > wiphy_unregister(), but if you could do that you'd already have the
> > > problem solved, I think?
> >
> > That's probably along the right track. There are still some things
> > we'd need to do properly before that though, and this is where all
> > the problems are so far. (Also, this is what Kalle was already
> > objecting to; he didn't think we should be unregistering/recreating
> > the wiphy, but I think he ended up softening on that a bit.)
> >
> > For one, I still expect I should be removing the wireless dev's
> > before unregistering the wihpy, no? Otherwise, there will be existing
> > wdevs backed by an unregistered wiphy?
>
> Yeah, that's true - though once you get rid of those they can't be
> accessed any more.

Right. That's the whole idea for the current reset implementation in
this driver anyway.

> > And that gets to the heart of another bug: deleting interfaces (e.g.,
> > "iw dev foo del") races with a lot of stuff -- like see
> >
> > mwifiex_process_sta_event() ->
> > ? EVENT_EXT_SCAN_REPORT ->
> > ????netif_running(priv->netdev)
> >
> > Because mwifiex_del_virtual_intf() doesn't stop any outstanding
> > commands, we can be both deleting the netdev and processing scans for
> > it.
>
> Huh, well, I guess you need some kind of locking here anyway, since the
> user can always do things like deleting the interface while a scan is
> running?

Yes, some sort of locking, and maybe ability to cancel outstanding
commands on just the targeted interface. I gave the locking a try myself
previously and got something sorta working, before getting distracted by
other problems. I also reported this directly to Marvell to see if they
could be bothered to fix it. They might be working on that.

But actually I think the rmmod or reset code path has this a little
easier, since we're fine just killing all outstanding commands and
interfaces. So these two problems are somewhat orthogonal.

> > > > Also, IIUC, we need to wait for all control paths to complete (or
> > > > cancel) before we can free up the associated resources; so just
> > > > marking "unavailable" isn't enough.
> > >
> > > Yeah, I suppose so. Though if you just do all the freeing after
> > > wiphy_unregister() it'll do that for you?
> >
> > Yes, I think so. Then part of the problem is probably that some of
> > the current "cancel command" logic is tied up with the "free command
> > structures" logic. So we're freeing some stuff too early.
> >
> > Anyway, those sorts of bugs aside, IIUC the full sequence for
> > teardown should probably be something like:
> >
> > 1. Stop TX queues
> > 2. Cancel outstanding commands (let them fail or finish, etc.) -- but
> > ???DON'T free their backing resources yet

I also failed to mention "don't queue new FW commands". The driver does
this before step 1 currently, though the code isn't beautiful:

mwifiex_send_cmd()
...
if (adapter->surprise_removed) {
mwifiex_dbg(adapter, ERROR,
"PREP_CMD: card is removed\n");
return -1;
}
... // continue on to prepare and queue (or sync) the command

static void mwifiex_uninit_sw(struct mwifiex_adapter *adapter)
{
...
adapter->surprise_removed = true;
... // continue on to step 1, 2, ...


(And now that I think about it, I'm pretty sure there's a race in there
somewhere... Someone could easily miss the "surprise removed" check, grab a
command node, and miss out on step 2 (since the command isn't sitting on any of
the queues that get "canceled" yet). I believe this can easily blow up once
they try to queue the command, as we are no longer ready to handle the command
queue...)

> > 3. Remove wdevs
> > 4. wiphy_unregister()
> > 5. Free up resources
> >
> > Current problems are at least:
> >
> > * we don't do step 4 in the right place (if at all; see this patch)
> > * step 2 mixes in "free"ing resources too early
>
> So I'm not sure what you mean by splitting in 2/5 - this seems
> reasonable, but I don't understand why something like a scan request
> wouldn't be freed while you cancel it in 2? In fact, you really have to
> free it before you remove the corresponding wdev, or cfg80211 will
> complain?

I haven't validated all the related code, but I think the problem isn't
that a scan is still being processed after the wdev is removed. The
problem is simply that we've canceled the command (and it will
"complete" before the wdev removal), but we're freeing the associated
resource before the caller is actually completely done with it. Or in
more detail:

The driver keeps a pool of FW command structures (like 'struct
cmd_ctrl_node') that are queued in various ways. We cancel everything in
step 2 (mwifiex_adapter_cleanup() -> mwifiex_cancel_all_pending_cmd()),
but we can't free the pool (mwifiex_free_cmd_buffer()) until step 5, since
there can be wdev or wiphy control paths that are still exiting (and using one
of the cmd nodes' "condition" variables) until we've finished waiting on them
with step 3 or 4. But currently we free these buffers in step 2
(mwifiex_adapter_cleanup() -> mwifiex_free_cmd_buffer()).

I could have missed something else, and my description above definitely
isn't exhaustive. But AFAICT, this straightens out a solution for some
of the problems I've noticed recently. (But then, I noticed another
problem, as noted above...ugh.)

Anyway, thanks for the pointers so far!

Brian

2017-07-04 14:10:42

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset

(I was on vacation and inbox exploded again)

Brian Norris <[email protected]> writes:

> On Thu, Jun 22, 2017 at 03:02:34PM +0200, Johannes Berg wrote:
>> On Wed, 2017-06-21 at 11:27 -0700, Brian Norris wrote:
>
>> > > Without checking the code now, it seems entirely plausible that
>> > > this is
>> > > holding some lock that would lock out the control path entirely,
>> > > for
>> > > the duration until the wiphy is actually unregistered?
>> > >
>> > > Actually, you can't unregister with the relevant locks held
>> > > (without
>> > > causing deadlocks), so perhaps it's marking the wiphy as
>> > > unavailable so
>> > > that all operations fail?
>> >
>> > One of the above two sounds along the right line. But it's something
>> > I couldn't really figure out how to do quite right.
>> >
>> > Dumb question: how would I mark the wiphy as unavailable? Is there
>> > something I can do at the cfg80211 level? Or would I really have to
>> > guard all the cfg80211 entry points into mwifiex with a flag or lock?
>>
>> There isn't really a good way to do this. You can, of course, call
>> wiphy_unregister(), but if you could do that you'd already have the
>> problem solved, I think?
>
> That's probably along the right track. There are still some things we'd
> need to do properly before that though, and this is where all the
> problems are so far. (Also, this is what Kalle was already objecting to;
> he didn't think we should be unregistering/recreating the wiphy, but I
> think he ended up softening on that a bit.)

Just to clarify I was more like hoping there would be a better way to do
it, not really objecting the current method, but I didn't realise that
of course it's harder to do a transparent firmware restart with fullmac
design. And certainly what are you doing now is much much better than
doing nothing after a firmware crash.

--
Kalle Valo