2024-03-07 09:46:52

by Mingyen Hsieh

Subject: [PATCH] wifi: mt76: mt7921s: fix potential hung tasks during chip recovery

From: Leon Yen <[email protected]>

During chip recovery (e.g. chip reset), the kernel worker reset_work can
hold the lock while waiting for the kernel thread stat_worker to be parked,
while stat_worker is itself waiting for the same lock to be released.
This deadlock causes hung task messages to be dumped and may cause the
device to reboot.

This patch prevents stat_worker from running during chip recovery.

Signed-off-by: Leon Yen <[email protected]>
Signed-off-by: Ming Yen Hsieh <[email protected]>
---
drivers/net/wireless/mediatek/mt76/mt7921/mac.c | 2 ++
drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c | 2 --
drivers/net/wireless/mediatek/mt76/mt7921/sdio_mac.c | 2 --
drivers/net/wireless/mediatek/mt76/sdio.c | 3 ++-
4 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/mac.c
index 867e14f6b93a..73e42ef42983 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/mac.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/mac.c
@@ -663,6 +663,7 @@ void mt7921_mac_reset_work(struct work_struct *work)
int i, ret;

dev_dbg(dev->mt76.dev, "chip reset\n");
+ set_bit(MT76_RESET, &dev->mphy.state);
dev->hw_full_reset = true;
ieee80211_stop_queues(hw);

@@ -691,6 +692,7 @@ void mt7921_mac_reset_work(struct work_struct *work)
}

dev->hw_full_reset = false;
+ clear_bit(MT76_RESET, &dev->mphy.state);
pm->suspended = false;
ieee80211_wake_queues(hw);
ieee80211_iterate_active_interfaces(hw,
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
index c866144ff061..031ba9aaa4e2 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
@@ -64,7 +64,6 @@ int mt7921e_mac_reset(struct mt792x_dev *dev)
mt76_wr(dev, dev->irq_map->host_irq_enable, 0);
mt76_wr(dev, MT_PCIE_MAC_INT_ENABLE, 0x0);

- set_bit(MT76_RESET, &dev->mphy.state);
set_bit(MT76_MCU_RESET, &dev->mphy.state);
wake_up(&dev->mt76.mcu.wait);
skb_queue_purge(&dev->mt76.mcu.res_q);
@@ -115,7 +114,6 @@ int mt7921e_mac_reset(struct mt792x_dev *dev)

err = __mt7921_start(&dev->phy);
out:
- clear_bit(MT76_RESET, &dev->mphy.state);

local_bh_disable();
napi_enable(&dev->mt76.tx_napi);
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/sdio_mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/sdio_mac.c
index 389eb0903807..1f77cf71ca70 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/sdio_mac.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/sdio_mac.c
@@ -98,7 +98,6 @@ int mt7921s_mac_reset(struct mt792x_dev *dev)
mt76_connac_free_pending_tx_skbs(&dev->pm, NULL);
mt76_txq_schedule_all(&dev->mphy);
mt76_worker_disable(&dev->mt76.tx_worker);
- set_bit(MT76_RESET, &dev->mphy.state);
set_bit(MT76_MCU_RESET, &dev->mphy.state);
wake_up(&dev->mt76.mcu.wait);
skb_queue_purge(&dev->mt76.mcu.res_q);
@@ -135,7 +134,6 @@ int mt7921s_mac_reset(struct mt792x_dev *dev)

err = __mt7921_start(&dev->phy);
out:
- clear_bit(MT76_RESET, &dev->mphy.state);

mt76_worker_enable(&dev->mt76.tx_worker);

diff --git a/drivers/net/wireless/mediatek/mt76/sdio.c b/drivers/net/wireless/mediatek/mt76/sdio.c
index 3e88798df017..a4ed00eebc48 100644
--- a/drivers/net/wireless/mediatek/mt76/sdio.c
+++ b/drivers/net/wireless/mediatek/mt76/sdio.c
@@ -499,7 +499,8 @@ static void mt76s_tx_status_data(struct mt76_worker *worker)
dev = container_of(sdio, struct mt76_dev, sdio);

while (true) {
- if (test_bit(MT76_REMOVED, &dev->phy.state))
+ if (test_bit(MT76_RESET, &dev->phy.state) ||
+ test_bit(MT76_REMOVED, &dev->phy.state))
break;

if (!dev->drv->tx_status_data(dev, &update))
--
2.18.0
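
The guard added to mt76s_tx_status_data() can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not driver code: the names model_dev, MODEL_RESET, MODEL_REMOVED, model_status_worker() and model_reset_work() are hypothetical, the bit helpers are plain non-atomic stand-ins for the kernel's set_bit()/clear_bit()/test_bit(), and everything runs in a single thread, so it shows the early-exit logic rather than the locking itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit indices; they do not mirror the driver's real values. */
enum model_state_bit {
	MODEL_RESET = 0,
	MODEL_REMOVED = 1,
};

struct model_dev {
	unsigned long state;	/* bit flags, like phy.state in the patch */
	int pending;		/* status entries waiting to be handled */
};

/* Non-atomic stand-ins for the kernel bit helpers (assumption). */
static bool model_test_bit(int nr, const unsigned long *addr)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(*addr)));
	return (*addr >> nr) & 1UL;
}

static void model_set_bit(int nr, unsigned long *addr)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(*addr)));
	*addr |= 1UL << nr;
}

static void model_clear_bit(int nr, unsigned long *addr)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(*addr)));
	*addr &= ~(1UL << nr);
}

/*
 * Models the status worker loop: it bails out before touching any
 * shared resource when a reset or removal is in progress.
 * Returns the number of entries handled.
 */
static int model_status_worker(struct model_dev *dev)
{
	int handled = 0;

	while (true) {
		if (model_test_bit(MODEL_RESET, &dev->state) ||
		    model_test_bit(MODEL_REMOVED, &dev->state))
			break;

		if (dev->pending == 0)
			break;

		dev->pending--;
		handled++;
	}

	return handled;
}

/*
 * Models the reset work: the flag is raised for the whole recovery,
 * so a worker invocation that lands in the middle does nothing.
 * Returns how many entries the worker handled during the reset.
 */
static int model_reset_work(struct model_dev *dev)
{
	int handled_during_reset;

	model_set_bit(MODEL_RESET, &dev->state);
	handled_during_reset = model_status_worker(dev);
	model_clear_bit(MODEL_RESET, &dev->state);

	return handled_during_reset;
}
```

With pending set to 3, model_reset_work() returns 0 and leaves pending untouched; a later call to model_status_worker(), after the flag has been cleared, handles all 3 entries.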



2024-03-07 16:43:34

by Ben Greear

Subject: Re: [PATCH] wifi: mt76: mt7921s: fix potential hung tasks during chip recovery

On 3/7/24 01:46, Mingyen Hsieh wrote:
> From: Leon Yen <[email protected]>
>
> During chip recovery (e.g. chip reset), the kernel worker reset_work can
> hold the lock while waiting for the kernel thread stat_worker to be parked,
> while stat_worker is itself waiting for the same lock to be released.
> This deadlock causes hung task messages to be dumped and may cause the
> device to reboot.
>
> This patch prevents stat_worker from running during chip recovery.

Hello,

I was able to hang my 7996 system while doing a radio reset yesterday. Is the
same or a similar fix needed for 7996?

Thanks
Ben

>
> Signed-off-by: Leon Yen <[email protected]>
> Signed-off-by: Ming Yen Hsieh <[email protected]>
> ---
> drivers/net/wireless/mediatek/mt76/mt7921/mac.c | 2 ++
> drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c | 2 --
> drivers/net/wireless/mediatek/mt76/mt7921/sdio_mac.c | 2 --
> drivers/net/wireless/mediatek/mt76/sdio.c | 3 ++-
> 4 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/mac.c
> index 867e14f6b93a..73e42ef42983 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt7921/mac.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7921/mac.c
> @@ -663,6 +663,7 @@ void mt7921_mac_reset_work(struct work_struct *work)
> int i, ret;
>
> dev_dbg(dev->mt76.dev, "chip reset\n");
> + set_bit(MT76_RESET, &dev->mphy.state);
> dev->hw_full_reset = true;
> ieee80211_stop_queues(hw);
>
> @@ -691,6 +692,7 @@ void mt7921_mac_reset_work(struct work_struct *work)
> }
>
> dev->hw_full_reset = false;
> + clear_bit(MT76_RESET, &dev->mphy.state);
> pm->suspended = false;
> ieee80211_wake_queues(hw);
> ieee80211_iterate_active_interfaces(hw,
> diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
> index c866144ff061..031ba9aaa4e2 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
> @@ -64,7 +64,6 @@ int mt7921e_mac_reset(struct mt792x_dev *dev)
> mt76_wr(dev, dev->irq_map->host_irq_enable, 0);
> mt76_wr(dev, MT_PCIE_MAC_INT_ENABLE, 0x0);
>
> - set_bit(MT76_RESET, &dev->mphy.state);
> set_bit(MT76_MCU_RESET, &dev->mphy.state);
> wake_up(&dev->mt76.mcu.wait);
> skb_queue_purge(&dev->mt76.mcu.res_q);
> @@ -115,7 +114,6 @@ int mt7921e_mac_reset(struct mt792x_dev *dev)
>
> err = __mt7921_start(&dev->phy);
> out:
> - clear_bit(MT76_RESET, &dev->mphy.state);
>
> local_bh_disable();
> napi_enable(&dev->mt76.tx_napi);
> diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/sdio_mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/sdio_mac.c
> index 389eb0903807..1f77cf71ca70 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt7921/sdio_mac.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7921/sdio_mac.c
> @@ -98,7 +98,6 @@ int mt7921s_mac_reset(struct mt792x_dev *dev)
> mt76_connac_free_pending_tx_skbs(&dev->pm, NULL);
> mt76_txq_schedule_all(&dev->mphy);
> mt76_worker_disable(&dev->mt76.tx_worker);
> - set_bit(MT76_RESET, &dev->mphy.state);
> set_bit(MT76_MCU_RESET, &dev->mphy.state);
> wake_up(&dev->mt76.mcu.wait);
> skb_queue_purge(&dev->mt76.mcu.res_q);
> @@ -135,7 +134,6 @@ int mt7921s_mac_reset(struct mt792x_dev *dev)
>
> err = __mt7921_start(&dev->phy);
> out:
> - clear_bit(MT76_RESET, &dev->mphy.state);
>
> mt76_worker_enable(&dev->mt76.tx_worker);
>
> diff --git a/drivers/net/wireless/mediatek/mt76/sdio.c b/drivers/net/wireless/mediatek/mt76/sdio.c
> index 3e88798df017..a4ed00eebc48 100644
> --- a/drivers/net/wireless/mediatek/mt76/sdio.c
> +++ b/drivers/net/wireless/mediatek/mt76/sdio.c
> @@ -499,7 +499,8 @@ static void mt76s_tx_status_data(struct mt76_worker *worker)
> dev = container_of(sdio, struct mt76_dev, sdio);
>
> while (true) {
> - if (test_bit(MT76_REMOVED, &dev->phy.state))
> + if (test_bit(MT76_RESET, &dev->phy.state) ||
> + test_bit(MT76_REMOVED, &dev->phy.state))
> break;
>
> if (!dev->drv->tx_status_data(dev, &update))

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com