2022-08-02 15:20:18

by Deren Wu

Subject: [PATCH] mt76: mt7921e: fix crash in chip reset fail

From: Deren Wu <[email protected]>

If taking driver own fails during reset, mac_reset may need to run
several times. That sequence triggers the system crash shown in the
log below.

Because "tx_napi" is not re-enabled/scheduled before it is disabled
again, the process keeps waiting for a state change in napi_disable().
To avoid the problem and keep the state synchronized across runs, go to
the final resource handling if driver own fails.

[ 5857.353423] mt7921e 0000:3b:00.0: driver own failed
[ 5858.433427] mt7921e 0000:3b:00.0: Timeout for driver own
[ 5859.633430] mt7921e 0000:3b:00.0: driver own failed
[ 5859.633444] ------------[ cut here ]------------
[ 5859.633446] WARNING: CPU: 6 at kernel/kthread.c:659 kthread_park+0x11d
[ 5859.633717] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
[ 5859.633728] RIP: 0010:kthread_park+0x11d/0x150
[ 5859.633736] RSP: 0018:ffff8881b676fc68 EFLAGS: 00010202
......
[ 5859.633766] Call Trace:
[ 5859.633768] <TASK>
[ 5859.633771] mt7921e_mac_reset+0x176/0x6f0 [mt7921e]
[ 5859.633778] mt7921_mac_reset_work+0x184/0x3a0 [mt7921_common]
[ 5859.633785] ? mt7921_mac_set_timing+0x520/0x520 [mt7921_common]
[ 5859.633794] ? __kasan_check_read+0x11/0x20
[ 5859.633802] process_one_work+0x7ee/0x1320
[ 5859.633810] worker_thread+0x53c/0x1240
[ 5859.633818] kthread+0x2b8/0x370
[ 5859.633824] ? process_one_work+0x1320/0x1320
[ 5859.633828] ? kthread_complete_and_exit+0x30/0x30
[ 5859.633834] ret_from_fork+0x1f/0x30
[ 5859.633842] </TASK>

Fixes: 0efaf31dec57 ("mt76: mt7921: fix MT7921E reset failure")
Signed-off-by: Deren Wu <[email protected]>
---
drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
index e1800674089a..576a0149251b 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
@@ -261,7 +261,7 @@ int mt7921e_mac_reset(struct mt7921_dev *dev)

 	err = mt7921e_driver_own(dev);
 	if (err)
-		return err;
+		goto out;
 
 	err = mt7921_run_firmware(dev);
 	if (err)
--
2.18.0
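
The failure mode described in the commit message can be illustrated with a
small user-space model. This is only a sketch under stated assumptions: the
fake_* helpers, both reset_* variants, and run_retries() are hypothetical
stand-ins and not mt76 code. The model assumes that the final resource
handling is what re-enables tx_napi, and it counts a disable issued while
already disabled instead of blocking the way napi_disable() would.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a NAPI instance: tracks enabled/disabled only. */
struct fake_napi {
	bool enabled;
	int unbalanced_disables;	/* disables issued while already disabled */
};

/* The real napi_disable() would wait here; the model just counts the event. */
static void fake_napi_disable(struct fake_napi *n)
{
	if (!n->enabled)
		n->unbalanced_disables++;
	n->enabled = false;
}

static void fake_napi_enable(struct fake_napi *n)
{
	n->enabled = true;
}

/* Stand-in for the driver-own step: fails for the first *fail_count calls. */
static int fake_driver_own(int *fail_count)
{
	if (*fail_count > 0) {
		(*fail_count)--;
		return -1;
	}
	return 0;
}

/* Early return on failure: the instance is left disabled for the next run. */
static int reset_early_return(struct fake_napi *n, int *fail_count)
{
	int err;

	fake_napi_disable(n);

	err = fake_driver_own(fail_count);
	if (err)
		return err;

	fake_napi_enable(n);
	return 0;
}

/* Jump to the shared exit path: enable/disable stay balanced on every run. */
static int reset_goto_out(struct fake_napi *n, int *fail_count)
{
	int err;

	fake_napi_disable(n);

	err = fake_driver_own(fail_count);
	if (err)
		goto out;

	/* later bring-up steps would go here */
out:
	fake_napi_enable(n);
	return err;
}

/* Retry the reset until it succeeds; report how many disables were unbalanced. */
static int run_retries(int (*reset)(struct fake_napi *, int *),
		       int failures, int attempts)
{
	struct fake_napi n = { .enabled = true, .unbalanced_disables = 0 };
	int i;

	assert(attempts > 0);

	for (i = 0; i < attempts; i++)
		if (reset(&n, &failures) == 0)
			break;

	return n.unbalanced_disables;
}
```

In this model, run_retries(reset_early_return, 2, 5) counts two unbalanced
disables, while run_retries(reset_goto_out, 2, 5) counts none.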



2022-08-25 02:11:17

by sean wang

Subject: Re: [PATCH] mt76: mt7921e: fix crash in chip reset fail

Hi Kalle,

If the patch looks good to you, could you help apply it to
wireless-drivers.git? More users are reporting the issue on stable
kernels, for example [1]. I would like to backport it as soon as it
appears in the Linus tree to fix the indefinite hang issue.

[1] https://lore.kernel.org/linux-wireless/VE1PR04MB64945C660A81D38F290E4A4BE59F9@VE1PR04MB6494.eurprd04.prod.outlook.com/T/

Sean


2022-08-26 10:59:28

by Sean Wang

Subject: Re: [PATCH] mt76: mt7921e: fix crash in chip reset fail

Hi Johannes,

Kalle seems to be unavailable this week, so I would like to ask for your help.
If the patch looks good to you, could you help apply it to
wireless-drivers.git? More users are reporting the issue on stable
kernels, for example [1]. I would like to backport it as soon as it
appears in the Linus tree to fix the indefinite hang issue. Sorry for
the rushed request; I know you sent the pull request just a moment ago :(

[1] https://lore.kernel.org/linux-wireless/VE1PR04MB64945C660A81D38F290E4A4BE59F9@VE1PR04MB6494.eurprd04.prod.outlook.com/T/

Sean


2022-08-29 16:18:13

by Kalle Valo

Subject: Re: [PATCH] mt76: mt7921e: fix crash in chip reset fail

Sean Wang <[email protected]> writes:

> Kalle seemed not available this week, so I would like to look for help from you.
> If the patch looks good to you, could you help apply the patch to
> wireless-drivers.git because there are getting more users reporting
> the issue with the stable kernel such as [1]. I would like to backport
> it sooner once it appears in the Linus tree to solve the indefinite
> hang issue. Sorry for the hurry request, I knew you just sent the pull
> request one moment ago :(
>
> [1]
> https://lore.kernel.org/linux-wireless/VE1PR04MB64945C660A81D38F290E4A4BE59F9@VE1PR04MB6494.eurprd04.prod.outlook.com/T/

Johannes applied this now:

https://git.kernel.org/wireless/wireless/c/fa3fbe640378

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches