2022-01-12 00:49:25

by Sean Wang

Subject: Re: [Bug] mt7921e driver in 5.16 causes kernel panic

From: Sean Wang <[email protected]>

>On 1/11/22 16:31, Ben Greear wrote:
>> On 1/11/22 3:17 PM, Khalid Aziz wrote:
>>> I am seeing an intermittent bug in mt7921e driver. When the driver
>>> module is loaded and is being initialized, almost every other time it
>>> seems to write to some wild memory location. This results in driver
>>> failing to initialize with message "Timeout for driver own" and at
>>> the same time I start to see "Bad page state" messages for random
>>> processes. Here is the relevant part of dmesg:
>>
>> Please see if this helps?
>>
>> From: Ben Greear <[email protected]>
>>
>> If the nic fails to start, it is possible that the reset_work has
>> already been scheduled. Ensure the work item is canceled so we do not
>> have use-after-free crash in case cleanup is called before the work
>> item is executed.
>>
>> This fixes crash on my x86_64 apu2 when mt7921k radio fails to work.
>> Radio still fails, but OS does not crash.
>>
>> Signed-off-by: Ben Greear <[email protected]>
>> ---
>> drivers/net/wireless/mediatek/mt76/mt7921/main.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/main.c b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
>> index 6073bedaa1c08..9b33002dcba4a 100644
>> --- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c
>> +++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
>> @@ -272,6 +272,7 @@ static void mt7921_stop(struct ieee80211_hw *hw)
>>
>>  	cancel_delayed_work_sync(&dev->pm.ps_work);
>>  	cancel_work_sync(&dev->pm.wake_work);
>> +	cancel_work_sync(&dev->reset_work);
>>  	mt76_connac_free_pending_tx_skbs(&dev->pm, NULL);
>>
>>  	mt7921_mutex_acquire(dev);
>
>Hi Ben,
>
>Unfortunately that did not help. I still saw the same messages and a kernel panic. I do not see this bug if I power down the laptop before booting it up, so mt7921_stop() would make sense as the reasonable place to fix it.

Hi, Khalid

Could you try the patch below? It should help with your issue.

https://patchwork.kernel.org/project/linux-wireless/patch/70e27cbc652cbdb78277b9c691a3a5ba02653afb.1641540175.git.objelf@gmail.com/

>
>Thanks,
>Khalid
>
>


2022-01-12 02:25:00

by Khalid Aziz

Subject: Re: [Bug] mt7921e driver in 5.16 causes kernel panic

On 1/11/22 17:49, [email protected] wrote:
> From: Sean Wang <[email protected]>
>
>> On 1/11/22 16:31, Ben Greear wrote:
>>> On 1/11/22 3:17 PM, Khalid Aziz wrote:
>>>> I am seeing an intermittent bug in mt7921e driver. When the driver
>>>> module is loaded and is being initialized, almost every other time it
>>>> seems to write to some wild memory location. This results in driver
>>>> failing to initialize with message "Timeout for driver own" and at
>>>> the same time I start to see "Bad page state" messages for random
>>>> processes. Here is the relevant part of dmesg:
>>>
>>> Please see if this helps?
>>>
>>> From: Ben Greear <[email protected]>
>>>
>>> If the nic fails to start, it is possible that the reset_work has
>>> already been scheduled. Ensure the work item is canceled so we do not
>>> have use-after-free crash in case cleanup is called before the work
>>> item is executed.
>>>
>>> This fixes crash on my x86_64 apu2 when mt7921k radio fails to work.
>>> Radio still fails, but OS does not crash.
>>>
>>> Signed-off-by: Ben Greear <[email protected]>
>>> ---
>>> drivers/net/wireless/mediatek/mt76/mt7921/main.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/main.c b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
>>> index 6073bedaa1c08..9b33002dcba4a 100644
>>> --- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c
>>> +++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
>>> @@ -272,6 +272,7 @@ static void mt7921_stop(struct ieee80211_hw *hw)
>>>
>>>  	cancel_delayed_work_sync(&dev->pm.ps_work);
>>>  	cancel_work_sync(&dev->pm.wake_work);
>>> +	cancel_work_sync(&dev->reset_work);
>>>  	mt76_connac_free_pending_tx_skbs(&dev->pm, NULL);
>>>
>>>  	mt7921_mutex_acquire(dev);
>>
>> Hi Ben,
>>
>> Unfortunately that did not help. I still saw the same messages and a kernel panic. I do not see this bug if I power down the laptop before booting it up, so mt7921_stop() would make sense as the reasonable place to fix it.
>
> Hi, Khalid
>
> Could you try the patch below? It should be helpful to your issue
>
> https://patchwork.kernel.org/project/linux-wireless/patch/70e27cbc652cbdb78277b9c691a3a5ba02653afb.1641540175.git.objelf@gmail.com/

Hi Sean,

That worked! I tried 5 reboots back-to-back after applying your patch
without powering down my laptop. There were no error messages, kernel
came up every time and wifi worked.

Thanks,
Khalid