2022-12-21 01:16:36

by Mikhail Gavrilov

Subject: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

Hi,
The kernel 6.2 development cycle has begun, and after I updated the
kernel on my laptop, the wifi stopped working.

Bisecting blames this commit:
cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae is the first bad commit
commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae
Author: Lorenzo Bianconi <[email protected]>
Date: Sat Nov 12 16:40:35 2022 +0100

wifi: mt76: add WED RX support to mt76_dma_{add,get}_buf

Introduce the capability to configure RX WED in mt76_dma_{add,get}_buf
utility routines.

Tested-by: Daniel Golle <[email protected]>
Co-developed-by: Sujuan Chen <[email protected]>
Signed-off-by: Sujuan Chen <[email protected]>
Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>

drivers/net/wireless/mediatek/mt76/dma.c | 125 ++++++++++++++++++++----------
drivers/net/wireless/mediatek/mt76/mt76.h | 2 +
2 files changed, 88 insertions(+), 39 deletions(-)

Unfortunately, I can't be sure that reverting this commit will fix the
problem, because after the revert the kernel fails to compile with the
following error:
drivers/net/wireless/mediatek/mt76/mt7915/dma.c: In function ‘mt7915_dma_init’:
drivers/net/wireless/mediatek/mt76/mt7915/dma.c:489:33: error:
implicit declaration of function ‘MT_WED_Q_RX’; did you mean
‘MT_WED_Q_TX’? [-Werror=implicit-function-declaration]
489 | MT_WED_Q_RX(MT7915_RXQ_BAND0);
| ^~~~~~~~~~~
| MT_WED_Q_TX
cc1: some warnings being treated as errors
CC [M] drivers/net/ethernet/intel/igb/e1000_phy.o
make[7]: *** [scripts/Makefile.build:252:
drivers/net/wireless/mediatek/mt76/mt7915/dma.o] Error 1
make[7]: *** Waiting for unfinished jobs....


In the kernel log I see the following error traces after commit
cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae:

1)
[ 23.642036] ======================================================
[ 23.642304] WARNING: possible circular locking dependency detected
[ 23.642304] 6.1.0-rc5-13-cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae+
#13 Tainted: G W L
[ 23.642304] ------------------------------------------------------
[ 23.642304] kworker/u32:10/831 is trying to acquire lock:
[ 23.642304] ffff8c43b2043c78 (&dev->mutex#3){+.+.}-{3:3}, at:
mt7921_roc_work+0x37/0xa0 [mt7921_common]
[ 23.642304]
but task is already holding lock:
[ 23.642304] ffffaa0501a8fe78
((work_completion)(&dev->phy.roc_work)){+.+.}-{0:0}, at:
process_one_work+0x20b/0x5b0
[ 23.642304]
which lock already depends on the new lock.

[ 23.642304]
the existing dependency chain (in reverse order) is:
[ 23.642304]
-> #1 ((work_completion)(&dev->phy.roc_work)){+.+.}-{0:0}:
[ 23.642304] __flush_work+0x84/0x4b0
[ 23.642304] __cancel_work_timer+0xfc/0x190
[ 23.642304] mt7921_abort_roc+0x3b/0x60 [mt7921_common]
[ 23.642304] mt7921_mgd_complete_tx+0x4c/0x70 [mt7921_common]
[ 23.642304] drv_mgd_complete_tx+0x8c/0x190 [mac80211]
[ 23.642304] ieee80211_sta_rx_queued_mgmt+0x2a5/0x8e0 [mac80211]
[ 23.642304] ieee80211_iface_work+0x328/0x450 [mac80211]
[ 23.642304] process_one_work+0x294/0x5b0
[ 23.642304] worker_thread+0x4f/0x3a0
[ 23.642304] kthread+0xf5/0x120
[ 23.642304] ret_from_fork+0x22/0x30
[ 23.642304]
-> #0 (&dev->mutex#3){+.+.}-{3:3}:
[ 23.642304] __lock_acquire+0x12b1/0x1ef0
[ 23.642304] lock_acquire+0xc2/0x2b0
[ 23.642304] __mutex_lock+0xbb/0x850
[ 23.642304] mt7921_roc_work+0x37/0xa0 [mt7921_common]
[ 23.642304] process_one_work+0x294/0x5b0
[ 23.642304] worker_thread+0x4f/0x3a0
[ 23.642304] kthread+0xf5/0x120
[ 23.642304] ret_from_fork+0x22/0x30
[ 23.642304]
other info that might help us debug this:

[ 23.642304] Possible unsafe locking scenario:

[ 23.642304]        CPU0                    CPU1
[ 23.642304]        ----                    ----
[ 23.642304]   lock((work_completion)(&dev->phy.roc_work));
[ 23.642304]                                lock(&dev->mutex#3);
[ 23.669750]                                lock((work_completion)(&dev->phy.roc_work));
[ 23.669750]   lock(&dev->mutex#3);
[ 23.669750]
*** DEADLOCK ***

[ 23.671578] 2 locks held by kworker/u32:10/831:
[ 23.671578] #0: ffff8c43ba7aa148
((wq_completion)phy0){+.+.}-{0:0}, at: process_one_work+0x20b/0x5b0
[ 23.671578] #1: ffffaa0501a8fe78
((work_completion)(&dev->phy.roc_work)){+.+.}-{0:0}, at:
process_one_work+0x20b/0x5b0
[ 23.673701]
stack backtrace:
[ 23.673701] CPU: 8 PID: 831 Comm: kworker/u32:10 Tainted: G
W L 6.1.0-rc5-13-cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae+ #13
[ 23.673701] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
G513QY_G513QY/G513QY, BIOS G513QY.320 09/07/2022
[ 23.673701] Workqueue: phy0 mt7921_roc_work [mt7921_common]
[ 23.673701] Call Trace:
[ 23.673701] <TASK>
[ 23.677973] dump_stack_lvl+0x5b/0x77
[ 23.677973] check_noncircular+0xff/0x110
[ 23.677973] ? sched_clock_local+0xe/0x80
[ 23.677973] __lock_acquire+0x12b1/0x1ef0
[ 23.677973] lock_acquire+0xc2/0x2b0
[ 23.677973] ? mt7921_roc_work+0x37/0xa0 [mt7921_common]
[ 23.677973] __mutex_lock+0xbb/0x850
[ 23.681699] ? mt7921_roc_work+0x37/0xa0 [mt7921_common]
[ 23.681699] ? mt7921_roc_work+0x37/0xa0 [mt7921_common]
[ 23.681699] ? mt7921_roc_work+0x37/0xa0 [mt7921_common]
[ 23.681699] mt7921_roc_work+0x37/0xa0 [mt7921_common]
[ 23.681699] process_one_work+0x294/0x5b0
[ 23.681699] worker_thread+0x4f/0x3a0
[ 23.681699] ? process_one_work+0x5b0/0x5b0
[ 23.681699] kthread+0xf5/0x120
[ 23.685767] ? kthread_complete_and_exit+0x20/0x20
[ 23.685767] ret_from_fork+0x22/0x30
[ 23.685767] </TASK>
[ 24.599971] wlp5s0: authentication with 24:cf:24:c2:72:d0 timed out
[ 24.749911] amdgpu 0000:03:00.0: amdgpu: free PSP TMR buffer
[ 27.607726] mt7921e 0000:05:00.0: Message 00020003 (seq 10) timeout
[ 30.615933] mt7921e 0000:05:00.0: Message 00020002 (seq 11) timeout
[ 30.703139] mt7921e 0000:05:00.0: HW/SW Version: 0x8a108a10, Build
Time: 20220908210919a



2)
[ 57.627571] ------------[ cut here ]------------
[ 57.627575] WARNING: CPU: 10 PID: 831 at
drivers/iommu/dma-iommu.c:1038 iommu_dma_unmap_page+0x79/0x90
[ 57.627586] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep
intel_rapl_msr intel_rapl_common sunrpc snd_sof_amd_rembrandt
snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_hda_codec_realtek
mt7921e snd_sof snd_hda_codec_generic snd_hda_codec_hdmi mt7921_common
snd_sof_utils edac_mce_amd snd_soc_core binfmt_misc snd_hda_intel
mt76_connac_lib snd_intel_dspcfg btusb snd_compress snd_intel_sdw_acpi
ac97_bus mt76 btrtl snd_pcm_dmaengine kvm_amd snd_hda_codec snd_pci_ps
btbcm snd_hda_core snd_rpl_pci_acp6x btintel vfat snd_pci_acp6x
snd_hwdep mac80211 fat btmtk kvm snd_seq libarc4 snd_seq_device
bluetooth irqbypass snd_pcm cfg80211 snd_pci_acp5x snd_rn_pci_acp3x
snd_timer snd_acp_config rapl snd snd_soc_acpi asus_nb_wmi wmi_bmof
[ 57.627650] pcspkr i2c_piix4 snd_pci_acp3x k10temp soundcore
joydev asus_wireless amd_pmc zram amdgpu crct10dif_pclmul hid_asus
crc32_pclmul drm_ttm_helper crc32c_intel asus_wmi polyval_clmulni ttm
ledtrig_audio sparse_keymap polyval_generic platform_profile iommu_v2
gpu_sched nvme drm_buddy nvme_core drm_display_helper rfkill
ghash_clmulni_intel ucsi_acpi hid_multitouch sha512_ssse3 serio_raw
typec_ucsi ccp r8169 cec sp5100_tco nvme_common typec i2c_hid_acpi
video i2c_hid wmi ip6_tables ip_tables fuse
[ 57.627702] CPU: 10 PID: 831 Comm: kworker/u32:10 Tainted: G
W L 6.1.0-rc5-13-cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae+ #13
[ 57.627706] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
G513QY_G513QY/G513QY, BIOS G513QY.320 09/07/2022
[ 57.627708] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
[ 57.627720] RIP: 0010:iommu_dma_unmap_page+0x79/0x90
[ 57.627724] Code: 2b 48 3b 28 72 26 48 3b 68 08 73 20 4d 89 f8 44
89 f1 4c 89 ea 48 89 ee 48 89 df 5b 5d 41 5c 41 5d 41 5e 41 5f e9 d7
76 7e ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 66 0f 1f
44 00
[ 57.627727] RSP: 0018:ffffaa0501a8fcb8 EFLAGS: 00010246
[ 57.627730] RAX: 0000000000000000 RBX: ffff8c43933500d0 RCX: 0000000000000000
[ 57.627732] RDX: 0000000000000000 RSI: 0000000000000177 RDI: ffffaa0501a8fca0
[ 57.627734] RBP: ffff8c43933500d0 R08: 00000000ffd77800 R09: 0000000000000081
[ 57.627735] R10: 0000000000000001 R11: 000ffffffffff000 R12: 00000000ffd77800
[ 57.627737] R13: 00000000000006c0 R14: 0000000000000002 R15: 0000000000000000
[ 57.627739] FS: 0000000000000000(0000) GS:ffff8c5258a00000(0000)
knlGS:0000000000000000
[ 57.627740] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 57.627742] CR2: 000055bcc13dc800 CR3: 0000000479228000 CR4: 0000000000750ee0
[ 57.627744] PKRU: 55555554
[ 57.627745] Call Trace:
[ 57.627749] <TASK>
[ 57.627753] dma_unmap_page_attrs+0x4c/0x1d0
[ 57.627763] mt76_dma_get_buf+0xaf/0x190 [mt76]
[ 57.627774] ? free_unref_page+0x1a7/0x280
[ 57.627780] mt76_dma_rx_cleanup+0xa0/0x150 [mt76]
[ 57.627787] mt7921_wpdma_reset+0xb6/0x1d0 [mt7921e]
[ 57.627795] mt7921e_mac_reset+0x141/0x2e0 [mt7921e]
[ 57.627800] mt7921_mac_reset_work+0x8b/0x160 [mt7921_common]
[ 57.627808] process_one_work+0x294/0x5b0
[ 57.627817] worker_thread+0x4f/0x3a0
[ 57.627820] ? process_one_work+0x5b0/0x5b0
[ 57.627822] kthread+0xf5/0x120
[ 57.627826] ? kthread_complete_and_exit+0x20/0x20
[ 57.627830] ret_from_fork+0x22/0x30
[ 57.627838] </TASK>
[ 57.627840] irq event stamp: 135539
[ 57.627841] hardirqs last enabled at (135539):
[<ffffffff92f7a214>] _raw_spin_unlock_irq+0x24/0x50
[ 57.627848] hardirqs last disabled at (135538):
[<ffffffff92f79f18>] _raw_spin_lock_irq+0x68/0x90
[ 57.627851] softirqs last enabled at (135534):
[<ffffffffc2fc2fe8>] __ieee80211_tx_skb_tid_band+0x68/0x250 [mac80211]
[ 57.627896] softirqs last disabled at (135494):
[<ffffffffc2fc2fe8>] __ieee80211_tx_skb_tid_band+0x68/0x250 [mac80211]
[ 57.627924] ---[ end trace 0000000000000000 ]---
[ 57.711796] mt7921e 0000:05:00.0: HW/SW Version: 0x8a108a10, Build
Time: 20220908210919a

Full kernel log is here: https://pastebin.com/ALHUDvSQ

I hope my report helps fix the problem quickly.

--
Best Regards,
Mike Gavrilov.


2022-12-21 10:55:06

by Felix Fietkau

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

On 21.12.22 02:10, Mikhail Gavrilov wrote:
> Hi,
> The kernel 6.2 preparation cycle has begun.
> And after the kernel was updated on my laptop, the wifi stopped working.
>
> Bisecting blames this commit:
> cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae is the first bad commit
> commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae
> Author: Lorenzo Bianconi <[email protected]>
> Date: Sat Nov 12 16:40:35 2022 +0100
>
> wifi: mt76: add WED RX support to mt76_dma_{add,get}_buf
>
> Introduce the capability to configure RX WED in mt76_dma_{add,get}_buf
> utility routines.
>
> Tested-by: Daniel Golle <[email protected]>
> Co-developed-by: Sujuan Chen <[email protected]>
> Signed-off-by: Sujuan Chen <[email protected]>
> Signed-off-by: Lorenzo Bianconi <[email protected]>
> Signed-off-by: Felix Fietkau <[email protected]>
>
> drivers/net/wireless/mediatek/mt76/dma.c | 125 ++++++++++++++++++++----------
> drivers/net/wireless/mediatek/mt76/mt76.h | 2 +
> 2 files changed, 88 insertions(+), 39 deletions(-)
>
> Unfortunately, I can't be sure that revert this commit will fix the
> problem. Because after the revert, compile of kernel failing with
> follow error:
> drivers/net/wireless/mediatek/mt76/mt7915/dma.c: In function ‘mt7915_dma_init’:
> drivers/net/wireless/mediatek/mt76/mt7915/dma.c:489:33: error:
> implicit declaration of function ‘MT_WED_Q_RX’; did you mean
> ‘MT_WED_Q_TX’? [-Werror=implicit-function-declaration]
> 489 | MT_WED_Q_RX(MT7915_RXQ_BAND0);
> | ^~~~~~~~~~~
> | MT_WED_Q_TX
> cc1: some warnings being treated as errors
> CC [M] drivers/net/ethernet/intel/igb/e1000_phy.o
> make[7]: *** [scripts/Makefile.build:252:
> drivers/net/wireless/mediatek/mt76/mt7915/dma.o] Error 1
> make[7]: *** Waiting for unfinished jobs....
I'm pretty sure that commit is unrelated to this issue. However, while
looking at the code I found a bug that would explain your issue.

Please try this patch:
---
--- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
@@ -422,15 +422,15 @@ void mt7921_roc_timer(struct timer_list *timer)

static int mt7921_abort_roc(struct mt7921_phy *phy, struct mt7921_vif *vif)
{
- int err;
-
- if (!test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
- return 0;
+ int err = 0;

del_timer_sync(&phy->roc_timer);
cancel_work_sync(&phy->roc_work);
- err = mt7921_mcu_abort_roc(phy, vif, phy->roc_token_id);
- clear_bit(MT76_STATE_ROC, &phy->mt76->state);
+
+ mt7921_mutex_acquire(phy->dev);
+ if (test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
+ err = mt7921_mcu_abort_roc(phy, vif, phy->roc_token_id);
+ mt7921_mutex_release(phy->dev);

return err;
}
@@ -487,13 +487,8 @@ static int mt7921_cancel_remain_on_channel(struct ieee80211_hw *hw,
{
struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
struct mt7921_phy *phy = mt7921_hw_phy(hw);
- int err;

- mt7921_mutex_acquire(phy->dev);
- err = mt7921_abort_roc(phy, mvif);
- mt7921_mutex_release(phy->dev);
-
- return err;
+ return mt7921_abort_roc(phy, mvif);
}

static int mt7921_set_channel(struct mt7921_phy *phy)
@@ -1778,11 +1773,8 @@ static void mt7921_mgd_complete_tx(struct ieee80211_hw *hw,
struct ieee80211_prep_tx_info *info)
{
struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
- struct mt7921_dev *dev = mt7921_hw_dev(hw);

- mt7921_mutex_acquire(dev);
mt7921_abort_roc(mvif->phy, mvif);
- mt7921_mutex_release(dev);
}

const struct ieee80211_ops mt7921_ops = {
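
The ordering change in the patch above can be illustrated with a small userspace sketch. This is not kernel code and not how lockdep is implemented; it only records "taken while held" edges between two lock classes and checks whether both orders were seen. The names `DEV_MUTEX`, `ROC_WORK`, `record_dep()` and the two `*_ordering_inverts()` helpers are hypothetical and exist only for this example. The edges mirror the call chains in the lockdep report: before the patch, `mt7921_mgd_complete_tx()` holds the device mutex while `cancel_work_sync()` waits for `roc_work`, and `mt7921_roc_work()` takes the same mutex while running as that work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes for this sketch; not kernel identifiers. */
enum lock_class { DEV_MUTEX, ROC_WORK, NUM_CLASSES };

/* dep[a][b] is true when b was acquired (or waited on) while a was held. */
struct lock_graph {
	bool dep[NUM_CLASSES][NUM_CLASSES];
};

static void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

static void record_dep(struct lock_graph *g, enum lock_class held,
		       enum lock_class taken)
{
	assert(held < NUM_CLASSES && taken < NUM_CLASSES);
	g->dep[held][taken] = true;
}

/* With only two classes, a cycle exists exactly when both orders were seen. */
static bool has_inversion(const struct lock_graph *g)
{
	return g->dep[DEV_MUTEX][ROC_WORK] && g->dep[ROC_WORK][DEV_MUTEX];
}

/* Ordering before the patch: the caller holds the mutex around the abort. */
static bool old_ordering_inverts(void)
{
	struct lock_graph g;

	graph_init(&g);
	/* mgd_complete_tx: mutex held, then cancel_work_sync() waits for the work */
	record_dep(&g, DEV_MUTEX, ROC_WORK);
	/* roc_work: runs as the work item, then takes the mutex */
	record_dep(&g, ROC_WORK, DEV_MUTEX);
	return has_inversion(&g);
}

/* Ordering after the patch: the work is cancelled before the mutex is taken. */
static bool new_ordering_inverts(void)
{
	struct lock_graph g;

	graph_init(&g);
	/* abort_roc: cancel_work_sync() runs with no mutex held, so no edge here */
	/* roc_work still takes the mutex while running as the work item */
	record_dep(&g, ROC_WORK, DEV_MUTEX);
	return has_inversion(&g);
}
```

Calling `old_ordering_inverts()` reports an inversion and `new_ordering_inverts()` does not. The real lockdep graph covers many more lock classes; the two-class check is a deliberate simplification.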

2022-12-21 11:29:52

by Lorenzo Bianconi

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

> On 21.12.22 02:10, Mikhail Gavrilov wrote:
> > Hi,
> > The kernel 6.2 preparation cycle has begun.
> > And after the kernel was updated on my laptop, the wifi stopped working.
> >
> > Bisecting blames this commit:
> > cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae is the first bad commit
> > commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae
> > Author: Lorenzo Bianconi <[email protected]>
> > Date: Sat Nov 12 16:40:35 2022 +0100
> >
> > wifi: mt76: add WED RX support to mt76_dma_{add,get}_buf
> >
> > Introduce the capability to configure RX WED in mt76_dma_{add,get}_buf
> > utility routines.
> >
> > Tested-by: Daniel Golle <[email protected]>
> > Co-developed-by: Sujuan Chen <[email protected]>
> > Signed-off-by: Sujuan Chen <[email protected]>
> > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > Signed-off-by: Felix Fietkau <[email protected]>
> >
> > drivers/net/wireless/mediatek/mt76/dma.c | 125 ++++++++++++++++++++----------
> > drivers/net/wireless/mediatek/mt76/mt76.h | 2 +
> > 2 files changed, 88 insertions(+), 39 deletions(-)
> >
> > Unfortunately, I can't be sure that revert this commit will fix the
> > problem. Because after the revert, compile of kernel failing with
> > follow error:
> > drivers/net/wireless/mediatek/mt76/mt7915/dma.c: In function ‘mt7915_dma_init’:
> > drivers/net/wireless/mediatek/mt76/mt7915/dma.c:489:33: error:
> > implicit declaration of function ‘MT_WED_Q_RX’; did you mean
> > ‘MT_WED_Q_TX’? [-Werror=implicit-function-declaration]
> > 489 | MT_WED_Q_RX(MT7915_RXQ_BAND0);
> > | ^~~~~~~~~~~
> > | MT_WED_Q_TX
> > cc1: some warnings being treated as errors
> > CC [M] drivers/net/ethernet/intel/igb/e1000_phy.o
> > make[7]: *** [scripts/Makefile.build:252:
> > drivers/net/wireless/mediatek/mt76/mt7915/dma.o] Error 1
> > make[7]: *** Waiting for unfinished jobs....
> I'm pretty sure that commit is unrelated to this issue. However, while
> looking at the code I found a bug that would explain your issue.
>
> Please try this patch:
> ---
> --- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
> @@ -422,15 +422,15 @@ void mt7921_roc_timer(struct timer_list *timer)
> static int mt7921_abort_roc(struct mt7921_phy *phy, struct mt7921_vif *vif)
> {
> - int err;
> -
> - if (!test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
> - return 0;
> + int err = 0;
> del_timer_sync(&phy->roc_timer);
> cancel_work_sync(&phy->roc_work);
> - err = mt7921_mcu_abort_roc(phy, vif, phy->roc_token_id);
> - clear_bit(MT76_STATE_ROC, &phy->mt76->state);
> +
> + mt7921_mutex_acquire(phy->dev);
> + if (test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
> + err = mt7921_mcu_abort_roc(phy, vif, phy->roc_token_id);
> + mt7921_mutex_release(phy->dev);
> return err;
> }
> @@ -487,13 +487,8 @@ static int mt7921_cancel_remain_on_channel(struct ieee80211_hw *hw,
> {
> struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
> struct mt7921_phy *phy = mt7921_hw_phy(hw);
> - int err;
> - mt7921_mutex_acquire(phy->dev);
> - err = mt7921_abort_roc(phy, mvif);
> - mt7921_mutex_release(phy->dev);
> -
> - return err;
> + return mt7921_abort_roc(phy, mvif);
> }
> static int mt7921_set_channel(struct mt7921_phy *phy)
> @@ -1778,11 +1773,8 @@ static void mt7921_mgd_complete_tx(struct ieee80211_hw *hw,
> struct ieee80211_prep_tx_info *info)
> {
> struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
> - struct mt7921_dev *dev = mt7921_hw_dev(hw);
> - mt7921_mutex_acquire(dev);
> mt7921_abort_roc(mvif->phy, mvif);
> - mt7921_mutex_release(dev);
> }
> const struct ieee80211_ops mt7921_ops = {
>

I guess we have a similar issue for 7663 too:


diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/main.c b/drivers/net/wireless/mediatek/mt76/mt7615/main.c
index ab4c1b4478aa..0405a31fcfd1 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7615/main.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7615/main.c
@@ -1175,16 +1175,14 @@ static int mt7615_cancel_remain_on_channel(struct ieee80211_hw *hw,
struct ieee80211_vif *vif)
{
struct mt7615_phy *phy = mt7615_hw_phy(hw);
- int err;
-
- if (!test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
- return 0;
+ int err = 0;

del_timer_sync(&phy->roc_timer);
cancel_work_sync(&phy->roc_work);

mt7615_mutex_acquire(phy->dev);
- err = mt7615_mcu_set_roc(phy, vif, NULL, 0);
+ if (test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
+ err = mt7615_mcu_set_roc(phy, vif, NULL, 0);
mt7615_mutex_release(phy->dev);

return err;
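
A minimal userspace sketch of the test-and-clear pattern used in the diff above, assuming C11 atomics as a stand-in for the kernel bitops. `sketch_test_and_clear_bit()`, `sketch_abort_roc()` and the bit number `SKETCH_STATE_ROC` are hypothetical names for this example; the real value of `MT76_STATE_ROC` is not shown in this thread, and the counter only stands in for sending the MCU command.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number; the real MT76_STATE_ROC value is not shown here. */
#define SKETCH_STATE_ROC 3

/*
 * Userspace stand-in for the kernel's test_and_clear_bit(): atomically
 * clear the bit and report whether it was set beforehand.
 */
static bool sketch_test_and_clear_bit(unsigned int nr, atomic_ulong *state)
{
	unsigned long mask;
	unsigned long old;

	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	mask = 1UL << nr;
	old = atomic_fetch_and(state, ~mask);
	return (old & mask) != 0;
}

/*
 * Models the patched abort path: the (pretend) abort command is counted
 * only when the ROC bit was still set. In the diff, the timer and work are
 * cancelled first and the device mutex is held around this check.
 */
static int sketch_abort_roc(atomic_ulong *state, int *abort_cmds_sent)
{
	if (sketch_test_and_clear_bit(SKETCH_STATE_ROC, state))
		(*abort_cmds_sent)++;
	return 0;
}
```

Calling `sketch_abort_roc()` twice on a state word with the bit set counts one abort command, because the second call finds the bit already cleared.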


2022-12-21 13:15:09

by Mikhail Gavrilov

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

On Wed, Dec 21, 2022 at 3:45 PM Felix Fietkau <[email protected]> wrote:
>
> I'm pretty sure that commit is unrelated to this issue. However, while
> looking at the code I found a bug that would explain your issue.
>
> Please try this patch:
> ---
> --- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
> @@ -422,15 +422,15 @@ void mt7921_roc_timer(struct timer_list *timer)
>
> static int mt7921_abort_roc(struct mt7921_phy *phy, struct mt7921_vif *vif)
> {
> - int err;
> -
> - if (!test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
> - return 0;
> + int err = 0;
>
> del_timer_sync(&phy->roc_timer);
> cancel_work_sync(&phy->roc_work);
> - err = mt7921_mcu_abort_roc(phy, vif, phy->roc_token_id);
> - clear_bit(MT76_STATE_ROC, &phy->mt76->state);
> +
> + mt7921_mutex_acquire(phy->dev);
> + if (test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
> + err = mt7921_mcu_abort_roc(phy, vif, phy->roc_token_id);
> + mt7921_mutex_release(phy->dev);
>
> return err;
> }
> @@ -487,13 +487,8 @@ static int mt7921_cancel_remain_on_channel(struct ieee80211_hw *hw,
> {
> struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
> struct mt7921_phy *phy = mt7921_hw_phy(hw);
> - int err;
>
> - mt7921_mutex_acquire(phy->dev);
> - err = mt7921_abort_roc(phy, mvif);
> - mt7921_mutex_release(phy->dev);
> -
> - return err;
> + return mt7921_abort_roc(phy, mvif);
> }
>
> static int mt7921_set_channel(struct mt7921_phy *phy)
> @@ -1778,11 +1773,8 @@ static void mt7921_mgd_complete_tx(struct ieee80211_hw *hw,
> struct ieee80211_prep_tx_info *info)
> {
> struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
> - struct mt7921_dev *dev = mt7921_hw_dev(hw);
>
> - mt7921_mutex_acquire(dev);
> mt7921_abort_roc(mvif->phy, mvif);
> - mt7921_mutex_release(dev);
> }
>
> const struct ieee80211_ops mt7921_ops = {
>

Unfortunately, this patch did not fix the issue.
The logs still contain many messages like "mt7921e 0000:05:00.0:
AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xffdc6a80
flags=0x0050]":

[ 100.178131] wlp5s0: authenticate with 24:cf:24:c2:72:d0
[ 100.365876] wlp5s0: send auth to 24:cf:24:c2:72:d0 (try 1/3)
[ 100.389318] mt7921e 0000:05:00.0: AMD-Vi: Event logged
[IO_PAGE_FAULT domain=0x0010 address=0xffdc6a80 flags=0x0050]
[ 101.544116] wlp5s0: send auth to 24:cf:24:c2:72:d0 (try 2/3)
[ 102.568019] wlp5s0: send auth to 24:cf:24:c2:72:d0 (try 3/3)
[ 103.591880] wlp5s0: authentication with 24:cf:24:c2:72:d0 timed out
[ 106.600014] mt7921e 0000:05:00.0: Message 00020003 (seq 9) timeout
[ 109.607845] mt7921e 0000:05:00.0: Message 00020002 (seq 10) timeout
[ 109.620007] ------------[ cut here ]------------
[ 109.620022] WARNING: CPU: 3 PID: 9 at
drivers/iommu/dma-iommu.c:1035 iommu_dma_unmap_page+0x79/0x90
[ 109.620043] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi
bnep sunrpc intel_rapl_msr intel_rapl_common snd_sof_amd_rembrandt
snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci mt7921e
snd_sof_xtensa_dsp edac_mce_amd binfmt_misc mt7921_common snd_sof
mt76_connac_lib vfat snd_sof_utils fat snd_soc_core kvm_amd mt76 btusb
snd_hda_intel snd_intel_dspcfg btrtl snd_intel_sdw_acpi snd_hda_codec
btbcm snd_compress mac80211 ac97_bus kvm snd_seq btintel snd_hda_core
snd_pcm_dmaengine snd_pci_ps snd_rpl_pci_acp6x snd_seq_device btmtk
libarc4 irqbypass snd_hwdep snd_pci_acp6x bluetooth snd_pcm
snd_pci_acp5x rapl cfg80211 snd_rn_pci_acp3x snd_acp_config snd_timer
asus_nb_wmi
[ 109.620311] snd_soc_acpi pcspkr wmi_bmof snd i2c_piix4
snd_pci_acp3x k10temp soundcore amd_pmc acpi_cpufreq asus_wireless
joydev zram amdgpu hid_asus asus_wmi ledtrig_audio sparse_keymap
drm_ttm_helper platform_profile crct10dif_pclmul ttm crc32_pclmul
crc32c_intel polyval_clmulni polyval_generic iommu_v2 drm_buddy rfkill
nvme gpu_sched ghash_clmulni_intel ucsi_acpi hid_multitouch
drm_display_helper sha512_ssse3 typec_ucsi serio_raw nvme_core ccp
r8169 sp5100_tco cec typec nvme_common i2c_hid_acpi i2c_hid video wmi
ip6_tables ip_tables fuse
[ 109.620495] CPU: 3 PID: 9 Comm: kworker/u32:0 Tainted: G W
L ------- --- 6.2.0-0.rc0.20221220git6feb57c2fd7c.10.fc38.x86_64
#1
[ 109.620507] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
G513QY_G513QY/G513QY, BIOS G513QY.320 09/07/2022
[ 109.620516] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
[ 109.620543] RIP: 0010:iommu_dma_unmap_page+0x79/0x90
[ 109.620555] Code: 2b 48 3b 28 72 26 48 3b 68 08 73 20 4d 89 f8 44
89 f1 4c 89 ea 48 89 ee 48 89 df 5b 5d 41 5c 41 5d 41 5e 41 5f e9 e7
40 73 ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 66 0f 1f
44 00
[ 109.620565] RSP: 0018:ffff9d8840147cb8 EFLAGS: 00010246
[ 109.620577] RAX: 0000000000000000 RBX: ffff8a33d334d0d0 RCX: 0000000000000000
[ 109.620586] RDX: 0000000000000000 RSI: 00000000000001c6 RDI: ffff9d8840147ca0
[ 109.620593] RBP: ffff8a33d334d0d0 R08: 00000000ffdc6000 R09: 0000000000000081
[ 109.620602] R10: 0000000000000001 R11: 000ffffffffff000 R12: 00000000ffdc6000
[ 109.620609] R13: 00000000000006c0 R14: 0000000000000002 R15: 0000000000000000
[ 109.620617] FS: 0000000000000000(0000) GS:ffff8a4296e00000(0000)
knlGS:0000000000000000
[ 109.620665] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 109.620674] CR2: 00000091cba8e000 CR3: 0000000fc4428000 CR4: 0000000000750ee0
[ 109.620682] PKRU: 55555554
[ 109.620690] Call Trace:
[ 109.620699] <TASK>
[ 109.620719] dma_unmap_page_attrs+0x4c/0x1d0
[ 109.620744] mt76_dma_get_buf+0xaf/0x190 [mt76]
[ 109.620777] mt76_dma_rx_cleanup+0xa0/0x150 [mt76]
[ 109.620808] mt7921_wpdma_reset+0xb6/0x1d0 [mt7921e]
[ 109.620837] mt7921e_mac_reset+0x141/0x2e0 [mt7921e]
[ 109.620860] mt7921_mac_reset_work+0x8b/0x160 [mt7921_common]
[ 109.620893] process_one_work+0x294/0x5b0
[ 109.620927] worker_thread+0x4f/0x3a0
[ 109.620946] ? __pfx_worker_thread+0x10/0x10
[ 109.620957] kthread+0xf5/0x120
[ 109.620967] ? __pfx_kthread+0x10/0x10
[ 109.620982] ret_from_fork+0x2c/0x50
[ 109.621022] </TASK>
[ 109.621032] irq event stamp: 1066916
[ 109.621043] hardirqs last enabled at (1066924):
[<ffffffffba1a957e>] __up_console_sem+0x5e/0x70
[ 109.621064] hardirqs last disabled at (1066931):
[<ffffffffba1a9563>] __up_console_sem+0x43/0x70
[ 109.621081] softirqs last enabled at (1063892):
[<ffffffffc16fae9f>] mt76_dma_rx_cleanup+0xcf/0x150 [mt76]
[ 109.621102] softirqs last disabled at (1063910):
[<ffffffffc16fae0d>] mt76_dma_rx_cleanup+0x3d/0x150 [mt76]
[ 109.621135] ---[ end trace 0000000000000000 ]---


Full kernel log: https://pastebin.com/Qfhq6KDc

--
Best Regards,
Mike Gavrilov.

2022-12-21 14:15:12

by Felix Fietkau

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

On 21.12.22 14:10, Mikhail Gavrilov wrote:
> On Wed, Dec 21, 2022 at 3:45 PM Felix Fietkau <[email protected]> wrote:
>>
>> I'm pretty sure that commit is unrelated to this issue. However, while
>> looking at the code I found a bug that would explain your issue.
>>
>> Please try this patch:
>> ---
>> --- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c
>> +++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
>> @@ -422,15 +422,15 @@ void mt7921_roc_timer(struct timer_list *timer)
>>
>> static int mt7921_abort_roc(struct mt7921_phy *phy, struct mt7921_vif *vif)
>> {
>> - int err;
>> -
>> - if (!test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
>> - return 0;
>> + int err = 0;
>>
>> del_timer_sync(&phy->roc_timer);
>> cancel_work_sync(&phy->roc_work);
>> - err = mt7921_mcu_abort_roc(phy, vif, phy->roc_token_id);
>> - clear_bit(MT76_STATE_ROC, &phy->mt76->state);
>> +
>> + mt7921_mutex_acquire(phy->dev);
>> + if (test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
>> + err = mt7921_mcu_abort_roc(phy, vif, phy->roc_token_id);
>> + mt7921_mutex_release(phy->dev);
>>
>> return err;
>> }
>> @@ -487,13 +487,8 @@ static int mt7921_cancel_remain_on_channel(struct ieee80211_hw *hw,
>> {
>> struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
>> struct mt7921_phy *phy = mt7921_hw_phy(hw);
>> - int err;
>>
>> - mt7921_mutex_acquire(phy->dev);
>> - err = mt7921_abort_roc(phy, mvif);
>> - mt7921_mutex_release(phy->dev);
>> -
>> - return err;
>> + return mt7921_abort_roc(phy, mvif);
>> }
>>
>> static int mt7921_set_channel(struct mt7921_phy *phy)
>> @@ -1778,11 +1773,8 @@ static void mt7921_mgd_complete_tx(struct ieee80211_hw *hw,
>> struct ieee80211_prep_tx_info *info)
>> {
>> struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
>> - struct mt7921_dev *dev = mt7921_hw_dev(hw);
>>
>> - mt7921_mutex_acquire(dev);
>> mt7921_abort_roc(mvif->phy, mvif);
>> - mt7921_mutex_release(dev);
>> }
>>
>> const struct ieee80211_ops mt7921_ops = {
>>
>
> Unfortunately this patch did not fix the issue.
> There are still many messages in the logs "mt7921e 0000:05:00.0:
> AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xffdc6a80
> flags=0x0050]"
Thanks! I guess I focused on the wrong part of your kernel log
initially. After more code review, I found that there is in fact a
DMA-related bug in the commit that your bisection pointed to, which
happened to uncover and trigger the deadlock fixed by my other patch.

So here's my fix for the DMA issue:
---
--- a/drivers/net/wireless/mediatek/mt76/dma.c
+++ b/drivers/net/wireless/mediatek/mt76/dma.c
@@ -205,6 +205,52 @@ mt76_dma_queue_reset(struct mt76_dev *dev, struct mt76_queue *q)
mt76_dma_sync_idx(dev, q);
}

+static int
+mt76_dma_add_rx_buf(struct mt76_dev *dev, struct mt76_queue *q,
+ struct mt76_queue_buf *buf, void *data)
+{
+ struct mt76_desc *desc = &q->desc[q->head];
+ struct mt76_queue_entry *entry = &q->entry[q->head];
+ struct mt76_txwi_cache *txwi = NULL;
+ u32 buf1 = 0, ctrl;
+ int idx = q->head;
+ int rx_token;
+
+ ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
+
+ if ((q->flags & MT_QFLAG_WED) &&
+ FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
+ txwi = mt76_get_rxwi(dev);
+ if (!txwi)
+ return -ENOMEM;
+
+ rx_token = mt76_rx_token_consume(dev, data, txwi, buf->addr);
+ if (rx_token < 0) {
+ mt76_put_rxwi(dev, txwi);
+ return -ENOMEM;
+ }
+
+ buf1 |= FIELD_PREP(MT_DMA_CTL_TOKEN, rx_token);
+ ctrl |= MT_DMA_CTL_TO_HOST;
+ }
+
+ WRITE_ONCE(desc->buf0, cpu_to_le32(buf->addr));
+ WRITE_ONCE(desc->buf1, cpu_to_le32(buf1));
+ WRITE_ONCE(desc->ctrl, cpu_to_le32(ctrl));
+ WRITE_ONCE(desc->info, 0);
+
+ entry->dma_addr[0] = buf->addr;
+ entry->dma_len[0] = buf->len;
+ entry->txwi = txwi;
+ entry->buf = data;
+ entry->wcid = 0xffff;
+ entry->skip_buf1 = true;
+ q->head = (q->head + 1) % q->ndesc;
+ q->queued++;
+
+ return idx;
+}
+
static int
mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
struct mt76_queue_buf *buf, int nbufs, u32 info,
@@ -215,6 +261,11 @@ mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
int i, idx = -1;
u32 ctrl, next;

+ if (txwi) {
+ q->entry[q->head].txwi = DMA_DUMMY_DATA;
+ q->entry[q->head].skip_buf0 = true;
+ }
+
for (i = 0; i < nbufs; i += 2, buf += 2) {
u32 buf0 = buf[0].addr, buf1 = 0;

@@ -224,51 +275,28 @@ mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
desc = &q->desc[idx];
entry = &q->entry[idx];

- if ((q->flags & MT_QFLAG_WED) &&
- FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
- struct mt76_txwi_cache *t = txwi;
- int rx_token;
-
- if (!t)
- return -ENOMEM;
-
- rx_token = mt76_rx_token_consume(dev, (void *)skb, t,
- buf[0].addr);
- if (rx_token < 0)
- return -ENOMEM;
-
- buf1 |= FIELD_PREP(MT_DMA_CTL_TOKEN, rx_token);
- ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len) |
- MT_DMA_CTL_TO_HOST;
- } else {
- if (txwi) {
- q->entry[next].txwi = DMA_DUMMY_DATA;
- q->entry[next].skip_buf0 = true;
- }
-
- if (buf[0].skip_unmap)
- entry->skip_buf0 = true;
- entry->skip_buf1 = i == nbufs - 1;
-
- entry->dma_addr[0] = buf[0].addr;
- entry->dma_len[0] = buf[0].len;
-
- ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
- if (i < nbufs - 1) {
- entry->dma_addr[1] = buf[1].addr;
- entry->dma_len[1] = buf[1].len;
- buf1 = buf[1].addr;
- ctrl |= FIELD_PREP(MT_DMA_CTL_SD_LEN1, buf[1].len);
- if (buf[1].skip_unmap)
- entry->skip_buf1 = true;
- }
-
- if (i == nbufs - 1)
- ctrl |= MT_DMA_CTL_LAST_SEC0;
- else if (i == nbufs - 2)
- ctrl |= MT_DMA_CTL_LAST_SEC1;
+ if (buf[0].skip_unmap)
+ entry->skip_buf0 = true;
+ entry->skip_buf1 = i == nbufs - 1;
+
+ entry->dma_addr[0] = buf[0].addr;
+ entry->dma_len[0] = buf[0].len;
+
+ ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
+ if (i < nbufs - 1) {
+ entry->dma_addr[1] = buf[1].addr;
+ entry->dma_len[1] = buf[1].len;
+ buf1 = buf[1].addr;
+ ctrl |= FIELD_PREP(MT_DMA_CTL_SD_LEN1, buf[1].len);
+ if (buf[1].skip_unmap)
+ entry->skip_buf1 = true;
}

+ if (i == nbufs - 1)
+ ctrl |= MT_DMA_CTL_LAST_SEC0;
+ else if (i == nbufs - 2)
+ ctrl |= MT_DMA_CTL_LAST_SEC1;
+
WRITE_ONCE(desc->buf0, cpu_to_le32(buf0));
WRITE_ONCE(desc->buf1, cpu_to_le32(buf1));
WRITE_ONCE(desc->info, cpu_to_le32(info));
@@ -567,17 +595,9 @@ mt76_dma_rx_fill(struct mt76_dev *dev, struct mt76_queue *q)
spin_lock_bh(&q->lock);

while (q->queued < q->ndesc - 1) {
- struct mt76_txwi_cache *t = NULL;
struct mt76_queue_buf qbuf;
void *buf = NULL;

- if ((q->flags & MT_QFLAG_WED) &&
- FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
- t = mt76_get_rxwi(dev);
- if (!t)
- break;
- }
-
buf = page_frag_alloc(&q->rx_page, q->buf_size, GFP_ATOMIC);
if (!buf)
break;
@@ -591,7 +611,7 @@ mt76_dma_rx_fill(struct mt76_dev *dev, struct mt76_queue *q)
qbuf.addr = addr + offset;
qbuf.len = len - offset;
qbuf.skip_unmap = false;
- if (mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t) < 0) {
+ if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {
dma_unmap_single(dev->dma_dev, addr, len,
DMA_FROM_DEVICE);
skb_free_frag(buf);

2022-12-21 16:16:40

by Lorenzo Bianconi

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

> On 21.12.22 14:10, Mikhail Gavrilov wrote:
> > On Wed, Dec 21, 2022 at 3:45 PM Felix Fietkau <[email protected]> wrote:
> > >
> > > I'm pretty sure that commit is unrelated to this issue. However, while
> > > looking at the code I found a bug that would explain your issue.
> > >
> > > Please try this patch:
> > > ---
> > > --- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c
> > > +++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
> > > @@ -422,15 +422,15 @@ void mt7921_roc_timer(struct timer_list *timer)
> > >
> > > static int mt7921_abort_roc(struct mt7921_phy *phy, struct mt7921_vif *vif)
> > > {
> > > - int err;
> > > -
> > > - if (!test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
> > > - return 0;
> > > + int err = 0;
> > >
> > > del_timer_sync(&phy->roc_timer);
> > > cancel_work_sync(&phy->roc_work);
> > > - err = mt7921_mcu_abort_roc(phy, vif, phy->roc_token_id);
> > > - clear_bit(MT76_STATE_ROC, &phy->mt76->state);
> > > +
> > > + mt7921_mutex_acquire(phy->dev);
> > > + if (test_and_clear_bit(MT76_STATE_ROC, &phy->mt76->state))
> > > + err = mt7921_mcu_abort_roc(phy, vif, phy->roc_token_id);
> > > + mt7921_mutex_release(phy->dev);
> > >
> > > return err;
> > > }
> > > @@ -487,13 +487,8 @@ static int mt7921_cancel_remain_on_channel(struct ieee80211_hw *hw,
> > > {
> > > struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
> > > struct mt7921_phy *phy = mt7921_hw_phy(hw);
> > > - int err;
> > >
> > > - mt7921_mutex_acquire(phy->dev);
> > > - err = mt7921_abort_roc(phy, mvif);
> > > - mt7921_mutex_release(phy->dev);
> > > -
> > > - return err;
> > > + return mt7921_abort_roc(phy, mvif);
> > > }
> > >
> > > static int mt7921_set_channel(struct mt7921_phy *phy)
> > > @@ -1778,11 +1773,8 @@ static void mt7921_mgd_complete_tx(struct ieee80211_hw *hw,
> > > struct ieee80211_prep_tx_info *info)
> > > {
> > > struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
> > > - struct mt7921_dev *dev = mt7921_hw_dev(hw);
> > >
> > > - mt7921_mutex_acquire(dev);
> > > mt7921_abort_roc(mvif->phy, mvif);
> > > - mt7921_mutex_release(dev);
> > > }
> > >
> > > const struct ieee80211_ops mt7921_ops = {
> > >
> >
> > Unfortunately this patch did not fix the issue.
> > There are still many messages in the logs "mt7921e 0000:05:00.0:
> > AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xffdc6a80
> > flags=0x0050]"
> Thanks! I guess I focused on the wrong part of your kernel log
> initially. After more code review, I found that there is in fact a DMA
> related bug in the commit that your bisection pointed to, which happened
> to uncover and trigger the deadlock fixed by my other patch.
>
> So here's my fix for the DMA issue:

Thanks for fixing this issue. I tested the patch on mt7986 with and without WED
enabled, and it works fine.

Tested-by: Lorenzo Bianconi <[email protected]>

> ---
> --- a/drivers/net/wireless/mediatek/mt76/dma.c
> +++ b/drivers/net/wireless/mediatek/mt76/dma.c
> @@ -205,6 +205,52 @@ mt76_dma_queue_reset(struct mt76_dev *dev, struct mt76_queue *q)
> mt76_dma_sync_idx(dev, q);
> }
> +static int
> +mt76_dma_add_rx_buf(struct mt76_dev *dev, struct mt76_queue *q,
> + struct mt76_queue_buf *buf, void *data)
> +{
> + struct mt76_desc *desc = &q->desc[q->head];
> + struct mt76_queue_entry *entry = &q->entry[q->head];
> + struct mt76_txwi_cache *txwi = NULL;
> + u32 buf1 = 0, ctrl;
> + int idx = q->head;
> + int rx_token;
> +
> + ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
> +
> + if ((q->flags & MT_QFLAG_WED) &&
> + FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
> + txwi = mt76_get_rxwi(dev);
> + if (!txwi)
> + return -ENOMEM;
> +
> + rx_token = mt76_rx_token_consume(dev, data, txwi, buf->addr);
> + if (rx_token < 0) {
> + mt76_put_rxwi(dev, txwi);
> + return -ENOMEM;
> + }
> +
> + buf1 |= FIELD_PREP(MT_DMA_CTL_TOKEN, rx_token);
> + ctrl |= MT_DMA_CTL_TO_HOST;
> + }
> +
> + WRITE_ONCE(desc->buf0, cpu_to_le32(buf->addr));
> + WRITE_ONCE(desc->buf1, cpu_to_le32(buf1));
> + WRITE_ONCE(desc->ctrl, cpu_to_le32(ctrl));
> + WRITE_ONCE(desc->info, 0);
> +
> + entry->dma_addr[0] = buf->addr;
> + entry->dma_len[0] = buf->len;
> + entry->txwi = txwi;
> + entry->buf = data;
> + entry->wcid = 0xffff;
> + entry->skip_buf1 = true;
> + q->head = (q->head + 1) % q->ndesc;
> + q->queued++;
> +
> + return idx;
> +}
> +
> static int
> mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
> struct mt76_queue_buf *buf, int nbufs, u32 info,
> @@ -215,6 +261,11 @@ mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
> int i, idx = -1;
> u32 ctrl, next;
> + if (txwi) {
> + q->entry[q->head].txwi = DMA_DUMMY_DATA;
> + q->entry[q->head].skip_buf0 = true;
> + }
> +
> for (i = 0; i < nbufs; i += 2, buf += 2) {
> u32 buf0 = buf[0].addr, buf1 = 0;
> @@ -224,51 +275,28 @@ mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
> desc = &q->desc[idx];
> entry = &q->entry[idx];
> - if ((q->flags & MT_QFLAG_WED) &&
> - FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
> - struct mt76_txwi_cache *t = txwi;
> - int rx_token;
> -
> - if (!t)
> - return -ENOMEM;
> -
> - rx_token = mt76_rx_token_consume(dev, (void *)skb, t,
> - buf[0].addr);
> - if (rx_token < 0)
> - return -ENOMEM;
> -
> - buf1 |= FIELD_PREP(MT_DMA_CTL_TOKEN, rx_token);
> - ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len) |
> - MT_DMA_CTL_TO_HOST;
> - } else {
> - if (txwi) {
> - q->entry[next].txwi = DMA_DUMMY_DATA;
> - q->entry[next].skip_buf0 = true;
> - }
> -
> - if (buf[0].skip_unmap)
> - entry->skip_buf0 = true;
> - entry->skip_buf1 = i == nbufs - 1;
> -
> - entry->dma_addr[0] = buf[0].addr;
> - entry->dma_len[0] = buf[0].len;
> -
> - ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
> - if (i < nbufs - 1) {
> - entry->dma_addr[1] = buf[1].addr;
> - entry->dma_len[1] = buf[1].len;
> - buf1 = buf[1].addr;
> - ctrl |= FIELD_PREP(MT_DMA_CTL_SD_LEN1, buf[1].len);
> - if (buf[1].skip_unmap)
> - entry->skip_buf1 = true;
> - }
> -
> - if (i == nbufs - 1)
> - ctrl |= MT_DMA_CTL_LAST_SEC0;
> - else if (i == nbufs - 2)
> - ctrl |= MT_DMA_CTL_LAST_SEC1;
> + if (buf[0].skip_unmap)
> + entry->skip_buf0 = true;
> + entry->skip_buf1 = i == nbufs - 1;
> +
> + entry->dma_addr[0] = buf[0].addr;
> + entry->dma_len[0] = buf[0].len;
> +
> + ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
> + if (i < nbufs - 1) {
> + entry->dma_addr[1] = buf[1].addr;
> + entry->dma_len[1] = buf[1].len;
> + buf1 = buf[1].addr;
> + ctrl |= FIELD_PREP(MT_DMA_CTL_SD_LEN1, buf[1].len);
> + if (buf[1].skip_unmap)
> + entry->skip_buf1 = true;
> }
> + if (i == nbufs - 1)
> + ctrl |= MT_DMA_CTL_LAST_SEC0;
> + else if (i == nbufs - 2)
> + ctrl |= MT_DMA_CTL_LAST_SEC1;
> +
> WRITE_ONCE(desc->buf0, cpu_to_le32(buf0));
> WRITE_ONCE(desc->buf1, cpu_to_le32(buf1));
> WRITE_ONCE(desc->info, cpu_to_le32(info));
> @@ -567,17 +595,9 @@ mt76_dma_rx_fill(struct mt76_dev *dev, struct mt76_queue *q)
> spin_lock_bh(&q->lock);
> while (q->queued < q->ndesc - 1) {
> - struct mt76_txwi_cache *t = NULL;
> struct mt76_queue_buf qbuf;
> void *buf = NULL;
> - if ((q->flags & MT_QFLAG_WED) &&
> - FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
> - t = mt76_get_rxwi(dev);
> - if (!t)
> - break;
> - }
> -
> buf = page_frag_alloc(&q->rx_page, q->buf_size, GFP_ATOMIC);
> if (!buf)
> break;
> @@ -591,7 +611,7 @@ mt76_dma_rx_fill(struct mt76_dev *dev, struct mt76_queue *q)
> qbuf.addr = addr + offset;
> qbuf.len = len - offset;
> qbuf.skip_unmap = false;
> - if (mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t) < 0) {
> + if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {
> dma_unmap_single(dev->dma_dev, addr, len,
> DMA_FROM_DEVICE);
> skb_free_frag(buf);
>



2022-12-21 16:51:33

by Mikhail Gavrilov

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

On Wed, Dec 21, 2022 at 7:12 PM Felix Fietkau <[email protected]> wrote:
>
> Thanks! I guess I focused on the wrong part of your kernel log
> initially. After more code review, I found that there is in fact a DMA
> related bug in the commit that your bisection pointed to, which happened
> to uncover and trigger the deadlock fixed by my other patch.
>
> So here's my fix for the DMA issue:
> ---
[snip]
> qbuf.skip_unmap = false;
> - if (mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t) < 0) {
> + if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {
> dma_unmap_single(dev->dma_dev, addr, len,
> DMA_FROM_DEVICE);
> skb_free_frag(buf);
>

Sorry for the stupid question.

Do you have a separate branch?
I see that the code differs between the master branch and the patch.

For example, the patch replaces the line:
- if (mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t) < 0) {
with the line:
+ if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {

But in the master branch
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/mediatek/mt76/dma.c?id=b6bb9676f2165d518b35ba3bea5f1fcfc0d969bf#n604
the line:
qbuf.skip_unmap = false;
is followed by the line:
mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t);
without an if condition.

So I'm stuck applying the patch :(

--
Best Regards,
Mike Gavrilov.

2022-12-21 17:20:59

by Felix Fietkau

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

On 21.12.22 17:46, Mikhail Gavrilov wrote:
> On Wed, Dec 21, 2022 at 7:12 PM Felix Fietkau <[email protected]> wrote:
>>
>> Thanks! I guess I focused on the wrong part of your kernel log
>> initially. After more code review, I found that there is in fact a DMA
>> related bug in the commit that your bisection pointed to, which happened
>> to uncover and trigger the deadlock fixed by my other patch.
>>
>> So here's my fix for the DMA issue:
>> ---
> [cutted]
>> qbuf.skip_unmap = false;
>> - if (mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t) < 0) {
>> + if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {
>> dma_unmap_single(dev->dma_dev, addr, len,
>> DMA_FROM_DEVICE);
>> skb_free_frag(buf);
>>
>
> Sorry for the stupid question.
>
> Do you have a separate branch?
> I see that the code differs between the master branch and the patch.
>
> For example, the patch replaces the line:
> - if (mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t) < 0) {
> with the line:
> + if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {
>
> But in the master branch
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/mediatek/mt76/dma.c?id=b6bb9676f2165d518b35ba3bea5f1fcfc0d969bf#n604
> the line:
> qbuf.skip_unmap = false;
> is followed by the line:
> mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t);
> without an if condition.
>
> So I'm stuck applying the patch :(
Sorry, I worked on a tree that had other pending fixes applied.
Please try this:


--- a/drivers/net/wireless/mediatek/mt76/dma.c
+++ b/drivers/net/wireless/mediatek/mt76/dma.c
@@ -205,6 +205,52 @@ mt76_dma_queue_reset(struct mt76_dev *dev, struct mt76_queue *q)
mt76_dma_sync_idx(dev, q);
}

+static int
+mt76_dma_add_rx_buf(struct mt76_dev *dev, struct mt76_queue *q,
+ struct mt76_queue_buf *buf, void *data)
+{
+ struct mt76_desc *desc = &q->desc[q->head];
+ struct mt76_queue_entry *entry = &q->entry[q->head];
+ struct mt76_txwi_cache *txwi = NULL;
+ u32 buf1 = 0, ctrl;
+ int idx = q->head;
+ int rx_token;
+
+ ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
+
+ if ((q->flags & MT_QFLAG_WED) &&
+ FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
+ txwi = mt76_get_rxwi(dev);
+ if (!txwi)
+ return -ENOMEM;
+
+ rx_token = mt76_rx_token_consume(dev, data, txwi, buf->addr);
+ if (rx_token < 0) {
+ mt76_put_rxwi(dev, txwi);
+ return -ENOMEM;
+ }
+
+ buf1 |= FIELD_PREP(MT_DMA_CTL_TOKEN, rx_token);
+ ctrl |= MT_DMA_CTL_TO_HOST;
+ }
+
+ WRITE_ONCE(desc->buf0, cpu_to_le32(buf->addr));
+ WRITE_ONCE(desc->buf1, cpu_to_le32(buf1));
+ WRITE_ONCE(desc->ctrl, cpu_to_le32(ctrl));
+ WRITE_ONCE(desc->info, 0);
+
+ entry->dma_addr[0] = buf->addr;
+ entry->dma_len[0] = buf->len;
+ entry->txwi = txwi;
+ entry->buf = data;
+ entry->wcid = 0xffff;
+ entry->skip_buf1 = true;
+ q->head = (q->head + 1) % q->ndesc;
+ q->queued++;
+
+ return idx;
+}
+
static int
mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
struct mt76_queue_buf *buf, int nbufs, u32 info,
@@ -212,65 +258,51 @@ mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
{
struct mt76_queue_entry *entry;
struct mt76_desc *desc;
- u32 ctrl;
int i, idx = -1;
+ u32 ctrl, next;
+
+ if (txwi) {
+ q->entry[q->head].txwi = DMA_DUMMY_DATA;
+ q->entry[q->head].skip_buf0 = true;
+ }

for (i = 0; i < nbufs; i += 2, buf += 2) {
u32 buf0 = buf[0].addr, buf1 = 0;

idx = q->head;
- q->head = (q->head + 1) % q->ndesc;
+ next = (q->head + 1) % q->ndesc;

desc = &q->desc[idx];
entry = &q->entry[idx];

- if ((q->flags & MT_QFLAG_WED) &&
- FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
- struct mt76_txwi_cache *t = txwi;
- int rx_token;
-
- if (!t)
- return -ENOMEM;
-
- rx_token = mt76_rx_token_consume(dev, (void *)skb, t,
- buf[0].addr);
- buf1 |= FIELD_PREP(MT_DMA_CTL_TOKEN, rx_token);
- ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len) |
- MT_DMA_CTL_TO_HOST;
- } else {
- if (txwi) {
- q->entry[q->head].txwi = DMA_DUMMY_DATA;
- q->entry[q->head].skip_buf0 = true;
- }
-
- if (buf[0].skip_unmap)
- entry->skip_buf0 = true;
- entry->skip_buf1 = i == nbufs - 1;
-
- entry->dma_addr[0] = buf[0].addr;
- entry->dma_len[0] = buf[0].len;
-
- ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
- if (i < nbufs - 1) {
- entry->dma_addr[1] = buf[1].addr;
- entry->dma_len[1] = buf[1].len;
- buf1 = buf[1].addr;
- ctrl |= FIELD_PREP(MT_DMA_CTL_SD_LEN1, buf[1].len);
- if (buf[1].skip_unmap)
- entry->skip_buf1 = true;
- }
-
- if (i == nbufs - 1)
- ctrl |= MT_DMA_CTL_LAST_SEC0;
- else if (i == nbufs - 2)
- ctrl |= MT_DMA_CTL_LAST_SEC1;
+ if (buf[0].skip_unmap)
+ entry->skip_buf0 = true;
+ entry->skip_buf1 = i == nbufs - 1;
+
+ entry->dma_addr[0] = buf[0].addr;
+ entry->dma_len[0] = buf[0].len;
+
+ ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
+ if (i < nbufs - 1) {
+ entry->dma_addr[1] = buf[1].addr;
+ entry->dma_len[1] = buf[1].len;
+ buf1 = buf[1].addr;
+ ctrl |= FIELD_PREP(MT_DMA_CTL_SD_LEN1, buf[1].len);
+ if (buf[1].skip_unmap)
+ entry->skip_buf1 = true;
}

+ if (i == nbufs - 1)
+ ctrl |= MT_DMA_CTL_LAST_SEC0;
+ else if (i == nbufs - 2)
+ ctrl |= MT_DMA_CTL_LAST_SEC1;
+
WRITE_ONCE(desc->buf0, cpu_to_le32(buf0));
WRITE_ONCE(desc->buf1, cpu_to_le32(buf1));
WRITE_ONCE(desc->info, cpu_to_le32(info));
WRITE_ONCE(desc->ctrl, cpu_to_le32(ctrl));

+ q->head = next;
q->queued++;
}

@@ -577,17 +609,9 @@ mt76_dma_rx_fill(struct mt76_dev *dev, struct mt76_queue *q)
spin_lock_bh(&q->lock);

while (q->queued < q->ndesc - 1) {
- struct mt76_txwi_cache *t = NULL;
struct mt76_queue_buf qbuf;
void *buf = NULL;

- if ((q->flags & MT_QFLAG_WED) &&
- FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
- t = mt76_get_rxwi(dev);
- if (!t)
- break;
- }
-
buf = page_frag_alloc(rx_page, q->buf_size, GFP_ATOMIC);
if (!buf)
break;
@@ -601,7 +625,12 @@ mt76_dma_rx_fill(struct mt76_dev *dev, struct mt76_queue *q)
qbuf.addr = addr + offset;
qbuf.len = len - offset;
qbuf.skip_unmap = false;
- mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t);
+ if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {
+ dma_unmap_single(dev->dma_dev, addr, len,
+ DMA_FROM_DEVICE);
+ skb_free_frag(buf);
+ break;
+ }
frames++;
}


--- a/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
@@ -653,6 +653,13 @@ static u32 mt7915_mmio_wed_init_rx_buf(struct mtk_wed_device *wed, int size)

desc->buf0 = cpu_to_le32(phy_addr);
token = mt76_rx_token_consume(&dev->mt76, ptr, t, phy_addr);
+ if (token < 0) {
+ dma_unmap_single(dev->mt76.dma_dev, phy_addr,
+ wed->wlan.rx_size, DMA_TO_DEVICE);
+ skb_free_frag(ptr);
+ goto unmap;
+ }
+
desc->token |= cpu_to_le32(FIELD_PREP(MT_DMA_CTL_TOKEN,
token));
desc++;

--- a/drivers/net/wireless/mediatek/mt76/tx.c
+++ b/drivers/net/wireless/mediatek/mt76/tx.c
@@ -764,11 +764,12 @@ int mt76_rx_token_consume(struct mt76_dev *dev, void *ptr,
spin_lock_bh(&dev->rx_token_lock);
token = idr_alloc(&dev->rx_token, t, 0, dev->rx_token_size,
GFP_ATOMIC);
+ if (token >= 0) {
+ t->ptr = ptr;
+ t->dma_addr = phys;
+ }
spin_unlock_bh(&dev->rx_token_lock);

- t->ptr = ptr;
- t->dma_addr = phys;
-
return token;
}
EXPORT_SYMBOL_GPL(mt76_rx_token_consume);
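
The error handling this patch adds around mt76_rx_token_consume() can be illustrated outside the kernel. The following is only a sketch under stated assumptions, not driver code: `token_pool`, `rx_entry`, `token_consume()` and `fill_ring()` are hypothetical stand-ins for the IDR, `struct mt76_txwi_cache`, `mt76_rx_token_consume()` and the refill loop. It shows the two ideas visible in the diff: the entry is written only after a token has actually been obtained, and the caller checks the return value and stops refilling on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOKEN_POOL_SIZE 4	/* hypothetical; kept tiny so exhaustion is easy to hit */

struct rx_entry {		/* stand-in for struct mt76_txwi_cache */
	void *ptr;
	uint32_t dma_addr;
};

struct token_pool {		/* stand-in for the rx_token IDR */
	struct rx_entry *slot[TOKEN_POOL_SIZE];
};

/* Returns a token >= 0, or -1 when the pool is exhausted. */
static int token_consume(struct token_pool *pool, struct rx_entry *e,
			 void *ptr, uint32_t dma_addr)
{
	assert(pool && e);

	for (int i = 0; i < TOKEN_POOL_SIZE; i++) {
		if (!pool->slot[i]) {
			pool->slot[i] = e;
			/* Touch the entry only once a token is really owned. */
			e->ptr = ptr;
			e->dma_addr = dma_addr;
			return i;
		}
	}
	return -1;
}

/* Queues buffers until n are queued or token allocation fails. */
static int fill_ring(struct token_pool *pool, struct rx_entry *entries,
		     char (*bufs)[64], int n)
{
	int queued = 0;

	for (int i = 0; i < n; i++) {
		uint32_t addr = 0x1000u + 64u * (uint32_t)i;	/* fake DMA address */

		if (token_consume(pool, &entries[i], bufs[i], addr) < 0)
			break;	/* a real driver would unmap and free the buffer here */
		queued++;
	}
	return queued;
}
```

With a pool of four tokens and six buffers, `fill_ring()` queues four buffers and leaves the remaining entries untouched, which mirrors the `break` added to mt76_dma_rx_fill() above.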

2022-12-22 06:51:44

by Mikhail Gavrilov

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

On Wed, Dec 21, 2022 at 10:17 PM Felix Fietkau <[email protected]> wrote:
>
> Sorry, I worked on a tree that had other pending fixes applied.
> Please try this:
>
>
> --- a/drivers/net/wireless/mediatek/mt76/dma.c
> +++ b/drivers/net/wireless/mediatek/mt76/dma.c
> @@ -205,6 +205,52 @@ mt76_dma_queue_reset(struct mt76_dev *dev, struct mt76_queue *q)
> mt76_dma_sync_idx(dev, q);
> }
[snip]
> EXPORT_SYMBOL_GPL(mt76_rx_token_consume);
>

I confirm that after applying this patch the issue is gone (wifi works as
it did before commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae).

Tested-by: Mikhail Gavrilov <[email protected]>

--
Best Regards,
Mike Gavrilov.

2022-12-22 12:47:21

by Thorsten Leemhuis

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e #forregzbot

[Note: this mail contains only information for Linux kernel regression
tracking. Mails like these contain '#forregzbot' in the subject to make
them easy to spot and filter out. The author also tried to remove most
or all individuals from the list of recipients to spare them the hassle.]

On 21.12.22 02:10, Mikhail Gavrilov wrote:
> Hi,
> The kernel 6.2 preparation cycle has begun.
> And after the kernel was updated on my laptop, the wifi stopped working.
>
> Bisecting blames this commit:
> cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae is the first bad commit
> commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae
> Author: Lorenzo Bianconi <[email protected]>
> Date: Sat Nov 12 16:40:35 2022 +0100
>
> wifi: mt76: add WED RX support to mt76_dma_{add,get}_buf
>

Thanks for the report. To be sure the issue below doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:

#regzbot introduced cd372b8c99c5a5 ^
https://bugzilla.kernel.org/show_bug.cgi?id=216829
#regzbot title wifi: mt76: wifi stopped working
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

2022-12-24 08:01:41

by Thorsten Leemhuis

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e



On 22.12.22 07:47, Mikhail Gavrilov wrote:
> On Wed, Dec 21, 2022 at 10:17 PM Felix Fietkau <[email protected]> wrote:
>>
>> Sorry, I worked on a tree that had other pending fixes applied.
>> Please try this:
>>
>>
>> --- a/drivers/net/wireless/mediatek/mt76/dma.c
>> +++ b/drivers/net/wireless/mediatek/mt76/dma.c
>> @@ -205,6 +205,52 @@ mt76_dma_queue_reset(struct mt76_dev *dev, struct mt76_queue *q)
>> mt76_dma_sync_idx(dev, q);
>> }
> [cutted]
>> EXPORT_SYMBOL_GPL(mt76_rx_token_consume);
>>
>
> I confirm that after applying this patch the issue is gone (wifi works as
> it did before commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae).
>
> Tested-by: Mikhail Gavrilov <[email protected]>
>

TWIMC, there are two more reports that at least to my eyes look like
they are about the problem discussed here:

https://bugzilla.kernel.org/show_bug.cgi?id=216829
https://bugzilla.kernel.org/show_bug.cgi?id=216839

I pointed both reporters to this thread.

Ciao, thorsten

2022-12-26 11:06:24

by Thorsten Leemhuis

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e



On 24.12.22 08:55, Thorsten Leemhuis wrote:
>
>
> On 22.12.22 07:47, Mikhail Gavrilov wrote:
>> On Wed, Dec 21, 2022 at 10:17 PM Felix Fietkau <[email protected]> wrote:
>>>
>>> Sorry, I worked on a tree that had other pending fixes applied.
>>> Please try this:
>>>
>>>
>>> --- a/drivers/net/wireless/mediatek/mt76/dma.c
>>> +++ b/drivers/net/wireless/mediatek/mt76/dma.c
>>> @@ -205,6 +205,52 @@ mt76_dma_queue_reset(struct mt76_dev *dev, struct mt76_queue *q)
>>> mt76_dma_sync_idx(dev, q);
>>> }
>> [cutted]
>>> EXPORT_SYMBOL_GPL(mt76_rx_token_consume);
>>>
>>
>> I confirm that after applying this patch the issue is gone (wifi works as
>> it did before commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae).
>>
>> Tested-by: Mikhail Gavrilov <[email protected]>
>>
>
> TWIMC, there are two more reports that at least to my eyes look like
> they are about the problem discussed here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=216829

Stupid me, this one...

> https://bugzilla.kernel.org/show_bug.cgi?id=216839

...is about 6.1.y and thus likely something else. Apologies.

FWIW & for completeness, according to a comment from Lorenzo Bianconi
the latter bug report is fixed by:
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
("wifi: mac80211: fix initialization of rx->link and rx->link_sta")

Ciao, Thorsten

2023-01-04 14:21:33

by Thorsten Leemhuis

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

Felix, Lorenzo, did the fix below for the regression Mikhail reported make
any progress toward getting mainlined? It doesn't look like it from here,
but I suspect I missed something, which is why I'm asking.

Ciao, Thorsten
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 21.12.22 18:17, Felix Fietkau wrote:
> On 21.12.22 17:46, Mikhail Gavrilov wrote:
>> On Wed, Dec 21, 2022 at 7:12 PM Felix Fietkau <[email protected]> wrote:
>>>
>>> Thanks! I guess I focused on the wrong part of your kernel log
>>> initially. After more code review, I found that there is in fact a DMA
>>> related bug in the commit that your bisection pointed to, which happened
>>> to uncover and trigger the deadlock fixed by my other patch.
>>>
>>> So here's my fix for the DMA issue:
>>> ---
>> [cutted]
>>>                 qbuf.skip_unmap = false;
>>> -               if (mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t) < 0) {
>>> +               if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {
>>>                         dma_unmap_single(dev->dma_dev, addr, len,
>>>                                          DMA_FROM_DEVICE);
>>>                         skb_free_frag(buf);
>>>
>>
>> Sorry for the stupid question.
>>
>> Do you have a separate branch?
>> I see that the code differs between the master branch and the patch.
>>
>> For example, the patch replaces the line:
>> - if (mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t) < 0) {
>> with the line:
>> + if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {
>>
>> But in the master branch
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/mediatek/mt76/dma.c?id=b6bb9676f2165d518b35ba3bea5f1fcfc0d969bf#n604
>> the line:
>> qbuf.skip_unmap = false;
>> is followed by the line:
>> mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t);
>> without an if condition.
>>
>> So I'm stuck applying the patch :(
> Sorry, I worked on a tree that had other pending fixes applied.
> Please try this:
>
>
> --- a/drivers/net/wireless/mediatek/mt76/dma.c
> +++ b/drivers/net/wireless/mediatek/mt76/dma.c
> @@ -205,6 +205,52 @@ mt76_dma_queue_reset(struct mt76_dev *dev, struct mt76_queue *q)
>      mt76_dma_sync_idx(dev, q);
>  }
>  
> +static int
> +mt76_dma_add_rx_buf(struct mt76_dev *dev, struct mt76_queue *q,
> +            struct mt76_queue_buf *buf, void *data)
> +{
> +    struct mt76_desc *desc = &q->desc[q->head];
> +    struct mt76_queue_entry *entry = &q->entry[q->head];
> +    struct mt76_txwi_cache *txwi = NULL;
> +    u32 buf1 = 0, ctrl;
> +    int idx = q->head;
> +    int rx_token;
> +
> +    ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
> +
> +    if ((q->flags & MT_QFLAG_WED) &&
> +        FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
> +        txwi = mt76_get_rxwi(dev);
> +        if (!txwi)
> +            return -ENOMEM;
> +
> +        rx_token = mt76_rx_token_consume(dev, data, txwi, buf->addr);
> +        if (rx_token < 0) {
> +            mt76_put_rxwi(dev, txwi);
> +            return -ENOMEM;
> +        }
> +
> +        buf1 |= FIELD_PREP(MT_DMA_CTL_TOKEN, rx_token);
> +        ctrl |= MT_DMA_CTL_TO_HOST;
> +    }
> +
> +    WRITE_ONCE(desc->buf0, cpu_to_le32(buf->addr));
> +    WRITE_ONCE(desc->buf1, cpu_to_le32(buf1));
> +    WRITE_ONCE(desc->ctrl, cpu_to_le32(ctrl));
> +    WRITE_ONCE(desc->info, 0);
> +
> +    entry->dma_addr[0] = buf->addr;
> +    entry->dma_len[0] = buf->len;
> +    entry->txwi = txwi;
> +    entry->buf = data;
> +    entry->wcid = 0xffff;
> +    entry->skip_buf1 = true;
> +    q->head = (q->head + 1) % q->ndesc;
> +    q->queued++;
> +
> +    return idx;
> +}
> +
>  static int
>  mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
>           struct mt76_queue_buf *buf, int nbufs, u32 info,
> @@ -212,65 +258,51 @@ mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
>  {
>      struct mt76_queue_entry *entry;
>      struct mt76_desc *desc;
> -    u32 ctrl;
>      int i, idx = -1;
> +    u32 ctrl, next;
> +
> +    if (txwi) {
> +        q->entry[q->head].txwi = DMA_DUMMY_DATA;
> +        q->entry[q->head].skip_buf0 = true;
> +    }
>  
>      for (i = 0; i < nbufs; i += 2, buf += 2) {
>          u32 buf0 = buf[0].addr, buf1 = 0;
>  
>          idx = q->head;
> -        q->head = (q->head + 1) % q->ndesc;
> +        next = (q->head + 1) % q->ndesc;
>  
>          desc = &q->desc[idx];
>          entry = &q->entry[idx];
>  
> -        if ((q->flags & MT_QFLAG_WED) &&
> -            FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
> -            struct mt76_txwi_cache *t = txwi;
> -            int rx_token;
> -
> -            if (!t)
> -                return -ENOMEM;
> -
> -            rx_token = mt76_rx_token_consume(dev, (void *)skb, t,
> -                             buf[0].addr);
> -            buf1 |= FIELD_PREP(MT_DMA_CTL_TOKEN, rx_token);
> -            ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len) |
> -                   MT_DMA_CTL_TO_HOST;
> -        } else {
> -            if (txwi) {
> -                q->entry[q->head].txwi = DMA_DUMMY_DATA;
> -                q->entry[q->head].skip_buf0 = true;
> -            }
> -
> -            if (buf[0].skip_unmap)
> -                entry->skip_buf0 = true;
> -            entry->skip_buf1 = i == nbufs - 1;
> -
> -            entry->dma_addr[0] = buf[0].addr;
> -            entry->dma_len[0] = buf[0].len;
> -
> -            ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
> -            if (i < nbufs - 1) {
> -                entry->dma_addr[1] = buf[1].addr;
> -                entry->dma_len[1] = buf[1].len;
> -                buf1 = buf[1].addr;
> -                ctrl |= FIELD_PREP(MT_DMA_CTL_SD_LEN1, buf[1].len);
> -                if (buf[1].skip_unmap)
> -                    entry->skip_buf1 = true;
> -            }
> -
> -            if (i == nbufs - 1)
> -                ctrl |= MT_DMA_CTL_LAST_SEC0;
> -            else if (i == nbufs - 2)
> -                ctrl |= MT_DMA_CTL_LAST_SEC1;
> +        if (buf[0].skip_unmap)
> +            entry->skip_buf0 = true;
> +        entry->skip_buf1 = i == nbufs - 1;
> +
> +        entry->dma_addr[0] = buf[0].addr;
> +        entry->dma_len[0] = buf[0].len;
> +
> +        ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
> +        if (i < nbufs - 1) {
> +            entry->dma_addr[1] = buf[1].addr;
> +            entry->dma_len[1] = buf[1].len;
> +            buf1 = buf[1].addr;
> +            ctrl |= FIELD_PREP(MT_DMA_CTL_SD_LEN1, buf[1].len);
> +            if (buf[1].skip_unmap)
> +                entry->skip_buf1 = true;
>          }
>  
> +        if (i == nbufs - 1)
> +            ctrl |= MT_DMA_CTL_LAST_SEC0;
> +        else if (i == nbufs - 2)
> +            ctrl |= MT_DMA_CTL_LAST_SEC1;
> +
>          WRITE_ONCE(desc->buf0, cpu_to_le32(buf0));
>          WRITE_ONCE(desc->buf1, cpu_to_le32(buf1));
>          WRITE_ONCE(desc->info, cpu_to_le32(info));
>          WRITE_ONCE(desc->ctrl, cpu_to_le32(ctrl));
>  
> +        q->head = next;
>          q->queued++;
>      }
>  
> @@ -577,17 +609,9 @@ mt76_dma_rx_fill(struct mt76_dev *dev, struct mt76_queue *q)
>      spin_lock_bh(&q->lock);
>  
>      while (q->queued < q->ndesc - 1) {
> -        struct mt76_txwi_cache *t = NULL;
>          struct mt76_queue_buf qbuf;
>          void *buf = NULL;
>  
> -        if ((q->flags & MT_QFLAG_WED) &&
> -            FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
> -            t = mt76_get_rxwi(dev);
> -            if (!t)
> -                break;
> -        }
> -
>          buf = page_frag_alloc(rx_page, q->buf_size, GFP_ATOMIC);
>          if (!buf)
>              break;
> @@ -601,7 +625,12 @@ mt76_dma_rx_fill(struct mt76_dev *dev, struct mt76_queue *q)
>          qbuf.addr = addr + offset;
>          qbuf.len = len - offset;
>          qbuf.skip_unmap = false;
> -        mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t);
> +        if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {
> +            dma_unmap_single(dev->dma_dev, addr, len,
> +                     DMA_FROM_DEVICE);
> +            skb_free_frag(buf);
> +            break;
> +        }
>          frames++;
>      }
>  
>
> --- a/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
> @@ -653,6 +653,13 @@ static u32 mt7915_mmio_wed_init_rx_buf(struct mtk_wed_device *wed, int size)
>  
>          desc->buf0 = cpu_to_le32(phy_addr);
>          token = mt76_rx_token_consume(&dev->mt76, ptr, t, phy_addr);
> +        if (token < 0) {
> +            dma_unmap_single(dev->mt76.dma_dev, phy_addr,
> +                     wed->wlan.rx_size, DMA_TO_DEVICE);
> +            skb_free_frag(ptr);
> +            goto unmap;
> +        }
> +
>          desc->token |= cpu_to_le32(FIELD_PREP(MT_DMA_CTL_TOKEN,
>                                token));
>          desc++;
>
> --- a/drivers/net/wireless/mediatek/mt76/tx.c
> +++ b/drivers/net/wireless/mediatek/mt76/tx.c
> @@ -764,11 +764,12 @@ int mt76_rx_token_consume(struct mt76_dev *dev, void *ptr,
>      spin_lock_bh(&dev->rx_token_lock);
>      token = idr_alloc(&dev->rx_token, t, 0, dev->rx_token_size,
>                GFP_ATOMIC);
> +    if (token >= 0) {
> +        t->ptr = ptr;
> +        t->dma_addr = phys;
> +    }
>      spin_unlock_bh(&dev->rx_token_lock);
>  
> -    t->ptr = ptr;
> -    t->dma_addr = phys;
> -
>      return token;
>  }
>  EXPORT_SYMBOL_GPL(mt76_rx_token_consume);
>
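
The tx.c hunk above records ptr and dma_addr only once idr_alloc() has
returned a valid token, and the callers now check for a negative result.
Below is a minimal userspace sketch of that pattern. It is not the mt76
code: token_pool, rx_cache, token_consume() and add_rx_buf() are
hypothetical names, and a small fixed array stands in for the kernel IDR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4 /* hypothetical, tiny on purpose */

/* Hypothetical stand-in for struct mt76_txwi_cache. */
struct rx_cache {
    void *ptr;
    uint64_t dma_addr;
};

/* Hypothetical stand-in for the IDR-backed RX token table. */
struct token_pool {
    struct rx_cache *slot[POOL_SIZE];
};

/* Returns a token >= 0 on success, -1 when no slot is free. */
static int token_consume(struct token_pool *pool, struct rx_cache *t,
                         void *ptr, uint64_t dma_addr)
{
    assert(pool && t);

    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool->slot[i])
            continue;
        pool->slot[i] = t;
        /* Fill in the entry only once a token is really owned. */
        t->ptr = ptr;
        t->dma_addr = dma_addr;
        return i;
    }
    return -1;
}

/*
 * Hypothetical caller: returns 0 on success. On failure it returns -1
 * and leaves *desc_token untouched, so the caller can unmap and free
 * the buffer without a half-written descriptor.
 */
static int add_rx_buf(struct token_pool *pool, struct rx_cache *t,
                      void *data, uint64_t addr, uint32_t *desc_token)
{
    int token = token_consume(pool, t, data, addr);

    if (token < 0)
        return -1;
    *desc_token = (uint32_t)token;
    return 0;
}
```

Once the pool is exhausted, add_rx_buf() fails and neither the cache
entry nor the descriptor field is modified, which is the property the
hunk appears to be after.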

2023-01-09 07:34:19

by Thorsten Leemhuis

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

On 04.01.23 15:20, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
>
> Felix, Lorenzo, did below fix for the regression

There is another report about an issue with mediatek wifi in 6.2-rc:
https://bugzilla.kernel.org/show_bug.cgi?id=216901
To me this looks like a duplicate of the report that started this thread.

(side note: there was another, earlier report that might be a dupe, too:
https://bugzilla.kernel.org/show_bug.cgi?id=216829 )

> Mikhail reported make
> any progress to get mainlined? It doesn't look like it from here, but I
> suspect I missed something, that's why I'm asking.

No reply. :-((

That lack of feedback is another reason why I'm CCing the network
maintainers now, as the mediatek wifi issues in 6.2-rc (this one) and
6.1 ([1]) are already hitting a nerve here because the fixes are
progressing so slowly. I know it was holiday season, but it seems quite
a few people have run into these regressions already, hence we IMHO
should really aim to fix both this week.

[1] see
https://lore.kernel.org/all/[email protected]/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> On 21.12.22 18:17, Felix Fietkau wrote:
>> On 21.12.22 17:46, Mikhail Gavrilov wrote:
>>> On Wed, Dec 21, 2022 at 7:12 PM Felix Fietkau <[email protected]> wrote:
>>>>
>>>> Thanks! I guess I focused on the wrong part of your kernel log
>>>> initially. After more code review, I found that there is in fact a DMA
>>>> related bug in the commit that your bisection pointed to, which happened
>>>> to uncover and trigger the deadlock fixed by my other patch.
>>>>
>>>> So here's my fix for the DMA issue:
>>>> ---
>>> [cut]
>>>>                 qbuf.skip_unmap = false;
>>>> -               if (mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t) < 0) {
>>>> +               if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {
>>>>                         dma_unmap_single(dev->dma_dev, addr, len,
>>>>                                          DMA_FROM_DEVICE);
>>>>                         skb_free_frag(buf);
>>>>
>>>
>>> Sorry for the stupid question.
>>>
>>> Do you have a separate branch?
>>> I see that the code differs between the master branch and the patch.
>>>
>>> For example, in the patch the line:
>>> - if (mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t) < 0) {
>>> is replaced by the line:
>>> + if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {
>>>
>>> But in the master branch
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/mediatek/mt76/dma.c?id=b6bb9676f2165d518b35ba3bea5f1fcfc0d969bf#n604
>>> the line:
>>> qbuf.skip_unmap = false;
>>> is followed by the line:
>>> mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t);
>>> without an if condition.
>>>
>>> So I'm stuck applying the patch :(
>> Sorry, I worked on a tree that had other pending fixes applied.
>> Please try this:
>>
>>
>> --- a/drivers/net/wireless/mediatek/mt76/dma.c
>> +++ b/drivers/net/wireless/mediatek/mt76/dma.c
>> @@ -205,6 +205,52 @@ mt76_dma_queue_reset(struct mt76_dev *dev, struct
>> mt76_queue *q)
>>      mt76_dma_sync_idx(dev, q);
>>  }
>>  
>> +static int
>> +mt76_dma_add_rx_buf(struct mt76_dev *dev, struct mt76_queue *q,
>> +            struct mt76_queue_buf *buf, void *data)
>> +{
>> +    struct mt76_desc *desc = &q->desc[q->head];
>> +    struct mt76_queue_entry *entry = &q->entry[q->head];
>> +    struct mt76_txwi_cache *txwi = NULL;
>> +    u32 buf1 = 0, ctrl;
>> +    int idx = q->head;
>> +    int rx_token;
>> +
>> +    ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
>> +
>> +    if ((q->flags & MT_QFLAG_WED) &&
>> +        FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
>> +        txwi = mt76_get_rxwi(dev);
>> +        if (!txwi)
>> +            return -ENOMEM;
>> +
>> +        rx_token = mt76_rx_token_consume(dev, data, txwi, buf->addr);
>> +        if (rx_token < 0) {
>> +            mt76_put_rxwi(dev, txwi);
>> +            return -ENOMEM;
>> +        }
>> +
>> +        buf1 |= FIELD_PREP(MT_DMA_CTL_TOKEN, rx_token);
>> +        ctrl |= MT_DMA_CTL_TO_HOST;
>> +    }
>> +
>> +    WRITE_ONCE(desc->buf0, cpu_to_le32(buf->addr));
>> +    WRITE_ONCE(desc->buf1, cpu_to_le32(buf1));
>> +    WRITE_ONCE(desc->ctrl, cpu_to_le32(ctrl));
>> +    WRITE_ONCE(desc->info, 0);
>> +
>> +    entry->dma_addr[0] = buf->addr;
>> +    entry->dma_len[0] = buf->len;
>> +    entry->txwi = txwi;
>> +    entry->buf = data;
>> +    entry->wcid = 0xffff;
>> +    entry->skip_buf1 = true;
>> +    q->head = (q->head + 1) % q->ndesc;
>> +    q->queued++;
>> +
>> +    return idx;
>> +}
>> +
>>  static int
>>  mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
>>           struct mt76_queue_buf *buf, int nbufs, u32 info,
>> @@ -212,65 +258,51 @@ mt76_dma_add_buf(struct mt76_dev *dev, struct
>> mt76_queue *q,
>>  {
>>      struct mt76_queue_entry *entry;
>>      struct mt76_desc *desc;
>> -    u32 ctrl;
>>      int i, idx = -1;
>> +    u32 ctrl, next;
>> +
>> +    if (txwi) {
>> +        q->entry[q->head].txwi = DMA_DUMMY_DATA;
>> +        q->entry[q->head].skip_buf0 = true;
>> +    }
>>  
>>      for (i = 0; i < nbufs; i += 2, buf += 2) {
>>          u32 buf0 = buf[0].addr, buf1 = 0;
>>  
>>          idx = q->head;
>> -        q->head = (q->head + 1) % q->ndesc;
>> +        next = (q->head + 1) % q->ndesc;
>>  
>>          desc = &q->desc[idx];
>>          entry = &q->entry[idx];
>>  
>> -        if ((q->flags & MT_QFLAG_WED) &&
>> -            FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
>> -            struct mt76_txwi_cache *t = txwi;
>> -            int rx_token;
>> -
>> -            if (!t)
>> -                return -ENOMEM;
>> -
>> -            rx_token = mt76_rx_token_consume(dev, (void *)skb, t,
>> -                             buf[0].addr);
>> -            buf1 |= FIELD_PREP(MT_DMA_CTL_TOKEN, rx_token);
>> -            ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len) |
>> -                   MT_DMA_CTL_TO_HOST;
>> -        } else {
>> -            if (txwi) {
>> -                q->entry[q->head].txwi = DMA_DUMMY_DATA;
>> -                q->entry[q->head].skip_buf0 = true;
>> -            }
>> -
>> -            if (buf[0].skip_unmap)
>> -                entry->skip_buf0 = true;
>> -            entry->skip_buf1 = i == nbufs - 1;
>> -
>> -            entry->dma_addr[0] = buf[0].addr;
>> -            entry->dma_len[0] = buf[0].len;
>> -
>> -            ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
>> -            if (i < nbufs - 1) {
>> -                entry->dma_addr[1] = buf[1].addr;
>> -                entry->dma_len[1] = buf[1].len;
>> -                buf1 = buf[1].addr;
>> -                ctrl |= FIELD_PREP(MT_DMA_CTL_SD_LEN1, buf[1].len);
>> -                if (buf[1].skip_unmap)
>> -                    entry->skip_buf1 = true;
>> -            }
>> -
>> -            if (i == nbufs - 1)
>> -                ctrl |= MT_DMA_CTL_LAST_SEC0;
>> -            else if (i == nbufs - 2)
>> -                ctrl |= MT_DMA_CTL_LAST_SEC1;
>> +        if (buf[0].skip_unmap)
>> +            entry->skip_buf0 = true;
>> +        entry->skip_buf1 = i == nbufs - 1;
>> +
>> +        entry->dma_addr[0] = buf[0].addr;
>> +        entry->dma_len[0] = buf[0].len;
>> +
>> +        ctrl = FIELD_PREP(MT_DMA_CTL_SD_LEN0, buf[0].len);
>> +        if (i < nbufs - 1) {
>> +            entry->dma_addr[1] = buf[1].addr;
>> +            entry->dma_len[1] = buf[1].len;
>> +            buf1 = buf[1].addr;
>> +            ctrl |= FIELD_PREP(MT_DMA_CTL_SD_LEN1, buf[1].len);
>> +            if (buf[1].skip_unmap)
>> +                entry->skip_buf1 = true;
>>          }
>>  
>> +        if (i == nbufs - 1)
>> +            ctrl |= MT_DMA_CTL_LAST_SEC0;
>> +        else if (i == nbufs - 2)
>> +            ctrl |= MT_DMA_CTL_LAST_SEC1;
>> +
>>          WRITE_ONCE(desc->buf0, cpu_to_le32(buf0));
>>          WRITE_ONCE(desc->buf1, cpu_to_le32(buf1));
>>          WRITE_ONCE(desc->info, cpu_to_le32(info));
>>          WRITE_ONCE(desc->ctrl, cpu_to_le32(ctrl));
>>  
>> +        q->head = next;
>>          q->queued++;
>>      }
>>  
>> @@ -577,17 +609,9 @@ mt76_dma_rx_fill(struct mt76_dev *dev, struct
>> mt76_queue *q)
>>      spin_lock_bh(&q->lock);
>>  
>>      while (q->queued < q->ndesc - 1) {
>> -        struct mt76_txwi_cache *t = NULL;
>>          struct mt76_queue_buf qbuf;
>>          void *buf = NULL;
>>  
>> -        if ((q->flags & MT_QFLAG_WED) &&
>> -            FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) == MT76_WED_Q_RX) {
>> -            t = mt76_get_rxwi(dev);
>> -            if (!t)
>> -                break;
>> -        }
>> -
>>          buf = page_frag_alloc(rx_page, q->buf_size, GFP_ATOMIC);
>>          if (!buf)
>>              break;
>> @@ -601,7 +625,12 @@ mt76_dma_rx_fill(struct mt76_dev *dev, struct
>> mt76_queue *q)
>>          qbuf.addr = addr + offset;
>>          qbuf.len = len - offset;
>>          qbuf.skip_unmap = false;
>> -        mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, t);
>> +        if (mt76_dma_add_rx_buf(dev, q, &qbuf, buf) < 0) {
>> +            dma_unmap_single(dev->dma_dev, addr, len,
>> +                     DMA_FROM_DEVICE);
>> +            skb_free_frag(buf);
>> +            break;
>> +        }
>>          frames++;
>>      }
>>  
>>
>> --- a/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
>> +++ b/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
>> @@ -653,6 +653,13 @@ static u32 mt7915_mmio_wed_init_rx_buf(struct
>> mtk_wed_device *wed, int size)
>>  
>>          desc->buf0 = cpu_to_le32(phy_addr);
>>          token = mt76_rx_token_consume(&dev->mt76, ptr, t, phy_addr);
>> +        if (token < 0) {
>> +            dma_unmap_single(dev->mt76.dma_dev, phy_addr,
>> +                     wed->wlan.rx_size, DMA_TO_DEVICE);
>> +            skb_free_frag(ptr);
>> +            goto unmap;
>> +        }
>> +
>>          desc->token |= cpu_to_le32(FIELD_PREP(MT_DMA_CTL_TOKEN,
>>                                token));
>>          desc++;
>>
>> --- a/drivers/net/wireless/mediatek/mt76/tx.c
>> +++ b/drivers/net/wireless/mediatek/mt76/tx.c
>> @@ -764,11 +764,12 @@ int mt76_rx_token_consume(struct mt76_dev *dev,
>> void *ptr,
>>      spin_lock_bh(&dev->rx_token_lock);
>>      token = idr_alloc(&dev->rx_token, t, 0, dev->rx_token_size,
>>                GFP_ATOMIC);
>> +    if (token >= 0) {
>> +        t->ptr = ptr;
>> +        t->dma_addr = phys;
>> +    }
>>      spin_unlock_bh(&dev->rx_token_lock);
>>  
>> -    t->ptr = ptr;
>> -    t->dma_addr = phys;
>> -
>>      return token;
>>  }
>>  EXPORT_SYMBOL_GPL(mt76_rx_token_consume);
>>
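
The dma.c hunks quoted above make the RX fill loop check the result of
adding a buffer and, on failure, release that buffer and stop, where the
old code ignored the return value. Below is a minimal userspace sketch of
that control flow. It is only an illustration: struct ring, tokens_left
and the ring_*() helpers are hypothetical, malloc() stands in for
page_frag_alloc(), and free() stands in for the dma_unmap_single() plus
skb_free_frag() pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NDESC 8 /* hypothetical ring size */

struct ring {
    void *buf[NDESC];
    unsigned int head;
    unsigned int queued;
    int tokens_left; /* hypothetical: models RX token exhaustion */
};

/* Returns the slot index used, or -1 without touching the ring. */
static int ring_add_rx_buf(struct ring *q, void *data)
{
    unsigned int idx = q->head;

    if (q->tokens_left <= 0)
        return -1;
    q->tokens_left--;

    /* Ring state is only advanced after every failure point has passed. */
    q->buf[idx] = data;
    q->head = (q->head + 1) % NDESC;
    q->queued++;
    return (int)idx;
}

/* Fill loop: stop and release the buffer if the add step fails. */
static int ring_fill(struct ring *q, size_t buf_size)
{
    int frames = 0;

    while (q->queued < NDESC - 1) {
        void *buf = malloc(buf_size);

        if (!buf)
            break;
        if (ring_add_rx_buf(q, buf) < 0) {
            free(buf); /* undo the allocation made for this iteration */
            break;
        }
        frames++;
    }
    return frames;
}

/* Free every buffer still owned by the ring. */
static void ring_release(struct ring *q)
{
    for (unsigned int i = 0; i < NDESC; i++) {
        free(q->buf[i]);
        q->buf[i] = NULL;
    }
    q->head = 0;
    q->queued = 0;
}
```

With tokens_left set to 3, ring_fill() queues three buffers and returns
without leaking the fourth allocation; a later call with more tokens
continues from the same head position.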

2023-01-10 07:25:54

by Thorsten Leemhuis

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

[CCing [email protected]]

On 09.01.23 08:32, Linux kernel regression tracking (Thorsten Leemhuis)
wrote:
> On 04.01.23 15:20, Thorsten Leemhuis wrote:
>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>> to make this easily accessible to everyone.
>>
>> Felix, Lorenzo, did below fix for the regression
>
> There is another report about an issue with mediatek wifi in 6.2-rc:
> https://bugzilla.kernel.org/show_bug.cgi?id=216901

FWIW, "spasswolf" in that ticket posted a patch that, according to the
reporter of that bug, fixes the issue:
https://bugzilla.kernel.org/show_bug.cgi?id=216901#c5

I only took a brief look, but it seems it does a subset of what Felix's
patch does.

> To me this looks like a duplicate of the report that started this thread.
>
> (side note: there was another, earlier report that might be a dupe, too:
> https://bugzilla.kernel.org/show_bug.cgi?id=216829 )
>
>> Mikhail reported make
>> any progress to get mainlined? It doesn't look like it from here, but I
>> suspect I missed something, that's why I'm asking.
>
> No reply. :-((

Still no reply. I wonder if I'm holding things wrong. But well, let's
wait one more day before escalating this further.

Ciao, Thorsten

> That lack of feedback is another reason why I'm CCing the network
> maintainers now, as the mediatek wifi issues in 6.2-rc (this one) and
> 6.1 ([1]) are already hitting a nerve here because the fixes are
> progressing so slowly. I know it was holiday season, but it seems quite
> a few people have run into these regressions already, hence we IMHO
> should really aim to fix both this week.
>
> [1] see
> https://lore.kernel.org/all/[email protected]/
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>

2023-01-10 08:07:39

by Felix Fietkau

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e


> On 10. Jan 2023, at 08:17, Linux kernel regression tracking (Thorsten Leemhuis) <[email protected]> wrote:
>
> [CCing [email protected]]
>
>> On 09.01.23 08:32, Linux kernel regression tracking (Thorsten Leemhuis)
>> wrote:
>>> On 04.01.23 15:20, Thorsten Leemhuis wrote:
>>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>>> to make this easily accessible to everyone.
>>>
>>> Felix, Lorenzo, did below fix for the regression
>>
>> There is another report about an issue with mediatek wifi in 6.2-rc:
>> https://bugzilla.kernel.org/show_bug.cgi?id=216901
>
> FWIW, "spasswolf" in that ticket posted a patch that according to the
> reporter of that bug fixes the issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=216901#c5
>
> I only took a brief look, but it seems it does a subset of what Felix's
> patch does.
>
>> To me this looks like a duplicate of the report that started this thread.
>>
>> (side note: there was another, earlier report that might be a dupe, too:
>> https://bugzilla.kernel.org/show_bug.cgi?id=216829 )
>>
>>> Mikhail reported make
>>> any progress to get mainlined? It doesn't look like it from here, but I
>>> suspect I missed something, that's why I'm asking.
>>
>> No reply. :-((
>
> Still no reply. I wonder if I'm holding things wrong. But well, let's
> wait one more day before escalating this further.

Johannes told me on IRC that he will review my patch soon. He simply has too many things to do at the moment.

- Felix

2023-01-10 08:43:28

by Thorsten Leemhuis

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e



On 10.01.23 09:00, Felix Fietkau wrote:
>
>> On 10. Jan 2023, at 08:17, Linux kernel regression tracking (Thorsten Leemhuis) <[email protected]> wrote:
>>
>> [CCing [email protected]]
>>
>>> On 09.01.23 08:32, Linux kernel regression tracking (Thorsten Leemhuis)
>>> wrote:
>>>> On 04.01.23 15:20, Thorsten Leemhuis wrote:
>>>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>>>> to make this easily accessible to everyone.
>>>>
>>>> Felix, Lorenzo, did below fix for the regression
>>>
>>> There is another report about an issue with mediatek wifi in 6.2-rc:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=216901
>>
>> FWIW, "spasswolf" in that ticket posted a patch that according to the
>> reporter of that bug fixes the issue:
>> https://bugzilla.kernel.org/show_bug.cgi?id=216901#c5
>>
>> I only took a brief look, but it seems it does a subset of what Felix's
>> patch does.
>>
>>> To me this looks like a duplicate of the report that started this thread.
>>>
>>> (side note: there was another, earlier report that might be a dupe, too:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=216829 )
>>>
>>>> Mikhail reported make
>>>> any progress to get mainlined? It doesn't look like it from here, but I
>>>> suspect I missed something, that's why I'm asking.
>>>
>>> No reply. :-((
>>
>> Still no reply. I wonder if I'm holding things wrong. But well, let's
>> wait one more day before escalating this further.
>
> Johannes told me on IRC that he will review my patch soon. He simply has too many things to do at the moment.

Great, thx. And sorry for prodding so much, but that is part of the job
when it takes so long to fix regressions -- even in cases where that's
mainly caused by a holiday season (which I took into account, otherwise
I likely would have made more noise earlier already).

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

2023-01-10 22:00:39

by Mikhail Gavrilov

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

On Tue, Jan 10, 2023 at 1:00 PM Felix Fietkau <[email protected]> wrote:
>
>
> Johannes told me on IRC that he will review my patch soon. He simply has too many things to do at the moment.
>

Hi Felix,
I sometimes get this kernel oops.

[ 15.658988] mt7921e 0000:05:00.0: enabling device (0000 -> 0002)
[ 15.686595] BUG: unable to handle page fault for address: ffffb2758525a6a9
[ 15.687243] #PF: supervisor read access in kernel mode
[ 15.687806] #PF: error_code(0x0000) - not-present page
[ 15.687806] PGD 100000067 P4D 100000067 PUD 10020f067 PMD 11f02a067 PTE 0
[ 15.688647] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 15.688647] CPU: 10 PID: 728 Comm: systemd-udevd Tainted: G W L ------- --- 6.2.0-0.rc3.20230110git5a41237ad1d4.25.fc38.x86_64 #1
[ 15.689537] Hardware name: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY/G513QY, BIOS G513QY.320 09/07/2022
[ 15.689537] RIP: 0010:mt7921_check_offload_capability+0xcb/0x100 [mt7921_common]
[ 15.689537] Code: 38 0f b7 03 0f b6 53 02 01 d0 48 98 48 8d 5c 05 00 48 39 cb 73 23 80 7b 03 04 48 8d 6b 04 75 e1 e8 fa 06 fe ee 48 85 ed 74 14 <0f> b6 43 05 48 83 c4 08 5b 5d c3 cc cc cc cc e8 e1 06 fe ee 48 83
[ 15.691541] RSP: 0018:ffffb27581b77b38 EFLAGS: 00010282
[ 15.691541] RAX: 0000000000000001 RBX: ffffb2758525a6a4 RCX: 000000008080003c
[ 15.691541] RDX: 000000008080003d RSI: ffffd478c5fe82c0 RDI: 0000000040000000
[ 15.691541] RBP: ffffb2758525a6a8 R08: 0000000000000000 R09: 000000008080003c
[ 15.693538] R10: ffff98463fa0bb60 R11: 0000000000000000 R12: ffff9845d33490d0
[ 15.693538] R13: ffff9845d33490d0 R14: ffff9845dbb9dda8 R15: ffffb27581b77da0
[ 15.693538] FS: 00007f498fed3b40(0000) GS:ffff985498a00000(0000) knlGS:0000000000000000
[ 15.693538] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 15.693538] CR2: ffffb2758525a6a9 CR3: 000000017ebbc000 CR4: 0000000000750ee0
[ 15.695563] PKRU: 55555554
[ 15.695563] Call Trace:
[ 15.696548] <TASK>
[ 15.696548] mt7921_pci_probe+0xa6/0x340 [mt7921e]
[ 15.697540] ? __pm_runtime_resume+0x54/0x90
[ 15.697540] local_pci_probe+0x41/0x80
[ 15.698542] pci_device_probe+0xb3/0x220
[ 15.698542] really_probe+0xde/0x380
[ 15.698542] ? pm_runtime_barrier+0x50/0x90
[ 15.699559] __driver_probe_device+0x78/0x170
[ 15.699559] driver_probe_device+0x1f/0x90
[ 15.700547] __driver_attach+0xd2/0x1c0
[ 15.700547] ? __pfx___driver_attach+0x10/0x10
[ 15.701539] bus_for_each_dev+0x76/0xa0
[ 15.701539] bus_add_driver+0x1b1/0x200
[ 15.701539] driver_register+0x89/0xe0
[ 15.702537] ? __pfx_init_module+0x10/0x10 [mt7921e]
[ 15.703136] do_one_initcall+0x6e/0x330
[ 15.703562] do_init_module+0x4a/0x200
[ 15.703583] __do_sys_init_module+0x16a/0x1a0
[ 15.704344] ? sched_clock_local+0xe/0x80
[ 15.704565] do_syscall_64+0x5b/0x80
[ 15.704565] ? lock_release+0x14b/0x440
[ 15.704565] ? up_read+0x17/0x20
[ 15.705541] ? lock_is_held_type+0xe8/0x140
[ 15.705541] ? asm_exc_page_fault+0x22/0x30
[ 15.705541] ? lockdep_hardirqs_on+0x7d/0x100
[ 15.705541] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 15.705541] RIP: 0033:0x7f499091100e
[ 15.707560] Code: 48 8b 0d fd 4d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ca 4d 0c 00 f7 d8 64 89 01 48
[ 15.707560] RSP: 002b:00007ffd7b33ed98 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 15.707560] RAX: ffffffffffffffda RBX: 00005626f8635cf0 RCX: 00007f499091100e
[ 15.707560] RDX: 00007f4990de2453 RSI: 0000000000030f76 RDI: 00005626f8b50d70
[ 15.707560] RBP: 00007f4990de2453 R08: 00005626f8634050 R09: ffffd6297d846130
[ 15.709539] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000020000
[ 15.709539] R13: 00005626f8634210 R14: 0000000000000000 R15: 00005626f8a11670
[ 15.709539] </TASK>
[ 15.709539] Modules linked in: snd_intel_dspcfg mt7921e(+) snd_intel_sdw_acpi mt7921_common amd64_edac(-) btusb binfmt_misc edac_mce_amd snd_soc_core snd_hda_codec mt76_connac_lib btrtl snd_compress mt76 ac97_bus btbcm kvm_amd snd_pcm_dmaengine snd_hda_core btintel snd_pci_ps snd_rpl_pci_acp6x btmtk mac80211 snd_pci_acp6x snd_hwdep kvm snd_seq bluetooth irqbypass libarc4 snd_seq_device snd_pcm vfat rapl snd_pci_acp5x asus_nb_wmi fat wmi_bmof cfg80211 pcspkr snd_timer snd_rn_pci_acp3x snd_acp_config k10temp snd_soc_acpi snd i2c_piix4 snd_pci_acp3x soundcore asus_wireless joydev amd_pmc acpi_cpufreq zram amdgpu drm_ttm_helper ttm hid_asus nvme asus_wmi iommu_v2 drm_buddy ledtrig_audio sparse_keymap gpu_sched platform_profile nvme_core crct10dif_pclmul drm_display_helper crc32_pclmul crc32c_intel polyval_clmulni rfkill polyval_generic ucsi_acpi hid_multitouch ghash_clmulni_intel sha512_ssse3 typec_ucsi serio_raw ccp sp5100_tco cec r8169 nvme_common typec video i2c_hid_acpi i2c_hid wmi
[ 15.709539] ip6_tables ip_tables fuse
[ 15.712545] CR2: ffffb2758525a6a9
[ 15.712545] ---[ end trace 0000000000000000 ]---
[ 15.713540] RIP: 0010:mt7921_check_offload_capability+0xcb/0x100 [mt7921_common]
[ 15.713540] Code: 38 0f b7 03 0f b6 53 02 01 d0 48 98 48 8d 5c 05 00 48 39 cb 73 23 80 7b 03 04 48 8d 6b 04 75 e1 e8 fa 06 fe ee 48 85 ed 74 14 <0f> b6 43 05 48 83 c4 08 5b 5d c3 cc cc cc cc e8 e1 06 fe ee 48 83
[ 15.714539] RSP: 0018:ffffb27581b77b38 EFLAGS: 00010282
[ 15.714539] RAX: 0000000000000001 RBX: ffffb2758525a6a4 RCX: 000000008080003c
[ 15.714539] RDX: 000000008080003d RSI: ffffd478c5fe82c0 RDI: 0000000040000000
[ 15.715558] RBP: ffffb2758525a6a8 R08: 0000000000000000 R09: 000000008080003c
[ 15.715558] R10: ffff98463fa0bb60 R11: 0000000000000000 R12: ffff9845d33490d0
[ 15.716656] R13: ffff9845d33490d0 R14: ffff9845dbb9dda8 R15: ffffb27581b77da0
[ 15.716656] FS: 00007f498fed3b40(0000) GS:ffff985498a00000(0000) knlGS:0000000000000000
[ 15.717536] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 15.717536] CR2: ffffb2758525a6a9 CR3: 000000017ebbc000 CR4: 0000000000750ee0
[ 15.717536] PKRU: 55555554
[ 15.759488] intel_rapl_common: Found RAPL domain package
[ 15.760623] intel_rapl_common: Found RAPL domain core
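
A minimal sketch for cross-checking the second register dump above. The bytes starting at the `<0f>` marker in the Code line are `0f b6 43 05`, which on x86-64 decodes to `movzbl 0x5(%rbx),%eax`, so the faulting address should be RBX + 5 and should equal CR2. The helper name `decode_movzx_disp8` is hypothetical and handles only this one encoding; the register values are copied from the trace.

```python
# Values copied from the oops above (mt7921_check_offload_capability+0xcb).
RBX = 0xFFFFB2758525A6A4
CR2 = 0xFFFFB2758525A6A9

# Bytes at the faulting RIP, taken from the "Code:" line at the <..> marker.
FAULTING_BYTES = bytes.fromhex("0fb64305")


def decode_movzx_disp8(code: bytes) -> tuple[int, int, int]:
    """Decode `0f b6 /r` (movzx r32, r/m8) with an 8-bit displacement.

    Returns (reg, rm, disp). Hypothetical helper for illustration only;
    it is not a general x86 decoder.
    """
    if code[0:2] != b"\x0f\xb6":
        raise ValueError("not a movzx r32, r/m8 opcode")
    modrm = code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [reg + disp8] without a SIB byte is handled")
    disp = int.from_bytes(code[3:4], "little", signed=True)
    return reg, rm, disp


reg, rm, disp = decode_movzx_disp8(FAULTING_BYTES)  # rm == 3 means RBX
fault_addr = (RBX + disp) & 0xFFFFFFFFFFFFFFFF
print(f"disp={disp} fault={fault_addr:#x} matches_cr2={fault_addr == CR2}")
```

This only shows that the dump is internally consistent (a one-byte read at offset 5 from RBX faulted); it does not say why that pointer is invalid.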


Is it somehow related to the tested patch?
Unfortunately, it happens so rarely and randomly that I do not have a
scenario to reproduce it reliably.
The one thing I can say is that when this happens, WiFi disappears.
I have also attached the full kernel log here.

--
Best Regards,
Mike Gavrilov.


Attachments:
dmesg.txt (185.40 kB)

2023-01-17 00:41:52

by Mike Lothian

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

Hi

I'm struggling to find these patches on Patchwork, or apply the saved
raw patches to rc4

If I'm missing them, would you mind posting the link

Cheers

Mike

2023-01-17 05:54:46

by Mikhail Gavrilov

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

On Tue, Jan 17, 2023 at 5:33 AM Mike Lothian <[email protected]> wrote:
>
> Hi
>
> I'm struggling to find these patches on Patchwork, or apply the saved
> raw patches to rc4
>
> If I'm missing them, would you mind posting the link

https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

--
Best Regards,
Mike Gavrilov.

2023-01-17 06:44:29

by Kalle Valo

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

Mikhail Gavrilov <[email protected]> writes:

> On Tue, Jan 17, 2023 at 5:33 AM Mike Lothian <[email protected]> wrote:
>>
>> Hi
>>
>> I'm struggling to find these patches on Patchwork, or apply the saved
>> raw patches to rc4
>>
>> If I'm missing them, would you mind posting the link
>
> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

And the patches are now applied to the wireless tree:

https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless.git/

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2023-01-17 13:17:11

by Mike Lothian

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

On Tue, 17 Jan 2023 at 05:43, Mikhail Gavrilov
<[email protected]> wrote:
>
> On Tue, Jan 17, 2023 at 5:33 AM Mike Lothian <[email protected]> wrote:
> >
> > Hi
> >
> > I'm struggling to find these patches on Patchwork, or apply the saved
> > raw patches to rc4
> >
> > If I'm missing them, would you mind posting the link
>
> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
>
> --
> Best Regards,
> Mike Gavrilov.

I can confirm this fixes things for me, thanks

2023-01-17 13:13:36

by Mikhail Gavrilov

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e

On Tue, Jan 17, 2023 at 6:06 PM Mike Lothian <[email protected]> wrote:
>
> I can confirm this fixes things for me, thanks

Sorry for the off-topic question.
Does anybody know whether it is possible to extend the channel width to
160MHz on this chip?

❯ sudo iw wlp5s0 info
Interface wlp5s0
ifindex 3
wdev 0x1
addr 48:e7:da:57:9a:33
type managed
wiphy 0
channel 44 (5220 MHz), width: 80 MHz, center1: 5210 MHz
txpower 3.00 dBm
multicast TXQ:
qsz-byt qsz-pkt flows drops marks overlmt hashcol tx-bytes tx-packets
0 0 0 0 0 0 0 0 0

❯ sudo iw wlp5s0 set channel 36 160MHz
Usage: iw [options] dev <devname> set channel <channel>
[NOHT|HT20|HT40+|HT40-|5MHz|10MHz|80MHz]
Options:
--debug enable netlink debugging
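
For reference, the center frequencies involved follow from the 5 GHz channel numbering (frequency in MHz = 5000 + 5 x channel). A minimal sketch, assuming the standard 802.11 channelization for channels 36 to 64, where wider channels are aligned blocks of 20 MHz channels starting at channel 36. The function names are hypothetical, and the sketch says nothing about whether the chip or driver supports 160 MHz.

```python
def chan_to_freq(channel: int) -> int:
    """Center frequency in MHz of a 20 MHz channel in the 5 GHz band."""
    return 5000 + 5 * channel


def center_freq(control_channel: int, width_mhz: int) -> int:
    """Center frequency (MHz) of the aligned block containing control_channel.

    Assumption: only channels 36..64 are handled, which is enough to cover
    the channel 36 and channel 44 cases from the iw output above.
    """
    if control_channel not in range(36, 65, 4):
        raise ValueError("only channels 36..64 are handled in this sketch")
    if width_mhz not in (20, 40, 80, 160):
        raise ValueError("unsupported width")
    n = width_mhz // 20                  # number of 20 MHz channels in the block
    index = (control_channel - 36) // 4  # position of the control channel
    first = 36 + 4 * (index - index % n)  # first channel of the aligned block
    last = first + 4 * (n - 1)            # last channel of the aligned block
    return chan_to_freq((first + last) // 2)


print(chan_to_freq(44), center_freq(44, 80), center_freq(36, 160))
```

With channel 44 at 80 MHz this gives the same center1 value (5210 MHz) that `iw` reports above; a 160 MHz channel covering channels 36 to 64 would be centered at 5250 MHz.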

--
Best Regards,
Mike Gavrilov.

2023-01-27 12:23:42

by Thorsten Leemhuis

Subject: Re: [6.2][regression] after commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae stopping working wifi mt7921e #forregzbot

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 22.12.22 13:36, Thorsten Leemhuis wrote:
> [Note: this mail contains only information for Linux kernel regression
> tracking. Mails like these contain '#forregzbot' in the subject to make
> them easy to spot and filter out. The author also tried to remove most
> or all individuals from the list of recipients to spare them the hassle.]
>
> On 21.12.22 02:10, Mikhail Gavrilov wrote:
>> Hi,
>> The kernel 6.2 preparation cycle has begun.
>> And after the kernel was updated on my laptop, the wifi stopped working.
>>
>> Bisecting blames this commit:
>> cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae is the first bad commit
>> commit cd372b8c99c5a5cf6a464acebb7e4a79af7ec8ae
>> Author: Lorenzo Bianconi <[email protected]>
>> Date: Sat Nov 12 16:40:35 2022 +0100
>>
>> wifi: mt76: add WED RX support to mt76_dma_{add,get}_buf
>>
>
> Thanks for the report. To be sure the issue below doesn't fall through
> the cracks unnoticed, I'm adding it to regzbot, my Linux kernel
> regression tracking bot:
>
> #regzbot introduced cd372b8c99c5a5 ^
> https://bugzilla.kernel.org/show_bug.cgi?id=216829
> #regzbot title wifi: mt76: wifi stopped working
> #regzbot ignore-activity
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

Regzbot for some reason failed to notice the fix properly, so I'm pointing
to it manually:

#regzbot fix 953519b35227
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.