2023-09-21 19:14:40

by Antoine Beaupré

[permalink] [raw]
Subject: Microcode SW error since Linux 6.5

Hi,

I've found what I feel might be a regression between Linux 6.1 and
6.5. For other reasons, I upgraded the kernel on my Debian 12
("bookworm", stale) laptop from the distribution 6.1.52 to the unstable
("sid") version, 6.5.3.

After the upgrade, I started to notice stuttering in my audio player, I
tracked it down and managed to correlate it with some kernel errors
related to the iwlwifi driver.

What's interesting is that this happens regardless of whether or not the
NIC is connected to a network. In at least one of the traces, the
computer was connected over a wire and wireless was not associated in
Network Manager.

Here's an example of the problem:

sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Microcode SW error detected. Restarting 0x0.
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Start IWL Error Log Dump:
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Transport status: 0x0000004B, valid: 6
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Loaded firmware version: 73.35c0a2c6.0 ty-a0-gf-a0-73.ucode
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000071 | NMI_INTERRUPT_UMAC_FATAL
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x000002F0 | trm_hw_status0
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | trm_hw_status1
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x004DB676 | branchlink2
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x004DAFCA | interruptlink1
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x004DAFCA | interruptlink2
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00016B8A | data1
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000010 | data2
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | data3
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x003CF2BA | beacon time
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x01AF1EAD | tsf low
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | tsf hi
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | time gp1
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x01B06739 | time gp2
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000001 | uCode revision type
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000049 | uCode version major
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x35C0A2C6 | uCode version minor
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000420 | hw version
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00C80002 | board version
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x80BBFF00 | hcmd
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00020000 | isr0
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | isr1
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x48F04002 | isr2
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00C3000C | isr3
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | isr4
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00220103 | last cmd Id
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00016B8A | wait_event
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | l2p_control
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | l2p_duration
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | l2p_mhvalid
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | l2p_addr_match
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000018 | lmpm_pmg_sel
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | timestamp
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x0000284C | flow_handler
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Start IWL Error Log Dump:
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Transport status: 0x0000004B, valid: 7
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x20103600 | ADVANCED_SYSASSERT
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | umac branchlink1
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x8045F174 | umac branchlink2
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x8047A13C | umac interruptlink1
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | umac interruptlink2
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x000000FF | umac data1
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0xDEADBEEF | umac data2
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0xDEADBEEF | umac data3
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000049 | umac major
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x35C0A2C6 | umac minor
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x01B06733 | frame pointer
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0xC0886C0C | stack pointer
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x002B010D | last host cmd
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | isr status reg
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: IML/ROM dump:
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000B03 | IML/ROM error/state
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00008510 | IML/ROM data1
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Fseq Registers:
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x60000100 | FSEQ_ERROR_CODE
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00440007 | FSEQ_TOP_INIT_VERSION
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00080009 | FSEQ_CNVIO_INIT_VERSION
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x0000A652 | FSEQ_OTP_VERSION
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000002 | FSEQ_TOP_CONTENT_VERSION
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x4552414E | FSEQ_ALIVE_TOKEN
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00400410 | FSEQ_CNVI_ID
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00400410 | FSEQ_CNVR_ID
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00400410 | CNVI_AUX_MISC_CHIP
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00400410 | CNVR_AUX_MISC_CHIP
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00009061 | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000061 | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00080009 | FSEQ_PREV_CNVIO_INIT_VERSION
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00440005 | FSEQ_WIFI_FSEQ_VERSION
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00440007 | FSEQ_BT_FSEQ_VERSION
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x000000DC | FSEQ_CLASS_TP_VERSION
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: WRT: Collecting data: ini trigger 4 fired (delay=0ms).
sep 21 09:33:14 angela kernel: ieee80211 phy0: Hardware restart was requested
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: FW error in SYNC CMD SCAN_REQ_UMAC
sep 21 09:33:14 angela kernel: CPU: 7 PID: 1563 Comm: wpa_supplicant Tainted: G W 6.5.0-1-amd64 #1 Debian 6.5.3-1
sep 21 09:33:14 angela kernel: Hardware name: Framework Laptop (12th Gen Intel Core)/FRANMACP04, BIOS 03.06 11/10/2022
sep 21 09:33:14 angela kernel: Call Trace:
sep 21 09:33:14 angela kernel: <TASK>
sep 21 09:33:14 angela kernel: dump_stack_lvl+0x47/0x60
sep 21 09:33:14 angela kernel: iwl_trans_txq_send_hcmd+0x33d/0x450 [iwlwifi]
sep 21 09:33:14 angela kernel: ? __pfx_autoremove_wake_function+0x10/0x10
sep 21 09:33:14 angela kernel: iwl_trans_send_cmd+0x66/0x100 [iwlwifi]
sep 21 09:33:14 angela kernel: iwl_mvm_send_cmd+0x16/0x40 [iwlmvm]
sep 21 09:33:14 angela kernel: iwl_mvm_reg_scan_start+0x3f0/0x610 [iwlmvm]
sep 21 09:33:14 angela kernel: iwl_mvm_mac_hw_scan+0x4e/0x70 [iwlmvm]
sep 21 09:33:14 angela kernel: drv_hw_scan+0x98/0x150 [mac80211]
sep 21 09:33:14 angela kernel: __ieee80211_start_scan+0x255/0x6f0 [mac80211]
sep 21 09:33:14 angela kernel: ieee80211_request_scan+0x2f/0x50 [mac80211]
sep 21 09:33:14 angela kernel: rdev_scan+0x25/0xd0 [cfg80211]
sep 21 09:33:14 angela kernel: nl80211_trigger_scan+0x409/0x7e0 [cfg80211]
sep 21 09:33:14 angela kernel: genl_family_rcv_msg_doit.isra.0+0xe3/0x140
sep 21 09:33:14 angela kernel: genl_rcv_msg+0x1b1/0x2c0
sep 21 09:33:14 angela kernel: ? __pfx_nl80211_pre_doit+0x10/0x10 [cfg80211]
sep 21 09:33:14 angela kernel: ? __pfx_nl80211_trigger_scan+0x10/0x10 [cfg80211]
sep 21 09:33:14 angela kernel: ? __pfx_nl80211_post_doit+0x10/0x10 [cfg80211]
sep 21 09:33:14 angela kernel: ? __pfx_genl_rcv_msg+0x10/0x10
sep 21 09:33:14 angela kernel: netlink_rcv_skb+0x58/0x110
sep 21 09:33:14 angela kernel: genl_rcv+0x28/0x40
sep 21 09:33:14 angela kernel: netlink_unicast+0x19e/0x290
sep 21 09:33:14 angela kernel: netlink_sendmsg+0x254/0x4d0
sep 21 09:33:14 angela kernel: sock_sendmsg+0x93/0xa0
sep 21 09:33:14 angela kernel: ____sys_sendmsg+0x278/0x300
sep 21 09:33:14 angela kernel: ? copy_msghdr_from_user+0x7d/0xc0
sep 21 09:33:14 angela kernel: ___sys_sendmsg+0x9a/0xe0
sep 21 09:33:14 angela kernel: __sys_sendmsg+0x7a/0xd0
sep 21 09:33:14 angela kernel: do_syscall_64+0x5d/0xc0
sep 21 09:33:14 angela kernel: ? do_syscall_64+0x6c/0xc0
sep 21 09:33:14 angela kernel: ? do_syscall_64+0x6c/0xc0
sep 21 09:33:14 angela kernel: ? fpregs_assert_state_consistent+0x26/0x50
sep 21 09:33:14 angela kernel: ? exit_to_user_mode_prepare+0x40/0x1d0
sep 21 09:33:14 angela kernel: ? syscall_exit_to_user_mode+0x2b/0x40
sep 21 09:33:14 angela kernel: ? do_syscall_64+0x6c/0xc0
sep 21 09:33:14 angela kernel: ? do_syscall_64+0x6c/0xc0
sep 21 09:33:14 angela kernel: ? do_syscall_64+0x6c/0xc0
sep 21 09:33:14 angela kernel: ? do_syscall_64+0x6c/0xc0
sep 21 09:33:14 angela kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
sep 21 09:33:14 angela kernel: RIP: 0033:0x7fdbfbf29910
sep 21 09:33:14 angela kernel: Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 80 3d d1 fc 0c 00 00 74 17 b8 2e 00 00 00>
sep 21 09:33:14 angela kernel: RSP: 002b:00007ffedc261e78 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
sep 21 09:33:14 angela kernel: RAX: ffffffffffffffda RBX: 00005568b18cf900 RCX: 00007fdbfbf29910
sep 21 09:33:14 angela kernel: RDX: 0000000000000000 RSI: 00007ffedc261eb0 RDI: 0000000000000006
sep 21 09:33:14 angela kernel: RBP: 00005568b1949360 R08: 0000000000000004 R09: 0000000000000000
sep 21 09:33:14 angela kernel: R10: 00007ffedc261f94 R11: 0000000000000202 R12: 00005568b18cfbe0
sep 21 09:33:14 angela kernel: R13: 00007ffedc261eb0 R14: 0000000000000000 R15: 00007ffedc261f94
sep 21 09:33:14 angela kernel: </TASK>
sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Scan failed! ret -5
sep 21 09:33:14 angela kernel: ------------[ cut here ]------------
sep 21 09:33:14 angela kernel: WARNING: CPU: 15 PID: 133361 at net/mac80211/scan.c:423 __ieee80211_scan_completed+0x308/0x320 [mac80211]
sep 21 09:33:14 angela kernel: Modules linked in: cdc_mbim cdc_wdm xfrm_user xfrm_algo nvme_fabrics ctr ccm rfcomm cmac algif_hash algif_skcipher af_alg tls>
sep 21 09:33:14 angela kernel: nf_reject_ipv4 intel_uncore_frequency_common snd_hda_codec_idt snd_soc_acpi nf_reject_ipv6 x86_pkg_temp_thermal snd_hda_code>
sep 21 09:33:14 angela kernel: configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic dm_crypt dm_mod efivarfs raid10 raid456 as>
sep 21 09:33:14 angela kernel: CPU: 15 PID: 133361 Comm: kworker/u32:22 Tainted: G W 6.5.0-1-amd64 #1 Debian 6.5.3-1
sep 21 09:33:14 angela kernel: Hardware name: Framework Laptop (12th Gen Intel Core)/FRANMACP04, BIOS 03.06 11/10/2022
sep 21 09:33:14 angela kernel: Workqueue: phy0 ieee80211_scan_work [mac80211]
sep 21 09:33:14 angela kernel: RIP: 0010:__ieee80211_scan_completed+0x308/0x320 [mac80211]
sep 21 09:33:14 angela kernel: Code: 0a 00 48 85 c0 74 0c 48 8b 78 08 48 89 ee e8 6f 49 04 00 65 ff 0d 10 49 2a 3e 0f 85 b1 fe ff ff 0f 1f 44 00 00 e9 a7 fe>
sep 21 09:33:14 angela kernel: RSP: 0018:ffffb42640ab7dc8 EFLAGS: 00010246
sep 21 09:33:14 angela kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
sep 21 09:33:14 angela kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9399481e8900
sep 21 09:33:14 angela kernel: RBP: ffff9399481e8900 R08: 0000000000000000 R09: 0000000000000000
sep 21 09:33:14 angela kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9399481e9b00
sep 21 09:33:14 angela kernel: R13: 0000000000000000 R14: ffff9399634ff335 R15: 0000000000000001
sep 21 09:33:14 angela kernel: FS: 0000000000000000(0000) GS:ffff93a0ef9c0000(0000) knlGS:0000000000000000
sep 21 09:33:14 angela kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
sep 21 09:33:14 angela kernel: CR2: 00005614e1215588 CR3: 00000008a8020000 CR4: 0000000000f50ee0
sep 21 09:33:14 angela kernel: PKRU: 55555554
sep 21 09:33:14 angela kernel: Call Trace:
sep 21 09:33:14 angela kernel: <TASK>
sep 21 09:33:14 angela kernel: ? __ieee80211_scan_completed+0x308/0x320 [mac80211]
sep 21 09:33:14 angela kernel: ? __warn+0x81/0x130
sep 21 09:33:14 angela kernel: ? __ieee80211_scan_completed+0x308/0x320 [mac80211]
sep 21 09:33:14 angela kernel: ? report_bug+0x191/0x1c0
sep 21 09:33:14 angela kernel: ? handle_bug+0x3c/0x80
sep 21 09:33:14 angela kernel: ? exc_invalid_op+0x17/0x70
sep 21 09:33:14 angela kernel: ? asm_exc_invalid_op+0x1a/0x20
sep 21 09:33:14 angela kernel: ? __ieee80211_scan_completed+0x308/0x320 [mac80211]
sep 21 09:33:14 angela kernel: ieee80211_scan_work+0x127/0x5d0 [mac80211]
sep 21 09:33:14 angela kernel: ? __schedule+0x3e7/0xb80
sep 21 09:33:14 angela kernel: process_one_work+0x1de/0x3f0
sep 21 09:33:14 angela kernel: worker_thread+0x51/0x390
sep 21 09:33:14 angela kernel: ? __pfx_worker_thread+0x10/0x10
sep 21 09:33:14 angela kernel: kthread+0xf4/0x130
sep 21 09:33:14 angela kernel: ? __pfx_kthread+0x10/0x10
sep 21 09:33:14 angela kernel: ret_from_fork+0x31/0x50
sep 21 09:33:14 angela kernel: ? __pfx_kthread+0x10/0x10
sep 21 09:33:14 angela kernel: ret_from_fork_asm+0x1b/0x30
sep 21 09:33:14 angela kernel: </TASK>
sep 21 09:33:14 angela kernel: ---[ end trace 0000000000000000 ]---
sep 21 09:33:14 angela wpa_supplicant[1563]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-5
sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: WFPM_UMAC_PD_NOTIFICATION: 0x1f
sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: WFPM_AUTH_KEY_0: 0x80
sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
sep 21 09:33:15 angela wpa_supplicant[1563]: wlan0: CTRL-EVENT-REGDOM-CHANGE init=DRIVER type=WORLD


Thanks for any ideas!

--
La démocratie réelle se définit d'abord et avant tout par la
participation massive des citoyens à la gestion des affaires de la cité.
Elle est directe et participative. Elle trouve son expression la plus
authentique dans l'assemblée populaire et le dialogue permanent sur
l'organisation de la vie en commun. - De la servitude moderne


2023-09-21 20:05:58

by Johannes Berg

[permalink] [raw]
Subject: Re: Microcode SW error since Linux 6.5

On Thu, 2023-09-21 at 13:24 -0400, Antoine Beaupré wrote:
> Hi,
>
> I've found what I feel might be a regression between Linux 6.1 and
> 6.5. For other reasons, I upgraded the kernel on my Debian 12
> ("bookworm", stale) laptop from the distribution 6.1.52 to the unstable
> ("sid") version, 6.5.3.
>
> After the upgrade, I started to notice stuttering in my audio player, I
> tracked it down and managed to correlate it with some kernel errors
> related to the iwlwifi driver.
>
> What's interesting is that this happens regardless of whether or not the
> NIC is connected to a network. In at least one of the traces, the
> computer was connected over a wire and wireless was not associated in
> Network Manager.

This happens when scanning.


> Here's an example of the problem:
>
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Microcode SW error detected. Restarting 0x0.

Can you give a few wpa_supplicant lines (there were some below) above
this? Just want to make sure it really is scanning on wlan0, not
something with P2P device.

> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x20103600 | ADVANCED_SYSASSERT

> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x000000FF | umac data1

This means that somehow scan_start_mac_or_link_id in the driver ended up
0xff which is invalid, but I'm not sure I see immediately how that
happened, since it looks like in 6.5.3 we do assign it reasonably. I
guess somehow in the code link_info->fw_link_id must be 0xff (invalid
ID), but I'm not sure I see how that could happen.

*thinks*

Oh.. This is an older firmware, so it doesn't have
IWL_UCODE_TLV_CAPA_MLD_API_SUPPORT! Hah. I feel like I had some concerns
in this area before ... but maybe the other way around.

I think something like this, perhaps:

--- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
@@ -2342,7 +2342,7 @@ iwl_mvm_scan_umac_fill_general_p_v12(struct iwl_mvm *mvm,
if (gen_flags & IWL_UMAC_SCAN_GEN_FLAGS_V2_FRAGMENTED_LMAC2)
gp->num_of_fragments[SCAN_HB_LMAC_IDX] = IWL_SCAN_NUM_OF_FRAGS;

- if (version < 12) {
+ if (version < 12 || !iwl_mvm_has_mld_api(mvm->fw)) {
gp->scan_start_mac_or_link_id = scan_vif->id;
} else {
struct iwl_mvm_vif_link_info *link_info;

johannes

2023-09-22 15:48:12

by Greenman, Gregory

[permalink] [raw]
Subject: Re: Microcode SW error since Linux 6.5

Hi,

On Thu, 2023-09-21 at 13:24 -0400, Antoine Beaupré wrote:
> Hi,
>
> I've found what I feel might be a regression between Linux 6.1 and
> 6.5. For other reasons, I upgraded the kernel on my Debian 12
> ("bookworm", stale) laptop from the distribution 6.1.52 to the unstable
> ("sid") version, 6.5.3.
>
> After the upgrade, I started to notice stuttering in my audio player, I
> tracked it down and managed to correlate it with some kernel errors
> related to the iwlwifi driver.
>
> What's interesting is that this happens regardless of whether or not the
> NIC is connected to a network. In at least one of the traces, the
> computer was connected over a wire and wireless was not associated in
> Network Manager.
>
> Here's an example of the problem:
>
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Microcode SW error detected. Restarting 0x0.
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Start IWL Error Log Dump:
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Transport status: 0x0000004B, valid: 6
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Loaded firmware version: 73.35c0a2c6.0 ty-a0-gf-a0-73.ucode
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000071 | NMI_INTERRUPT_UMAC_FATAL   
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x000002F0 | trm_hw_status0
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | trm_hw_status1
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x004DB676 | branchlink2
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x004DAFCA | interruptlink1
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x004DAFCA | interruptlink2
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00016B8A | data1
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000010 | data2
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | data3
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x003CF2BA | beacon time
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x01AF1EAD | tsf low
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | tsf hi
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | time gp1
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x01B06739 | time gp2
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000001 | uCode revision type
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000049 | uCode version major
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x35C0A2C6 | uCode version minor
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000420 | hw version
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00C80002 | board version
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x80BBFF00 | hcmd
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00020000 | isr0
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | isr1
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x48F04002 | isr2
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00C3000C | isr3
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | isr4
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00220103 | last cmd Id
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00016B8A | wait_event
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | l2p_control
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | l2p_duration
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | l2p_mhvalid
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | l2p_addr_match
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000018 | lmpm_pmg_sel
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | timestamp
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x0000284C | flow_handler
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Start IWL Error Log Dump:
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Transport status: 0x0000004B, valid: 7
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x20103600 | ADVANCED_SYSASSERT
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | umac branchlink1
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x8045F174 | umac branchlink2
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x8047A13C | umac interruptlink1
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | umac interruptlink2
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x000000FF | umac data1
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0xDEADBEEF | umac data2
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0xDEADBEEF | umac data3
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000049 | umac major
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x35C0A2C6 | umac minor
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x01B06733 | frame pointer
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0xC0886C0C | stack pointer
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x002B010D | last host cmd
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000000 | isr status reg
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: IML/ROM dump:
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000B03 | IML/ROM error/state
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00008510 | IML/ROM data1
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Fseq Registers:
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x60000100 | FSEQ_ERROR_CODE
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00440007 | FSEQ_TOP_INIT_VERSION
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00080009 | FSEQ_CNVIO_INIT_VERSION
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x0000A652 | FSEQ_OTP_VERSION
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000002 | FSEQ_TOP_CONTENT_VERSION
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x4552414E | FSEQ_ALIVE_TOKEN
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00400410 | FSEQ_CNVI_ID
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00400410 | FSEQ_CNVR_ID
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00400410 | CNVI_AUX_MISC_CHIP
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00400410 | CNVR_AUX_MISC_CHIP
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00009061 | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00000061 | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00080009 | FSEQ_PREV_CNVIO_INIT_VERSION
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00440005 | FSEQ_WIFI_FSEQ_VERSION
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x00440007 | FSEQ_BT_FSEQ_VERSION
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x000000DC | FSEQ_CLASS_TP_VERSION
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: WRT: Collecting data: ini trigger 4 fired (delay=0ms).
> sep 21 09:33:14 angela kernel: ieee80211 phy0: Hardware restart was requested
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: FW error in SYNC CMD SCAN_REQ_UMAC
> sep 21 09:33:14 angela kernel: CPU: 7 PID: 1563 Comm: wpa_supplicant Tainted: G        W          6.5.0-1-amd64 #1  Debian 6.5.3-1
> sep 21 09:33:14 angela kernel: Hardware name: Framework Laptop (12th Gen Intel Core)/FRANMACP04, BIOS 03.06 11/10/2022
> sep 21 09:33:14 angela kernel: Call Trace:
> sep 21 09:33:14 angela kernel:  <TASK>
> sep 21 09:33:14 angela kernel:  dump_stack_lvl+0x47/0x60
> sep 21 09:33:14 angela kernel:  iwl_trans_txq_send_hcmd+0x33d/0x450 [iwlwifi]
> sep 21 09:33:14 angela kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
> sep 21 09:33:14 angela kernel:  iwl_trans_send_cmd+0x66/0x100 [iwlwifi]
> sep 21 09:33:14 angela kernel:  iwl_mvm_send_cmd+0x16/0x40 [iwlmvm]
> sep 21 09:33:14 angela kernel:  iwl_mvm_reg_scan_start+0x3f0/0x610 [iwlmvm]
> sep 21 09:33:14 angela kernel:  iwl_mvm_mac_hw_scan+0x4e/0x70 [iwlmvm]
> sep 21 09:33:14 angela kernel:  drv_hw_scan+0x98/0x150 [mac80211]
> sep 21 09:33:14 angela kernel:  __ieee80211_start_scan+0x255/0x6f0 [mac80211]
> sep 21 09:33:14 angela kernel:  ieee80211_request_scan+0x2f/0x50 [mac80211]
> sep 21 09:33:14 angela kernel:  rdev_scan+0x25/0xd0 [cfg80211]
> sep 21 09:33:14 angela kernel:  nl80211_trigger_scan+0x409/0x7e0 [cfg80211]
> sep 21 09:33:14 angela kernel:  genl_family_rcv_msg_doit.isra.0+0xe3/0x140
> sep 21 09:33:14 angela kernel:  genl_rcv_msg+0x1b1/0x2c0
> sep 21 09:33:14 angela kernel:  ? __pfx_nl80211_pre_doit+0x10/0x10 [cfg80211]
> sep 21 09:33:14 angela kernel:  ? __pfx_nl80211_trigger_scan+0x10/0x10 [cfg80211]
> sep 21 09:33:14 angela kernel:  ? __pfx_nl80211_post_doit+0x10/0x10 [cfg80211]
> sep 21 09:33:14 angela kernel:  ? __pfx_genl_rcv_msg+0x10/0x10
> sep 21 09:33:14 angela kernel:  netlink_rcv_skb+0x58/0x110
> sep 21 09:33:14 angela kernel:  genl_rcv+0x28/0x40
> sep 21 09:33:14 angela kernel:  netlink_unicast+0x19e/0x290
> sep 21 09:33:14 angela kernel:  netlink_sendmsg+0x254/0x4d0
> sep 21 09:33:14 angela kernel:  sock_sendmsg+0x93/0xa0
> sep 21 09:33:14 angela kernel:  ____sys_sendmsg+0x278/0x300
> sep 21 09:33:14 angela kernel:  ? copy_msghdr_from_user+0x7d/0xc0
> sep 21 09:33:14 angela kernel:  ___sys_sendmsg+0x9a/0xe0
> sep 21 09:33:14 angela kernel:  __sys_sendmsg+0x7a/0xd0
> sep 21 09:33:14 angela kernel:  do_syscall_64+0x5d/0xc0
> sep 21 09:33:14 angela kernel:  ? do_syscall_64+0x6c/0xc0
> sep 21 09:33:14 angela kernel:  ? do_syscall_64+0x6c/0xc0
> sep 21 09:33:14 angela kernel:  ? fpregs_assert_state_consistent+0x26/0x50
> sep 21 09:33:14 angela kernel:  ? exit_to_user_mode_prepare+0x40/0x1d0
> sep 21 09:33:14 angela kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
> sep 21 09:33:14 angela kernel:  ? do_syscall_64+0x6c/0xc0
> sep 21 09:33:14 angela kernel:  ? do_syscall_64+0x6c/0xc0
> sep 21 09:33:14 angela kernel:  ? do_syscall_64+0x6c/0xc0
> sep 21 09:33:14 angela kernel:  ? do_syscall_64+0x6c/0xc0
> sep 21 09:33:14 angela kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> sep 21 09:33:14 angela kernel: RIP: 0033:0x7fdbfbf29910
> sep 21 09:33:14 angela kernel: Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 80 3d d1 fc 0c 00 00 74 17 b8 2e 00 00 00>
> sep 21 09:33:14 angela kernel: RSP: 002b:00007ffedc261e78 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
> sep 21 09:33:14 angela kernel: RAX: ffffffffffffffda RBX: 00005568b18cf900 RCX: 00007fdbfbf29910
> sep 21 09:33:14 angela kernel: RDX: 0000000000000000 RSI: 00007ffedc261eb0 RDI: 0000000000000006
> sep 21 09:33:14 angela kernel: RBP: 00005568b1949360 R08: 0000000000000004 R09: 0000000000000000
> sep 21 09:33:14 angela kernel: R10: 00007ffedc261f94 R11: 0000000000000202 R12: 00005568b18cfbe0
> sep 21 09:33:14 angela kernel: R13: 00007ffedc261eb0 R14: 0000000000000000 R15: 00007ffedc261f94
> sep 21 09:33:14 angela kernel:  </TASK>
> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Scan failed! ret -5
> sep 21 09:33:14 angela kernel: ------------[ cut here ]------------
> sep 21 09:33:14 angela kernel: WARNING: CPU: 15 PID: 133361 at net/mac80211/scan.c:423 __ieee80211_scan_completed+0x308/0x320 [mac80211]
> sep 21 09:33:14 angela kernel: Modules linked in: cdc_mbim cdc_wdm xfrm_user xfrm_algo nvme_fabrics ctr ccm rfcomm cmac algif_hash algif_skcipher af_alg tls>
> sep 21 09:33:14 angela kernel:  nf_reject_ipv4 intel_uncore_frequency_common snd_hda_codec_idt snd_soc_acpi nf_reject_ipv6 x86_pkg_temp_thermal snd_hda_code>
> sep 21 09:33:14 angela kernel:  configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic dm_crypt dm_mod efivarfs raid10 raid456 as>
> sep 21 09:33:14 angela kernel: CPU: 15 PID: 133361 Comm: kworker/u32:22 Tainted: G        W          6.5.0-1-amd64 #1  Debian 6.5.3-1
> sep 21 09:33:14 angela kernel: Hardware name: Framework Laptop (12th Gen Intel Core)/FRANMACP04, BIOS 03.06 11/10/2022
> sep 21 09:33:14 angela kernel: Workqueue: phy0 ieee80211_scan_work [mac80211]
> sep 21 09:33:14 angela kernel: RIP: 0010:__ieee80211_scan_completed+0x308/0x320 [mac80211]
> sep 21 09:33:14 angela kernel: Code: 0a 00 48 85 c0 74 0c 48 8b 78 08 48 89 ee e8 6f 49 04 00 65 ff 0d 10 49 2a 3e 0f 85 b1 fe ff ff 0f 1f 44 00 00 e9 a7 fe>
> sep 21 09:33:14 angela kernel: RSP: 0018:ffffb42640ab7dc8 EFLAGS: 00010246
> sep 21 09:33:14 angela kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> sep 21 09:33:14 angela kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9399481e8900
> sep 21 09:33:14 angela kernel: RBP: ffff9399481e8900 R08: 0000000000000000 R09: 0000000000000000
> sep 21 09:33:14 angela kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9399481e9b00
> sep 21 09:33:14 angela kernel: R13: 0000000000000000 R14: ffff9399634ff335 R15: 0000000000000001
> sep 21 09:33:14 angela kernel: FS:  0000000000000000(0000) GS:ffff93a0ef9c0000(0000) knlGS:0000000000000000
> sep 21 09:33:14 angela kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> sep 21 09:33:14 angela kernel: CR2: 00005614e1215588 CR3: 00000008a8020000 CR4: 0000000000f50ee0
> sep 21 09:33:14 angela kernel: PKRU: 55555554
> sep 21 09:33:14 angela kernel: Call Trace:
> sep 21 09:33:14 angela kernel:  <TASK>
> sep 21 09:33:14 angela kernel:  ? __ieee80211_scan_completed+0x308/0x320 [mac80211]
> sep 21 09:33:14 angela kernel:  ? __warn+0x81/0x130
> sep 21 09:33:14 angela kernel:  ? __ieee80211_scan_completed+0x308/0x320 [mac80211]
> sep 21 09:33:14 angela kernel:  ? report_bug+0x191/0x1c0
> sep 21 09:33:14 angela kernel:  ? handle_bug+0x3c/0x80
> sep 21 09:33:14 angela kernel:  ? exc_invalid_op+0x17/0x70
> sep 21 09:33:14 angela kernel:  ? asm_exc_invalid_op+0x1a/0x20
> sep 21 09:33:14 angela kernel:  ? __ieee80211_scan_completed+0x308/0x320 [mac80211]
> sep 21 09:33:14 angela kernel:  ieee80211_scan_work+0x127/0x5d0 [mac80211]
> sep 21 09:33:14 angela kernel:  ? __schedule+0x3e7/0xb80
> sep 21 09:33:14 angela kernel:  process_one_work+0x1de/0x3f0
> sep 21 09:33:14 angela kernel:  worker_thread+0x51/0x390
> sep 21 09:33:14 angela kernel:  ? __pfx_worker_thread+0x10/0x10
> sep 21 09:33:14 angela kernel:  kthread+0xf4/0x130
> sep 21 09:33:14 angela kernel:  ? __pfx_kthread+0x10/0x10
> sep 21 09:33:14 angela kernel:  ret_from_fork+0x31/0x50
> sep 21 09:33:14 angela kernel:  ? __pfx_kthread+0x10/0x10
> sep 21 09:33:14 angela kernel:  ret_from_fork_asm+0x1b/0x30
> sep 21 09:33:14 angela kernel:  </TASK>
> sep 21 09:33:14 angela kernel: ---[ end trace 0000000000000000 ]---
> sep 21 09:33:14 angela wpa_supplicant[1563]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-5
> sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: WFPM_UMAC_PD_NOTIFICATION: 0x1f
> sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
> sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: WFPM_AUTH_KEY_0: 0x80
> sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
> sep 21 09:33:15 angela wpa_supplicant[1563]: wlan0: CTRL-EVENT-REGDOM-CHANGE init=DRIVER type=WORLD
>
>
> Thanks for any ideas!
>

This firmware assert is related to the periodic scan wifi performs occasionally.
But, this firmware is rather old, it was relased in June 2022 and a lot of fixes and other changes were
introduced since then. The default driver from kernel 6.5 should load one of the latest firmwares,
availble at [1].

[1] http://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

2023-09-25 02:43:40

by Antoine Beaupré

[permalink] [raw]
Subject: Re: Microcode SW error since Linux 6.5

On 2023-09-21 21:29:27, Johannes Berg wrote:
> On Thu, 2023-09-21 at 13:24 -0400, Antoine Beaupré wrote:
>> Hi,
>>
>> I've found what I feel might be a regression between Linux 6.1 and
>> 6.5. For other reasons, I upgraded the kernel on my Debian 12
>> ("bookworm", stale) laptop from the distribution 6.1.52 to the unstable
>> ("sid") version, 6.5.3.
>>
>> After the upgrade, I started to notice stuttering in my audio player, I
>> tracked it down and managed to correlate it with some kernel errors
>> related to the iwlwifi driver.
>>
>> What's interesting is that this happens regardless of whether or not the
>> NIC is connected to a network. In at least one of the traces, the
>> computer was connected over a wire and wireless was not associated in
>> Network Manager.
>
> This happens when scanning.

Ah, that makes sense!

>> Here's an example of the problem:
>>
>> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: Microcode SW error detected. Restarting 0x0.
>
> Can you give a few wpa_supplicant lines (there were some below) above
> this? Just want to make sure it really is scanning on wlan0, not
> something with P2P device.

Interestingly, for the above fault, there's no wpa_supplicant line just
*before*. There's this *after*:

sep 21 09:33:14 angela wpa_supplicant[1563]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-5
sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: WFPM_UMAC_PD_NOTIFICATION: 0x1f
sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: WFPM_AUTH_KEY_0: 0x80
sep 21 09:33:15 angela kernel: iwlwifi 0000:a6:00.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
sep 21 09:33:15 angela wpa_supplicant[1563]: wlan0: CTRL-EVENT-REGDOM-CHANGE init=DRIVER type=WORLD

But an earlier one is preceeded by:

sep 21 09:32:45 angela wpa_supplicant[1563]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-5
sep 21 09:32:45 angela kernel: iwlwifi 0000:a6:00.0: Microcode SW error detected. Restarting 0x0.
[...]

>> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x20103600 | ADVANCED_SYSASSERT
>
>> sep 21 09:33:14 angela kernel: iwlwifi 0000:a6:00.0: 0x000000FF | umac data1
>
> This means that somehow scan_start_mac_or_link_id in the driver ended up
> 0xff which is invalid, but I'm not sure I see immediately how that
> happened, since it looks like in 6.5.3 we do assign it reasonably. I
> guess somehow in the code link_info->fw_link_id must be 0xff (invalid
> ID), but I'm not sure I see how that could happen.
>
> *thinks*
>
> Oh.. This is an older firmware, so it doesn't have
> IWL_UCODE_TLV_CAPA_MLD_API_SUPPORT! Hah. I feel like I had some concerns
> in this area before ... but maybe the other way around.
>
> I think something like this, perhaps:
>
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
> @@ -2342,7 +2342,7 @@ iwl_mvm_scan_umac_fill_general_p_v12(struct iwl_mvm *mvm,
> if (gen_flags & IWL_UMAC_SCAN_GEN_FLAGS_V2_FRAGMENTED_LMAC2)
> gp->num_of_fragments[SCAN_HB_LMAC_IDX] = IWL_SCAN_NUM_OF_FRAGS;
>
> - if (version < 12) {
> + if (version < 12 || !iwl_mvm_has_mld_api(mvm->fw)) {
> gp->scan_start_mac_or_link_id = scan_vif->id;
> } else {
> struct iwl_mvm_vif_link_info *link_info;

Interesting! In any case, the firmware is certainly out of date in
Debian stable, and I guess it's to be expected that having it out of
sync with the running kernel is a Bad Idea, it's just not something I've
thought of before. :)

Thanks for the debugging, I'll make sure to keep the firmware and kernel
in better lockstep in the future!

a.

--
Lorsque l'on range des objets dans des tiroirs, et que l'on a plus
d'objets que de tiroirs, alors un tiroir au moins contient deux
objets.
- Lejeune-Dirichlet, Peter Gustav

2023-09-25 18:55:03

by Antoine Beaupré

[permalink] [raw]
Subject: Re: Microcode SW error since Linux 6.5

On 2023-09-25 08:03:09, Johannes Berg wrote:
> On Sun, 2023-09-24 at 22:43 -0400, Antoine Beaupré wrote:

[...]

>> > I think something like this, perhaps:
>> >
>> > --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
>> > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
>> > @@ -2342,7 +2342,7 @@ iwl_mvm_scan_umac_fill_general_p_v12(struct iwl_mvm *mvm,
>> > if (gen_flags & IWL_UMAC_SCAN_GEN_FLAGS_V2_FRAGMENTED_LMAC2)
>> > gp->num_of_fragments[SCAN_HB_LMAC_IDX] = IWL_SCAN_NUM_OF_FRAGS;
>> >
>> > - if (version < 12) {
>> > + if (version < 12 || !iwl_mvm_has_mld_api(mvm->fw)) {
>> > gp->scan_start_mac_or_link_id = scan_vif->id;
>> > } else {
>> > struct iwl_mvm_vif_link_info *link_info;
>>
>> Interesting! In any case, the firmware is certainly out of date in
>> Debian stable, and I guess it's to be expected that having it out of
>> sync with the running kernel is a Bad Idea, it's just not something I've
>> thought of before. :)
>
> Did that patch help?

I haven't tried, to be honest. It's been years since I compiled a
kernel, so I haven't actually seriously considered it. But I can
definitely put that on my list if it would be useful for you!

Quite a bit of scaffolding to build to get back down there though...

>> Thanks for the debugging, I'll make sure to keep the firmware and kernel
>> in better lockstep in the future!
>
> Yes, you can ... but I mean, we still do want to consider this a bug, I
> think, since we explicitly built the thing to load the older firmware.
> It just gets _far_ less testing.

Got it, happy to play guinea pig a little further if that's useful.

a.

--
The desire to sacrifice an entire lifetime to the noblest of ideals
serves no purpose if one works alone.
- Che Guevara

2023-09-26 07:48:38

by Johannes Berg

[permalink] [raw]
Subject: Re: Microcode SW error since Linux 6.5

On Sun, 2023-09-24 at 22:43 -0400, Antoine Beaupré wrote:
>
> Interestingly, for the above fault, there's no wpa_supplicant line just
> *before*. There's this *after*:
>
> sep 21 09:33:14 angela wpa_supplicant[1563]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-5

Yeah I saw this, just wasn't sure where it was scanning, hence the
question. But I don't think it mattered.

> But an earlier one is preceeded by:
>
> sep 21 09:32:45 angela wpa_supplicant[1563]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-5
> sep 21 09:32:45 angela kernel: iwlwifi 0000:a6:00.0: Microcode SW error detected. Restarting 0x0.
> [...]

Well I guess there was just another scan before that one :)

>
> > I think something like this, perhaps:
> >
> > --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
> > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
> > @@ -2342,7 +2342,7 @@ iwl_mvm_scan_umac_fill_general_p_v12(struct iwl_mvm *mvm,
> > if (gen_flags & IWL_UMAC_SCAN_GEN_FLAGS_V2_FRAGMENTED_LMAC2)
> > gp->num_of_fragments[SCAN_HB_LMAC_IDX] = IWL_SCAN_NUM_OF_FRAGS;
> >
> > - if (version < 12) {
> > + if (version < 12 || !iwl_mvm_has_mld_api(mvm->fw)) {
> > gp->scan_start_mac_or_link_id = scan_vif->id;
> > } else {
> > struct iwl_mvm_vif_link_info *link_info;
>
> Interesting! In any case, the firmware is certainly out of date in
> Debian stable, and I guess it's to be expected that having it out of
> sync with the running kernel is a Bad Idea, it's just not something I've
> thought of before. :)

Did that patch help?

> Thanks for the debugging, I'll make sure to keep the firmware and kernel
> in better lockstep in the future!

Yes, you can ... but I mean, we still do want to consider this a bug, I
think, since we explicitly built the thing to load the older firmware.
It just gets _far_ less testing.

johannes

2023-09-26 19:22:42

by Johannes Berg

[permalink] [raw]
Subject: Re: Microcode SW error since Linux 6.5

On Mon, 2023-09-25 at 14:39 -0400, Antoine Beaupré wrote:
> > Did that patch help?
>
> I haven't tried, to be honest. It's been years since I compiled a
> kernel, so I haven't actually seriously considered it. But I can
> definitely put that on my list if it would be useful for you!

Heh ok. Well we can also try it I guess. Or reason our way through it,
but I'm pretty sure something like that should be done here.

> > Yes, you can ... but I mean, we still do want to consider this a bug, I
> > think, since we explicitly built the thing to load the older firmware.
> > It just gets _far_ less testing.
>
> Got it, happy to play guinea pig a little further if that's useful.
>

Well you might want to go for the extra stability instead :-) I'm kind
of thinking along the lines of "if you can download the next wifi
firmware with the current one, that's probably good enough". :)

johannes

2023-10-03 11:52:44

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: Microcode SW error since Linux 6.5

[CCing the regressions list]

On 25.09.23 21:03, Johannes Berg wrote:
> On Mon, 2023-09-25 at 14:39 -0400, Antoine Beaupré wrote:
>>> Did that patch help?
>>
>> I haven't tried, to be honest. It's been years since I compiled a
>> kernel, so I haven't actually seriously considered it. But I can
>> definitely put that on my list if it would be useful for you!
> Heh ok. Well we can also try it I guess. Or reason our way through it,
> but I'm pretty sure something like that should be done here.
>
>>> Yes, you can ... but I mean, we still do want to consider this a bug, I
>>> think, since we explicitly built the thing to load the older firmware.
>>> It just gets _far_ less testing.
>>
>> Got it, happy to play guinea pig a little further if that's useful.
> Well you might want to go for the extra stability instead :-) I'm kind
> of thinking along the lines of "if you can download the next wifi
> firmware with the current one, that's probably good enough". :)

Well, that's not how Linus wants things to be, as due to this "no
regression" rule things should continue to work if people update the
kernel without updating linux-firmware (as also explained in
Documentation/driver-api/firmware/firmware-usage-guidelines.rst).

That's what I though when I saw that message initially, but I thought:
well, not worth a comment, the reporter apparently doesn't mind that
much. But since then I saw more and more reports that looked related to
my untrained eyes (and thus might not be related at all!).

https://bugzilla.kernel.org/show_bug.cgi?id=217894
https://bugzilla.kernel.org/show_bug.cgi?id=217894#c6 (same ticket)
https://bugzilla.kernel.org/show_bug.cgi?id=217963
https://lore.kernel.org/all/[email protected]/

Gregory, Johannes, is there something wrong here with 6.5.y? If yes: is
anything already been done to improve the state of things?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

2023-10-04 07:51:54

by Greenman, Gregory

[permalink] [raw]
Subject: Re: Microcode SW error since Linux 6.5

Hi,

On Tue, 2023-10-03 at 13:52 +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> [CCing the regressions list]
>
> On 25.09.23 21:03, Johannes Berg wrote:
> > On Mon, 2023-09-25 at 14:39 -0400, Antoine Beaupré wrote:
> > > > Did that patch help?
> > >
> > > I haven't tried, to be honest. It's been years since I compiled a
> > > kernel, so I haven't actually seriously considered it. But I can
> > > definitely put that on my list if it would be useful for you!
> > Heh ok. Well we can also try it I guess. Or reason our way through it,
> > but I'm pretty sure something like that should be done here.
> >
> > > > Yes, you can ... but I mean, we still do want to consider this a bug, I
> > > > think, since we explicitly built the thing to load the older firmware.
> > > > It just gets _far_ less testing.
> > >
> > > Got it, happy to play guinea pig a little further if that's useful.
> > Well you might want to go for the extra stability instead :-) I'm kind
> > of thinking along the lines of "if you can download the next wifi
> > firmware with the current one, that's probably good enough". :)
>
> Well, that's not how Linus wants things to be, as due to this "no
> regression" rule things should continue to work if people update the
> kernel without updating linux-firmware (as also explained in
> Documentation/driver-api/firmware/firmware-usage-guidelines.rst).
>
> That's what I though when I saw that message initially, but I thought:
> well, not worth a comment, the reporter apparently doesn't mind that
> much. But since then I saw more and more reports that looked related to
> my untrained eyes (and thus might not be related at all!).
>
> https://bugzilla.kernel.org/show_bug.cgi?id=217894
> https://bugzilla.kernel.org/show_bug.cgi?id=217894#c6 (same ticket)
> https://bugzilla.kernel.org/show_bug.cgi?id=217963
> https://lore.kernel.org/all/[email protected]/
>
> Gregory, Johannes, is there something wrong here with 6.5.y? If yes: is
> anything already been done to improve the state of things?
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.

The patch in [1] should solve the issue. The problem was that wrong version
of Firmware API was used for some older versions.

[1] https://patchwork.kernel.org/project/linux-wireless/patch/20230926165546.086e635fbbe6.Ia660f35ca0b1079f2c2ea92fd8d14d8101a89d03@changeid/

2023-10-04 08:36:08

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: Microcode SW error since Linux 6.5

[CCing the net maintainers due to the related merge request not yet
processed]

On 04.10.23 09:51, Greenman, Gregory wrote:
> On Tue, 2023-10-03 at 13:52 +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 25.09.23 21:03, Johannes Berg wrote:
>>> On Mon, 2023-09-25 at 14:39 -0400, Antoine Beaupré wrote:
>>> [...]
>>>>> Yes, you can ... but I mean, we still do want to consider this a bug, I
>>>>> think, since we explicitly built the thing to load the older firmware.
>>>>> It just gets _far_ less testing.
>>>> Got it, happy to play guinea pig a little further if that's useful.
>>> Well you might want to go for the extra stability instead :-) I'm kind
>>> of thinking along the lines of "if you can download the next wifi
>>> firmware with the current one, that's probably good enough". :)
>>
>> Well, that's not how Linus wants things to be, as due to this "no
>> regression" rule things should continue to work if people update the
>> kernel without updating linux-firmware (as also explained in
>> Documentation/driver-api/firmware/firmware-usage-guidelines.rst).
>>
>> That's what I though when I saw that message initially, but I thought:
>> well, not worth a comment, the reporter apparently doesn't mind that
>> much. But since then I saw more and more reports that looked related to
>> my untrained eyes (and thus might not be related at all!).
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=217894
>> https://bugzilla.kernel.org/show_bug.cgi?id=217894#c6 (same ticket)
>> https://bugzilla.kernel.org/show_bug.cgi?id=217963
>> https://lore.kernel.org/all/[email protected]/
>>
>> Gregory, Johannes, is there something wrong here with 6.5.y? If yes: is
>> anything already been done to improve the state of things?

> The patch in [1] should solve the issue. The problem was that wrong version
> of Firmware API was used for some older versions.
>
> [1] https://patchwork.kernel.org/project/linux-wireless/patch/20230926165546.086e635fbbe6.Ia660f35ca0b1079f2c2ea92fd8d14d8101a89d03@changeid/

Ahh, great, many thx for the reply. Two things then:

(1) I wanted to ask why this patch is not in mainline or -net yet, but
then noticed the fix is among the patches of a one week old pull request
that afaics was not handled yet:
https://lore.kernel.org/all/[email protected]/

Guess that one fell through the cracks due to the Netconf 2023 workshop?
Maybe this msg gets things rolling again.


(2) Would have been great if that fix would have had a stable tag to
ensure a quick backport to 6.5. :-/ Made me wonder if this is an
accident, but doesn't look like it:

$ git log --oneline --grep 'CC:.*[email protected]' mainline/master
-- drivers/net/wireless/intel/iwlwifi/ | wc -l
11

Is there a reason why iwlwifi patches nearly never contain stable tags?
Reminder: a "Fixes:" tag alone does not guarantee that the fix is
backported.


Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

2023-10-04 13:10:12

by Kalle Valo

[permalink] [raw]
Subject: Re: Microcode SW error since Linux 6.5

"Linux regression tracking (Thorsten Leemhuis)"
<[email protected]> writes:
>> The patch in [1] should solve the issue. The problem was that wrong version
>> of Firmware API was used for some older versions.
>>
>> [1]
>> https://patchwork.kernel.org/project/linux-wireless/patch/20230926165546.086e635fbbe6.Ia660f35ca0b1079f2c2ea92fd8d14d8101a89d03@changeid/
>
> Ahh, great, many thx for the reply. Two things then:
>
> (1) I wanted to ask why this patch is not in mainline or -net yet, but
> then noticed the fix is among the patches of a one week old pull request
> that afaics was not handled yet:
> https://lore.kernel.org/all/[email protected]/
>
> Guess that one fell through the cracks due to the Netconf 2023 workshop?
> Maybe this msg gets things rolling again.

At least on Monday there were something 500 patches in netdev patchwork
and today I see still 340 patches. Understandably it takes time to
process all that, let's just wait patiently and let the maintainers do
their job in peace.

I see our pull request is in active state in patchwork so I don't think
it has fell through the cracks:

https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2023-10-04 15:34:30

by Johannes Berg

[permalink] [raw]
Subject: Re: Microcode SW error since Linux 6.5

Thorsten,

You know ... when you started this regression tracking project, I
actually naively thought it might be a good idea, and _help_ us.

Honestly though, now after seeing it for a while, for me it's done the
opposite.

Pretty much all interactions we have with it are like this:

> (2) Would have been great if that fix would have had a stable tag to
> ensure a quick backport to 6.5. :-/ Made me wonder if this is an
> accident, but doesn't look like it:
>
> $ git log --oneline --grep 'CC:.*[email protected]' mainline/master
> -- drivers/net/wireless/intel/iwlwifi/ | wc -l
> 11
>
> Is there a reason why iwlwifi patches nearly never contain stable tags?
> Reminder: a "Fixes:" tag alone does not guarantee that the fix is
> backported.

Yeah, that's wrapped in a "question", but really it's not, it's a
demand. You're demanding that we do our job differently, you might say
better, but you're demanding things of us.

Seeing this seriously makes me want to reconsider being a maintainer at
all. Why bother? All people do is feel entitled to you doing _more_.

Never mind that the the whole phrasing/reminder thing is pretty
condescending, like we don't know that?

Yeah, sure, I can already see how you're going to apologise, say you
didn't mean it that way, you're just doing your job etc., like last time
when you did the same to Kalle, etc. And I'm pretty sure we're _far_
from the only ones.

But honestly, _you_ chose this job. _You_ chose to be demanding in this
way. _You_ chose to complain to people about these things. _You_ decided
(as far as I can tell) to demand that stable maintenance was now the
subsystem maintainer job, rather than the stable maintainer's job.

This isn't helping. In fact, it's the exact opposite of helping.

You _could_ have decided that the patch was indeed worth it for stable,
submitted it there (once in mainline), and then maybe send a nudge
saying something like "I've proposed this patch for stable, it would've
been nice if you could add a Cc tag for future similar patches", or so?
But no, you decided to complain, counted wrongly (*), accused us of not
doing a job (that you signed us up for to boot!), etc.

(*) I'll leave it as an exercise to figure out why, because really, this
email isn't about the actual count or proportion patches or anything,
it's about the fact that you decided to count at all.

And also, "Cc'ing netdev because it's not merged" - have you paid any
attention at all? There are conferences going on, some parts of the
world have fall/autumn vacations, etc. It's been what, a week? Even
Linus noted there wasn't a net pull request, but I'll note that nowhere
did he actually _complain_ about that, unlike you. Relax!


So ... in summary: Can you please just stop? In fact, I'm tempted to
just ask you to simply stop rendering your services unto wireless, but I
haven't spoken with Kalle about it now.

Thanks,
johannes