2024-03-28 01:55:48

by Aaron Ma

Subject: [PATCH 6.6 0/1] Cherry-pick one more commit to fix iwlwifi crash on BE200

Hi Levin,

On the stable 6.6.23 kernel, iwlwifi crashed with the following error:

[ 290.279712] ------------[ cut here ]------------
[ 290.279726] Invalid rxb from HW 0
[ 290.279816] WARNING: CPU: 19 PID: 477 at drivers/net/wireless/intel/iwlwifi/pcie/rx.c:1489 iwl_pcie_rx_handle+0x80c/0xad0 [iwlwifi]
[ 290.279885] Modules linked in: snd_ctl_led snd_soc_skl_hda_dsp
snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes
snd_hda_codec_realtek snd_hda_codec_generic rfcomm nvme_fabrics ccm cmac
algif_hash algif_skcipher af_alg bnep uvcvideo videobuf2_vmalloc uvc
videobuf2_memops videobuf2_v4l2 videodev btusb btrtl btintel btbcm btmtk
videobuf2_common bluetooth mc ecdh_generic ecc joydev snd_soc_dmic
intel_uncore_frequency intel_uncore_frequency_common
snd_sof_pci_intel_mtl snd_sof_intel_hda_common x86_pkg_temp_thermal
soundwire_intel intel_powerclamp soundwire_generic_allocation
snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci
snd_sof_xtensa_dsp coretemp snd_sof snd_sof_utils snd_soc_hdac_hda
snd_hda_ext_core kvm_intel snd_soc_acpi_intel_match snd_soc_acpi
soundwire_bus kvm snd_soc_core irqbypass snd_compress crct10dif_pclmul
ac97_bus crc32_pclmul snd_pcm_dmaengine polyval_clmulni polyval_generic
ghash_clmulni_intel sha512_ssse3 iwlmvm snd_hda_intel sha256_ssse3
binfmt_misc sha1_ssse3 i915
[ 290.279973] snd_intel_dspcfg drm_buddy aesni_intel
snd_intel_sdw_acpi ttm mac80211 crypto_simd processor_thermal_device_pci
snd_hda_codec drm_display_helper spi_nor hid_multitouch
processor_thermal_device cryptd think_lmi pmt_telemetry hid_generic
libarc4 mtd intel_rapl_msr pmt_class iwlwifi firmware_attributes_class
wmi_bmof snd_hda_core cec processor_thermal_rfim rapl snd_hwdep psmouse
thinkpad_acpi input_leds rc_core mei_me snd_seq_midi ucsi_acpi
processor_thermal_mbox intel_cstate intel_lpss_pci snd_seq_midi_event
typec_ucsi processor_thermal_rapl nvram i2c_i801 intel_lpss xhci_pci
drm_kms_helper spi_intel_pci ledtrig_audio cfg80211 snd_pcm e1000e
thunderbolt serio_raw platform_profile mei i2c_smbus spi_intel typec
idma64 xhci_pci_renesas i2c_algo_bit intel_vsec intel_rapl_common
snd_rawmidi mac_hid snd_seq snd_seq_device snd_timer i2c_hid_acpi
i2c_hid hid snd soundcore video int3403_thermal int340x_thermal_zone wmi
acpi_tad acpi_pad intel_pmc_core intel_hid int3400_thermal
pinctrl_meteorlake sparse_keymap
[ 290.280076] acpi_thermal_rel sch_fq_codel msr parport_pc ppdev lp
parport drm efi_pstore ip_tables x_tables autofs4
[ 290.280097] CPU: 19 PID: 477 Comm: irq/182-iwlwifi Not tainted 6.6.23 #75
[ 290.280104] Hardware name: LENOVO 21ML0SIT12/21ML0SIT12, BIOS N47ET13W (1.02 ) 02/17/2024
[ 290.280108] RIP: 0010:iwl_pcie_rx_handle+0x80c/0xad0 [iwlwifi]
[ 290.280156] Code: 8b 8d 6c ff ff ff 4c 89 f2 4c 89 e6 4c 89 ef e8 4a f4 ff ff e9 08 fe ff ff 4d 89 ef 89 d6 48 c7 c7 c5 6c bf c0 e8 44 1d 36 fb <0f> 0b 4c 89 ff e8 1a 48 ff ff e9 9e fe ff ff 0f 1f 44 00 00 e9 f6
[ 290.280161] RSP: 0018:ffffc900004e0de8 EFLAGS: 00010246
[ 290.280167] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 290.280170] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 290.280173] RBP: ffffc900004e0e98 R08: 0000000000000000 R09: 0000000000000000
[ 290.280176] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88811dd27f88
[ 290.280179] R13: ffff888119ae8028 R14: ffff88812c4a0000 R15: ffff888119ae8028
[ 290.280182] FS: 0000000000000000(0000) GS:ffff8882214c0000(0000) knlGS:0000000000000000
[ 290.280187] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 290.280191] CR2: 00007f914800ba8c CR3: 000000020e83a005 CR4: 0000000000770ee0
[ 290.280195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 290.280198] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 290.280201] PKRU: 55555554
[ 290.280204] Call Trace:
[ 290.280208] <IRQ>
[ 290.280214] ? show_regs+0x72/0x90
[ 290.280225] ? iwl_pcie_rx_handle+0x80c/0xad0 [iwlwifi]
[ 290.280269] ? __warn+0x8d/0x160
[ 290.280278] ? iwl_pcie_rx_handle+0x80c/0xad0 [iwlwifi]
[ 290.280324] ? report_bug+0x1bb/0x1d0
[ 290.280335] ? console_unlock+0x77/0x130
[ 290.280346] ? handle_bug+0x46/0x90
[ 290.280354] ? exc_invalid_op+0x19/0x80
[ 290.280360] ? asm_exc_invalid_op+0x1b/0x20
[ 290.280369] ? iwl_pcie_rx_handle+0x80c/0xad0 [iwlwifi]
[ 290.280412] ? iwl_pcie_rx_handle+0x80c/0xad0 [iwlwifi]
[ 290.280457] iwl_pcie_napi_poll_msix+0x30/0x100 [iwlwifi]
[ 290.280500] ? try_to_wake_up+0x278/0x6c0
[ 290.280507] __napi_poll+0x30/0x1f0
[ 290.280515] net_rx_action+0x190/0x300
[ 290.280521] ? __irq_wake_thread+0x42/0x50
[ 290.280529] __do_softirq+0xda/0x330
[ 290.280533] ? handle_edge_irq+0xda/0x250
[ 290.280540] ? __pfx_irq_thread_fn+0x10/0x10
[ 290.280547] do_softirq.part.0+0x41/0x80
[ 290.280557] </IRQ>
[ 290.280559] <TASK>
[ 290.280562] __local_bh_enable_ip+0x72/0x80
[ 290.280570] iwl_pcie_irq_rx_msix_handler+0xd7/0x1c0 [iwlwifi]
[ 290.280644] irq_thread_fn+0x25/0x70
[ 290.280653] irq_thread+0xea/0x1c0
[ 290.280660] ? __pfx_irq_thread_dtor+0x10/0x10
[ 290.280668] ? __pfx_irq_thread+0x10/0x10
[ 290.280675] kthread+0xf4/0x130
[ 290.280683] ? __pfx_kthread+0x10/0x10
[ 290.280690] ret_from_fork+0x43/0x70
[ 290.280697] ? __pfx_kthread+0x10/0x10
[ 290.280704] ret_from_fork_asm+0x1b/0x30
[ 290.280712] </TASK>
[ 290.280715] ---[ end trace 0000000000000000 ]---
[ 290.281118] iwlwifi 0000:09:00.0: Microcode SW error detected. Restarting 0x0.

The first bad commit is:
commit c1c1039135c3 ("wifi: iwlwifi: increase number of RX buffers for EHT devices")
It needs to be backported together with the following commit:
commit 9f9797c7de18 ("wifi: iwlwifi: pcie: fix RB status reading")

BugLink: https://bugs.launchpad.net/bugs/2058808

Johannes Berg (1):
wifi: iwlwifi: pcie: fix RB status reading

drivers/net/wireless/intel/iwlwifi/pcie/internal.h | 8 ++++----
drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 2 +-
drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 12 ++++--------
3 files changed, 9 insertions(+), 13 deletions(-)

--
2.34.1



2024-03-28 01:55:53

by Aaron Ma

Subject: [PATCH 6.6 1/1] wifi: iwlwifi: pcie: fix RB status reading

From: Johannes Berg <[email protected]>

[ Upstream commit 9f9797c7de18d2ec6be4ef6e0abbaea585040b39 ]

On newer hardware, a queue's RB status / write pointer
can be bigger than 4095 (0xFFF), so we cannot mask the
value by 0xFFF unconditionally. Since anyway that's
only necessary on older hardware, move the masking to
the helper function and apply it only for older HW.
This also moves the endian conversion in to handle it
more easily.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Gregory Greenman <[email protected]>
Link: https://lore.kernel.org/r/20230830112059.7be2a3fff6f4.I94f11dee314a4f7c1941d2d223936b1fa8aa9ee4@changeid
Signed-off-by: Johannes Berg <[email protected]>
Cc: [email protected] # 6.6.y
Signed-off-by: Aaron Ma <[email protected]>
---
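Note (not part of the commit): the masking problem described in the commit
message can be illustrated with a small userspace sketch. This is not driver
code. The names hw_gen, le16_to_host(), closed_rb_masked_always(),
closed_rb() and check_example() are hypothetical stand-ins, and the value
5000 is only an example of an index above 4095.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's device-family check. */
enum hw_gen { HW_OLD, HW_NEW };

/* Convert a little-endian 16-bit value in memory to host byte order. */
static uint16_t le16_to_host(const uint8_t raw[2])
{
	return (uint16_t)(raw[0] | (raw[1] << 8));
}

/* Behaviour before the fix: every caller masked with 0x0FFF. */
static uint16_t closed_rb_masked_always(const uint8_t raw[2])
{
	return le16_to_host(raw) & 0x0FFF;
}

/*
 * Behaviour after the fix: the helper does the endian conversion and
 * applies the 12-bit mask only for older hardware.
 */
static uint16_t closed_rb(enum hw_gen gen, const uint8_t raw[2])
{
	uint16_t v = le16_to_host(raw);

	return gen == HW_OLD ? (uint16_t)(v & 0x0FFF) : v;
}

/* Worked example: index 5000 (0x1388) stored little-endian. */
static void check_example(void)
{
	const uint8_t raw[2] = { 0x88, 0x13 };

	/* Unconditional masking truncates 5000 to 904 (5000 - 4096). */
	assert(closed_rb_masked_always(raw) == 904);
	/* Newer hardware keeps the full value. */
	assert(closed_rb(HW_NEW, raw) == 5000);
	/* Older hardware is still masked to 12 bits. */
	assert(closed_rb(HW_OLD, raw) == 904);
}
```

With an index of 5000, unconditional masking yields 904, so a reader of the
status word would look at the wrong buffer. That is consistent with the
"Invalid rxb from HW" warning in the cover letter, though the sketch does
not reproduce the driver's actual receive path.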
drivers/net/wireless/intel/iwlwifi/pcie/internal.h | 8 ++++----
drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 2 +-
drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 12 ++++--------
3 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
index 5602441df2b7e..8408e4ddddedd 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
@@ -190,17 +190,17 @@ struct iwl_rb_allocator {
* iwl_get_closed_rb_stts - get closed rb stts from different structs
* @rxq - the rxq to get the rb stts from
*/
-static inline __le16 iwl_get_closed_rb_stts(struct iwl_trans *trans,
- struct iwl_rxq *rxq)
+static inline u16 iwl_get_closed_rb_stts(struct iwl_trans *trans,
+ struct iwl_rxq *rxq)
{
if (trans->trans_cfg->device_family >= IWL_DEVICE_FAMILY_AX210) {
__le16 *rb_stts = rxq->rb_stts;

- return READ_ONCE(*rb_stts);
+ return le16_to_cpu(READ_ONCE(*rb_stts));
} else {
struct iwl_rb_status *rb_stts = rxq->rb_stts;

- return READ_ONCE(rb_stts->closed_rb_num);
+ return le16_to_cpu(READ_ONCE(rb_stts->closed_rb_num)) & 0xFFF;
}
}

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
index 63091c45a576d..be9b5a19e2a7c 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
@@ -1510,7 +1510,7 @@ static int iwl_pcie_rx_handle(struct iwl_trans *trans, int queue, int budget)
spin_lock(&rxq->lock);
/* uCode's read index (stored in shared DRAM) indicates the last Rx
* buffer that the driver may process (last buffer filled by ucode). */
- r = le16_to_cpu(iwl_get_closed_rb_stts(trans, rxq)) & 0x0FFF;
+ r = iwl_get_closed_rb_stts(trans, rxq);
i = rxq->read;

/* W/A 9000 device step A0 wrap-around bug */
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index 1bc4a0089c6ff..e9807fcca6ad1 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -2714,11 +2714,9 @@ static ssize_t iwl_dbgfs_rx_queue_read(struct file *file,
pos += scnprintf(buf + pos, bufsz - pos, "\tfree_count: %u\n",
rxq->free_count);
if (rxq->rb_stts) {
- u32 r = __le16_to_cpu(iwl_get_closed_rb_stts(trans,
- rxq));
+ u32 r = iwl_get_closed_rb_stts(trans, rxq);
pos += scnprintf(buf + pos, bufsz - pos,
- "\tclosed_rb_num: %u\n",
- r & 0x0FFF);
+ "\tclosed_rb_num: %u\n", r);
} else {
pos += scnprintf(buf + pos, bufsz - pos,
"\tclosed_rb_num: Not Allocated\n");
@@ -3091,7 +3089,7 @@ static u32 iwl_trans_pcie_dump_rbs(struct iwl_trans *trans,

spin_lock_bh(&rxq->lock);

- r = le16_to_cpu(iwl_get_closed_rb_stts(trans, rxq)) & 0x0FFF;
+ r = iwl_get_closed_rb_stts(trans, rxq);

for (i = rxq->read, j = 0;
i != r && j < allocated_rb_nums;
@@ -3387,9 +3385,7 @@ iwl_trans_pcie_dump_data(struct iwl_trans *trans,
/* Dump RBs is supported only for pre-9000 devices (1 queue) */
struct iwl_rxq *rxq = &trans_pcie->rxq[0];
/* RBs */
- num_rbs =
- le16_to_cpu(iwl_get_closed_rb_stts(trans, rxq))
- & 0x0FFF;
+ num_rbs = iwl_get_closed_rb_stts(trans, rxq);
num_rbs = (num_rbs - rxq->read) & RX_QUEUE_MASK;
len += num_rbs * (sizeof(*data) +
sizeof(struct iwl_fw_error_dump_rb) +
--
2.34.1


2024-03-29 09:34:43

by Greg Kroah-Hartman

Subject: Re: [PATCH 6.6 1/1] wifi: iwlwifi: pcie: fix RB status reading

On Thu, Mar 28, 2024 at 09:54:02AM +0800, Aaron Ma wrote:
> From: Johannes Berg <[email protected]>
>
> [ Upstream commit 9f9797c7de18d2ec6be4ef6e0abbaea585040b39 ]
>
> On newer hardware, a queue's RB status / write pointer
> can be bigger than 4095 (0xFFF), so we cannot mask the
> value by 0xFFF unconditionally. Since anyway that's
> only necessary on older hardware, move the masking to
> the helper function and apply it only for older HW.
> This also moves the endian conversion in to handle it
> more easily.
>
> Signed-off-by: Johannes Berg <[email protected]>
> Signed-off-by: Gregory Greenman <[email protected]>
> Link: https://lore.kernel.org/r/20230830112059.7be2a3fff6f4.I94f11dee314a4f7c1941d2d223936b1fa8aa9ee4@changeid
> Signed-off-by: Johannes Berg <[email protected]>
> Cc: [email protected] # 6.6.y
> Signed-off-by: Aaron Ma <[email protected]>
> ---
> drivers/net/wireless/intel/iwlwifi/pcie/internal.h | 8 ++++----
> drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 2 +-
> drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 12 ++++--------
> 3 files changed, 9 insertions(+), 13 deletions(-)

Now queued up, thanks.

greg k-h