2023-11-15 08:57:35

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] wifi: iwlwifi: Add debugging around scan failure warning.

On Tue, 2023-11-14 at 13:23 -0800, [email protected] wrote:
> From: Ben Greear <[email protected]>
>
> Provide additional information that may help debugging why
> the WARN_ON is seen.
>
> Signed-off-by: Ben Greear <[email protected]>
> ---
> drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
> index 75c5c58e14a5..ec24ece7c877 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
> @@ -2356,7 +2356,10 @@ iwl_mvm_scan_umac_fill_general_p_v12(struct iwl_mvm *mvm,
> link_id = ffs(vif->active_links) - 1;
>
> link_info = scan_vif->link[link_id];
> - if (!WARN_ON(!link_info))
> + if (WARN_ON(!link_info))
> + IWL_ERR(mvm, "scan failure: Cannot find link info for link-id: %d active-links: 0x%x\n",
> + link_id, vif->active_links);
>

It would make sense to put the data *into* the warning, rather than
separately? Though I'm not sure I see so much value in the long string
(vs. just the data).

But honestly I'm not sure this really even is a problem at all? Some
confusion can happen during firmware restart here, and not sure we can
really fix that. Though maybe with the new locking we can.

johannes


2023-11-15 16:06:26

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH] wifi: iwlwifi: Add debugging around scan failure warning.

On 11/15/23 00:57, Johannes Berg wrote:
> On Tue, 2023-11-14 at 13:23 -0800, [email protected] wrote:
>> From: Ben Greear <[email protected]>
>>
>> Provide additional information that may help debugging why
>> the WARN_ON is seen.
>>
>> Signed-off-by: Ben Greear <[email protected]>
>> ---
>> drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
>> index 75c5c58e14a5..ec24ece7c877 100644
>> --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
>> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
>> @@ -2356,7 +2356,10 @@ iwl_mvm_scan_umac_fill_general_p_v12(struct iwl_mvm *mvm,
>> link_id = ffs(vif->active_links) - 1;
>>
>> link_info = scan_vif->link[link_id];
>> - if (!WARN_ON(!link_info))
>> + if (WARN_ON(!link_info))
>> + IWL_ERR(mvm, "scan failure: Cannot find link info for link-id: %d active-links: 0x%x\n",
>> + link_id, vif->active_links);
>>
>
> It would make sense to put the data *into* the warning, rather than
> separately? Though I'm not sure I see so much value in the long string
> (vs. just the data).

I assume IWL_ERR will provide some extra context in systems with multiple radios,
so more useful than what I could easily put into WARN_* directly.

>
> But honestly I'm not sure this really even is a problem at all? Some
> confusion can happen during firmware restart here, and not sure we can
> really fix that. Though maybe with the new locking we can.

I see the splat, not sure why. It would be nice to know if link-id is
corrupted somehow, or if active links is zero, etc.

Thanks,
Ben

>
> johannes
>

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2023-11-15 16:11:35

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] wifi: iwlwifi: Add debugging around scan failure warning.

On Wed, 2023-11-15 at 08:01 -0800, Ben Greear wrote:
> On 11/15/23 00:57, Johannes Berg wrote:
> > On Tue, 2023-11-14 at 13:23 -0800, [email protected] wrote:
> > > From: Ben Greear <[email protected]>
> > >
> > > Provide additional information that may help debugging why
> > > the WARN_ON is seen.
> > >
> > > Signed-off-by: Ben Greear <[email protected]>
> > > ---
> > > drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 5 ++++-
> > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
> > > index 75c5c58e14a5..ec24ece7c877 100644
> > > --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
> > > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
> > > @@ -2356,7 +2356,10 @@ iwl_mvm_scan_umac_fill_general_p_v12(struct iwl_mvm *mvm,
> > > link_id = ffs(vif->active_links) - 1;
> > >
> > > link_info = scan_vif->link[link_id];
> > > - if (!WARN_ON(!link_info))
> > > + if (WARN_ON(!link_info))
> > > + IWL_ERR(mvm, "scan failure: Cannot find link info for link-id: %d active-links: 0x%x\n",
> > > + link_id, vif->active_links);
> > >
> >
> > It would make sense to put the data *into* the warning, rather than
> > separately? Though I'm not sure I see so much value in the long string
> > (vs. just the data).
>
> I assume IWL_ERR will provide some extra context in systems with multiple radios,
> so more useful than what I could easily put into WARN_* directly.

That's true, though not sure that helps so much more? You could put it
easily into the warning though :)

Anyway, the point is that bug reports from the field often don't include
the full log etc. so putting things _into_ the warning really is usually
more useful than putting it after.

> >
> > But honestly I'm not sure this really even is a problem at all? Some
> > confusion can happen during firmware restart here, and not sure we can
> > really fix that. Though maybe with the new locking we can.
>
> I see the splat, not sure why. It would be nice to know if link-id is
> corrupted somehow, or if active links is zero, etc.

Yeah in this case we'd really need more of the dynamic information to
really debug it, I guess.

johannes

2023-11-16 17:53:24

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH] wifi: iwlwifi: Add debugging around scan failure warning.

On 11/15/23 08:10, Johannes Berg wrote:
> On Wed, 2023-11-15 at 08:01 -0800, Ben Greear wrote:
>> On 11/15/23 00:57, Johannes Berg wrote:
>>> On Tue, 2023-11-14 at 13:23 -0800, [email protected] wrote:
>>>> From: Ben Greear <[email protected]>
>>>>
>>>> Provide additional information that may help debugging why
>>>> the WARN_ON is seen.
>>>>
>>>> Signed-off-by: Ben Greear <[email protected]>
>>>> ---
>>>> drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 5 ++++-
>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
>>>> index 75c5c58e14a5..ec24ece7c877 100644
>>>> --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
>>>> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
>>>> @@ -2356,7 +2356,10 @@ iwl_mvm_scan_umac_fill_general_p_v12(struct iwl_mvm *mvm,
>>>> link_id = ffs(vif->active_links) - 1;
>>>>
>>>> link_info = scan_vif->link[link_id];
>>>> - if (!WARN_ON(!link_info))
>>>> + if (WARN_ON(!link_info))
>>>> + IWL_ERR(mvm, "scan failure: Cannot find link info for link-id: %d active-links: 0x%x\n",
>>>> + link_id, vif->active_links);
>>>>
>>>
>>> It would make sense to put the data *into* the warning, rather than
>>> separately? Though I'm not sure I see so much value in the long string
>>> (vs. just the data).
>>
>> I assume IWL_ERR will provide some extra context in systems with multiple radios,
>> so more useful than what I could easily put into WARN_* directly.
>
> That's true, though not sure that helps so much more? You could put it
> easily into the warning though :)
>
> Anyway, the point is that bug reports from the field often don't include
> the full log etc. so putting things _into_ the warning really is usually
> more useful than putting it after.
>
>>>
>>> But honestly I'm not sure this really even is a problem at all? Some
>>> confusion can happen during firmware restart here, and not sure we can
>>> really fix that. Though maybe with the new locking we can.
>>
>> I see the splat, not sure why. It would be nice to know if link-id is
>> corrupted somehow, or if active links is zero, etc.
>
> Yeah in this case we'd really need more of the dynamic information to
> really debug it, I guess.

I found a splat with my debugging enabled today, running with the v86 firmware. Looks like
immediate cause is that active_links was 0x0. Logs are full of it, in fact. Seems to always
be the same reason.

[ 4381.954574] cfg80211-scan-6ghz, n-channels is 0, flags: 0x4000 need_scan_psc: 1 req-n-channels: 2
[ 4382.599873] ------------[ cut here ]------------
[ 4382.599880] WARNING: CPU: 3 PID: 6083 at drivers/net/wireless/intel/iwlwifi/mvm/scan.c:2359 iwl_mvm_scan_umac_v14_and_above+0x169a/0x1730 [iwlmvm]
[ 4382.599950] Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink vrf nf_defrag_ipv6 nf_defrag_ipv4 bpfilter 8021q garp mrp stp llc macvlan pktgen
rpcrdma rdma_cm iw_cm ib_cm ib_core qrtr f71882fg intel_rapl_msr iTCO_wdt snd_hda_codec_hdmi ee1004 intel_pmc_bxt snd_hda_codec_realtek snd_hda_codec_generic
iTCO_vendor_support ledtrig_audio coretemp intel_rapl_common intel_tcc_cooling snd_hda_intel x86_pkg_temp_thermal snd_intel_dspcfg intel_powerclamp
snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device kvm_intel snd_pcm kvm irqbypass intel_wmi_thunderbolt snd_timer snd i2c_i801 pcspkr soundcore
i2c_smbus iwlmvm mac80211 iwlwifi cfg80211 bfq mei_pxp mei_hdcp intel_pch_thermal tpm_crb tpm_tis tpm_tis_core tpm acpi_pad sch_fq_codel nfsd auth_rpcgss
nfs_acl lockd grace fuse sunrpc zram raid1 dm_raid raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq i915 drm_buddy
intel_gtt drm_display_helper igb drm_kms_helper i2c_algo_bit cec rc_core ttm ixgbe mdio dca drm agpgart
[ 4382.600146] hwmon mei_wdt xhci_pci i2c_core xhci_pci_renesas video wmi [last unloaded: nfnetlink]
[ 4382.600167] CPU: 3 PID: 6083 Comm: wpa_supplicant Tainted: G W 6.7.0-rc1+ #3
[ 4382.600173] Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/21/2023
[ 4382.600177] RIP: 0010:iwl_mvm_scan_umac_v14_and_above+0x169a/0x1730 [iwlmvm]
[ 4382.600240] Code: 8b 84 24 28 01 00 00 48 0f ba e0 2f 0f 83 9a fe ff ff 41 83 ce 04 e9 91 fe ff ff e8 50 63 c2 df b8 80 ff ff ff e9 ec f7 ff ff <0f> 0b 4c 89
ff e8 7c 65 c2 df 49 8b 3f 45 0f b7 c4 89 e9 48 c7 c2
[ 4382.600245] RSP: 0018:ffff88814342ed18 EFLAGS: 00010246
[ 4382.600251] RAX: 0000000000000000 RBX: ffff88814c0ef198 RCX: ffffffffa19e4a83
[ 4382.600256] RDX: dffffc0000000000 RSI: 0000000000000003 RDI: ffff88814c0efa80
[ 4382.600259] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed102b98cd01
[ 4382.600263] R10: 0000000000000000 R11: 6048302418120c08 R12: 0000000000000000
[ 4382.600267] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88814b1454e8
[ 4382.600271] FS: 00007f32fbf4b780(0000) GS:ffff88841bf80000(0000) knlGS:0000000000000000
[ 4382.600276] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4382.600280] CR2: 00007f604e9faa50 CR3: 00000001a07b2001 CR4: 00000000003706f0
[ 4382.600284] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4382.600288] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4382.600291] Call Trace:
[ 4382.600295] <TASK>
[ 4382.600299] ? __warn+0x9c/0x1f0
[ 4382.600307] ? iwl_mvm_scan_umac_v14_and_above+0x169a/0x1730 [iwlmvm]
[ 4382.600384] ? report_bug+0x1aa/0x1e0
[ 4382.600393] ? handle_bug+0x41/0x80
[ 4382.600398] ? exc_invalid_op+0x13/0x40
[ 4382.600404] ? asm_exc_invalid_op+0x16/0x20
[ 4382.600413] ? iwl_mvm_scan_umac_v14_and_above+0x13e3/0x1730 [iwlmvm]
[ 4382.600475] ? iwl_mvm_scan_umac_v14_and_above+0x169a/0x1730 [iwlmvm]
[ 4382.600536] ? iwl_mvm_scan_umac_v14_and_above+0x13e3/0x1730 [iwlmvm]
[ 4382.600597] ? desc_read_finalized_seq+0x69/0xf0
[ 4382.600604] ? desc_read+0x190/0x190
[ 4382.600609] ? rcu_is_watching+0x1f/0x40
[ 4382.600615] ? lock_acquire+0xc4/0x3b0
[ 4382.600621] ? rcu_is_watching+0x1f/0x40
[ 4382.600627] ? __iwl_info+0x1e0/0x1e0 [iwlwifi]
[ 4382.600690] ? ieee80211_iterate_active_interfaces_atomic+0x78/0x100 [mac80211]
[ 4382.600849] ? __iterate_interfaces+0x12e/0x220 [mac80211]
[ 4382.601000] ? iwl_mvm_scan_umac_chan_flags_v2+0x2c0/0x2c0 [iwlmvm]
[ 4382.601063] ? iwl_wait_stats_complete+0x30/0x30 [iwlmvm]
[ 4382.601125] iwl_mvm_build_scan_cmd+0xb21/0x2110 [iwlmvm]
[ 4382.601190] ? iwl_mvm_scan_umac_v12+0x530/0x530 [iwlmvm]
[ 4382.601253] ? iwl_mvm_build_scan_probe+0x39e/0x550 [iwlmvm]
[ 4382.601316] iwl_mvm_reg_scan_start+0x77a/0xa10 [iwlmvm]
[ 4382.601388] ? iwl_mvm_check_running_scans+0x130/0x130 [iwlmvm]
[ 4382.601449] ? stack_access_ok+0x40/0xf0
[ 4382.601482] ? ieee80211_scan_accept_presp+0xb0/0xb0 [mac80211]
[ 4382.601623] ? kasan_unpoison+0x23/0x50
[ 4382.601632] iwl_mvm_mac_hw_scan+0x81/0xa0 [iwlmvm]
[ 4382.601689] drv_hw_scan+0x15e/0x2f0 [mac80211]
[ 4382.601834] __ieee80211_start_scan+0x5bf/0xb10 [mac80211]
[ 4382.601977] ? drv_hw_scan+0x2f0/0x2f0 [mac80211]
[ 4382.602117] ? __kasan_kmalloc+0x83/0x90
[ 4382.602122] ? rcu_is_watching+0x1f/0x40
[ 4382.602128] ? __kmalloc+0xd7/0x150
[ 4382.602136] rdev_scan+0x6d/0x1d0 [cfg80211]
[ 4382.602276] nl80211_trigger_scan+0x673/0xd10 [cfg80211]
[ 4382.602426] genl_family_rcv_msg_doit+0x151/0x1d0
[ 4382.602435] ? genl_family_rcv_msg_attrs_parse.constprop.0+0x120/0x120
[ 4382.602444] ? cap_capable+0x9a/0xd0
[ 4382.602451] ? security_capable+0x29/0x60
[ 4382.602458] genl_rcv_msg+0x281/0x3b0
[ 4382.602465] ? genl_family_rcv_msg_dumpit+0x1e0/0x1e0
[ 4382.602471] ? set_track_prepare+0x3b/0x60
[ 4382.602477] ? kmalloc_reserve+0xa3/0x120
[ 4382.602482] ? __alloc_skb+0xcb/0x210
[ 4382.602487] ? nl80211_assoc_bss+0xf0/0xf0 [cfg80211]
[ 4382.602627] ? nl80211_send_scan_start+0xd0/0xd0 [cfg80211]
[ 4382.602765] ? cfg80211_external_auth_request+0x340/0x340 [cfg80211]
[ 4382.602901] ? do_syscall_64+0x3b/0x110
[ 4382.602906] ? entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 4382.602913] ? ____sys_sendmsg+0x3e6/0x470
[ 4382.602918] ? ___sys_sendmsg+0xdc/0x140
[ 4382.602926] netlink_rcv_skb+0xcf/0x210
[ 4382.602931] ? lock_sync+0xf0/0xf0
[ 4382.602936] ? genl_family_rcv_msg_dumpit+0x1e0/0x1e0
[ 4382.602943] ? netlink_ack+0x840/0x840
[ 4382.602954] ? rwsem_down_read_slowpath+0x920/0x920
[ 4382.602960] ? netlink_deliver_tap+0x9f/0x630
[ 4382.602965] ? netlink_deliver_tap+0xf9/0x630
[ 4382.602973] genl_rcv+0x1f/0x30
[ 4382.602978] netlink_unicast+0x2ee/0x490
[ 4382.602985] ? netlink_attachskb+0x430/0x430
[ 4382.602990] ? __virt_addr_valid+0xab/0x130
[ 4382.602996] ? check_stack_object+0x1d/0x60
[ 4382.603003] ? __check_object_size+0x22c/0x3b0
[ 4382.603010] netlink_sendmsg+0x3ca/0x6f0
[ 4382.603018] ? netlink_unicast+0x490/0x490
[ 4382.603025] ? import_ubuf+0xa9/0xd0
[ 4382.603032] ? __import_iovec+0x256/0x290
[ 4382.603038] ? netlink_unicast+0x490/0x490
[ 4382.603044] __sock_sendmsg+0x6e/0xb0
[ 4382.603050] ____sys_sendmsg+0x3e6/0x470
[ 4382.603056] ? kernel_sendmsg+0x30/0x30
[ 4382.603061] ? __copy_msghdr+0x1a0/0x1a0
[ 4382.603069] ? up_write+0xfb/0x2d0
[ 4382.603076] ___sys_sendmsg+0xdc/0x140
[ 4382.603082] ? __ia32_sys_recvmmsg+0x160/0x160
[ 4382.603089] ? netlink_setsockopt+0x2da/0x420
[ 4382.603096] ? netlink_realloc_groups+0x180/0x180
[ 4382.603101] ? reacquire_held_locks+0x270/0x270
[ 4382.603108] ? sock_do_ioctl+0xaf/0x180
[ 4382.603114] ? rcu_is_watching+0x1f/0x40
[ 4382.603120] ? do_sock_setsockopt+0x162/0x2a0
[ 4382.603125] ? move_addr_to_user+0xf0/0xf0
[ 4382.603133] ? __fget_light+0xd2/0xf0
[ 4382.603141] __sys_sendmsg+0xc4/0x150
[ 4382.603146] ? __sys_sendmsg_sock+0x10/0x10
[ 4382.603152] ? __sys_setsockopt+0x119/0x190
[ 4382.603158] ? __ia32_sys_recv+0x60/0x60
[ 4382.603167] ? lockdep_hardirqs_on_prepare+0xe/0x200
[ 4382.603174] ? syscall_enter_from_user_mode+0x1c/0x50
[ 4382.603180] ? trace_hardirqs_on+0x12/0xe0
[ 4382.603188] do_syscall_64+0x3b/0x110
[ 4382.603193] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 4382.603199] RIP: 0033:0x7f32fb713737
[ 4382.603203] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0
ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 4382.603208] RSP: 002b:00007ffd9313fc08 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 4382.603215] RAX: ffffffffffffffda RBX: 0000000001dabc00 RCX: 00007f32fb713737
[ 4382.603219] RDX: 0000000000000000 RSI: 00007ffd9313fc60 RDI: 0000000000000007
[ 4382.603223] RBP: 0000000001e43e40 R08: 0000000000000004 R09: 00000000000000f0
[ 4382.603227] R10: 00007ffd9313fd24 R11: 0000000000000246 R12: 0000000001dabb10
[ 4382.603230] R13: 00007ffd9313fc60 R14: 000000000079cd98 R15: 00007f32fc093000
[ 4382.603239] </TASK>
[ 4382.603243] irq event stamp: 0
[ 4382.603246] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 4382.603250] hardirqs last disabled at (0): [<ffffffff811d9d28>] copy_process+0xf68/0x32f0
[ 4382.603257] softirqs last enabled at (0): [<ffffffff811d9d28>] copy_process+0xf68/0x32f0
[ 4382.603262] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 4382.603266] ---[ end trace 0000000000000000 ]---
[ 4382.603271] iwlwifi 0000:21:00.0: scan failure: Cannot find link info for link-id: 0 active-links: 0x0
[ 4382.611303] iwlwifi 0000:21:00.0: Scan failed! ret -5
[ 4382.615098] wlan15: _ieee80211_start_scan had failure code: -5
[ 4382.615103] scan: cfg80211_scan failed: -5
[ 4383.282455] wlan5: do_stop, canceling dfs-cac-timer-work.

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2023-11-16 20:05:49

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] wifi: iwlwifi: Add debugging around scan failure warning.

On Thu, 2023-11-16 at 09:51 -0800, Ben Greear wrote:

> I found a splat with my debugging enabled today, running with the v86 firmware. Looks like
> immediate cause is that active_links was 0x0. Logs are full of it, in fact. Seems to always
> be the same reason.

active_links will be 0 if there isn't MLO, i.e. you're not connected.

> [ 4382.603271] iwlwifi 0000:21:00.0: scan failure: Cannot find link info for link-id: 0 active-links: 0x0
>

The real question is how we ended up with no info for link-id 0, but
like I said, that's a more dynamic question, just knowing the number
doesn't really help with it :)

johannes