2019-12-09 10:28:49

by Kalle Valo

[permalink] [raw]
Subject: Re: 5.5-rc1 oops on boot in 802.11 kernel driver

Hi Steve,

Steve French <[email protected]> writes:

> Noticed this crash in the Linux kernel Wifi driver on boot a few
> minutes ago immediately after updating to latest mainline kernel about
> an hour ago. I didn't see it last week and certainly not in 5.4.

please CC linux-wireless on all wireless related problems, we don't
follow lkml very closely and I found your email just by chance.

Full warning below. Steve is using iwlwifi.

>
> [ 36.904191] ------------[ cut here ]------------
> [ 36.904192] STA e0:3f:49:98:14:80 AC 2 txq pending airtime
> underflow: 4294967132, 260
> [ 36.904231] WARNING: CPU: 11 PID: 680 at
> net/mac80211/sta_info.c:1931
> ieee80211_sta_update_pending_airtime+0x115/0x120 [mac80211]
> [ 36.904232] Modules linked in: uinput ccm xt_CHECKSUM xt_MASQUERADE
> nf_nat_tftp nf_conntrack_tftp tun bridge stp llc
> nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT
> nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack
> ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw
> ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
> iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
> nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables
> iptable_filter bnep sunrpc xfs vfat fat libcrc32c typec_displayport
> elan_i2c x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
> snd_soc_skl kvm snd_hda_codec_hdmi snd_soc_hdac_hda snd_hda_ext_core
> snd_soc_sst_ipc irqbypass snd_soc_sst_dsp snd_soc_acpi_intel_match
> snd_soc_acpi snd_soc_core crct10dif_pclmul snd_hda_codec_realtek
> iTCO_wdt crc32_pclmul iwlmvm snd_hda_codec_generic snd_compress
> iTCO_vendor_support ac97_bus snd_pcm_dmaengine mei_hdcp mei_wdt
> ghash_clmulni_intel
> [ 36.904245] intel_rapl_msr snd_hda_intel snd_intel_dspcfg mac80211
> snd_hda_codec intel_cstate snd_hda_core intel_uncore uvcvideo libarc4
> btusb intel_rapl_perf snd_hwdep videobuf2_vmalloc btrtl snd_seq
> videobuf2_memops btbcm videobuf2_v4l2 btintel videobuf2_common pcspkr
> snd_seq_device wmi_bmof intel_wmi_thunderbolt bluetooth i2c_i801
> iwlwifi videodev snd_pcm thinkpad_acpi thunderbolt mei_me
> intel_lpss_pci cfg80211 ecdh_generic intel_lpss ledtrig_audio joydev
> mei snd_timer mc ecc idma64 snd ucsi_acpi processor_thermal_device
> typec_ucsi intel_rapl_common int3403_thermal soundcore
> intel_pch_thermal intel_soc_dts_iosf typec rfkill int340x_thermal_zone
> int3400_thermal acpi_pad acpi_thermal_rel ip_tables wacom nouveau
> mxm_wmi i2c_algo_bit rtsx_pci_sdmmc mmc_core drm_kms_helper ttm nvme
> drm e1000e crc32c_intel nvme_core serio_raw rtsx_pci wmi
> pinctrl_cannonlake video pinctrl_intel fuse
> [ 36.904263] CPU: 11 PID: 680 Comm: irq/162-iwlwifi Not tainted
> 5.5.0-0.rc0.git7.1.vanilla.knurd.1.fc31.x86_64 #1
> [ 36.904263] Hardware name: LENOVO 20MAS08500/20MAS08500, BIOS
> N2CET48W (1.31 ) 11/04/2019
> [ 36.904271] RIP:
> 0010:ieee80211_sta_update_pending_airtime+0x115/0x120 [mac80211]
> [ 36.904272] Code: 3f e3 0f 0b 8b 44 24 04 eb 9d 48 83 c6 40 41 89
> e8 89 c1 48 c7 c7 20 dd d5 c0 89 44 24 04 c6 05 35 99 0b 00 01 e8 bd
> cc 3f e3 <0f> 0b 8b 44 24 04 eb 8c 0f 1f 00 0f 1f 44 00 00 55 89 f5 53
> 48 89
> [ 36.904273] RSP: 0018:ffffa4e3809c7ba8 EFLAGS: 00010286
> [ 36.904273] RAX: 0000000000000000 RBX: 00000000fffffefc RCX: 0000000000000000
> [ 36.904274] RDX: 0000000000000049 RSI: ffffffffa6150489 RDI: 0000000000000246
> [ 36.904274] RBP: 0000000000000104 R08: 0000000897a94ba5 R09: 0000000000000049
> [ 36.904275] R10: 0000000000000002 R11: 00000000a6150443 R12: ffff8a0774e607c0
> [ 36.904275] R13: 0000000000000002 R14: ffff8a07778025c8 R15: ffff8a07044f3000
> [ 36.904276] FS: 0000000000000000(0000) GS:ffff8a077bcc0000(0000)
> knlGS:0000000000000000
> [ 36.904276] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 36.904277] CR2: 00003446319a8000 CR3: 000000042c60a004 CR4: 00000000003606e0
> [ 36.904277] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 36.904277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 36.904278] Call Trace:
> [ 36.904286] __ieee80211_tx_status+0x720/0x900 [mac80211]
> [ 36.904293] ieee80211_tx_status+0x6a/0x90 [mac80211]
> [ 36.904299] iwl_mvm_tx_reclaim+0x2a2/0x3c0 [iwlmvm]
> [ 36.904304] iwl_mvm_rx_ba_notif+0xff/0x2e0 [iwlmvm]
> [ 36.904307] ? __switch_to_asm+0x40/0x70
> [ 36.904311] iwl_mvm_rx_common+0xdd/0x2c0 [iwlmvm]
> [ 36.904317] iwl_pcie_rx_handle+0x3af/0x9c0 [iwlwifi]
> [ 36.904319] ? irq_finalize_oneshot.part.0+0xf0/0xf0
> [ 36.904322] iwl_pcie_irq_rx_msix_handler+0x45/0x80 [iwlwifi]
> [ 36.904323] irq_thread_fn+0x20/0x60
> [ 36.904325] irq_thread+0xdc/0x170
> [ 36.904326] ? irq_forced_thread_fn+0x80/0x80
> [ 36.904327] kthread+0xf9/0x130
> [ 36.904328] ? irq_thread_check_affinity+0xf0/0xf0
> [ 36.904329] ? kthread_park+0x90/0x90
> [ 36.904330] ret_from_fork+0x35/0x40
> [ 36.904332] ---[ end trace ce4946b8661ca7fd ]---
> [ 36.904332] ------------[ cut here ]------------

--
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


2019-12-09 11:11:26

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: 5.5-rc1 oops on boot in 802.11 kernel driver

Kalle Valo <[email protected]> writes:

> Hi Steve,
>
> Steve French <[email protected]> writes:
>
>> Noticed this crash in the Linux kernel Wifi driver on boot a few
>> minutes ago immediately after updating to latest mainline kernel about
>> an hour ago. I didn't see it last week and certainly not in 5.4.
>
> please CC linux-wireless on all wireless related problems, we don't
> follow lkml very closely and I found your email just by chance.
>
> Full warning below. Steve is using iwlwifi.

Right, we already got a similar report off-list, but with a different
stack trace. I was going to try to reproduce this on my own machine
today. However, the fact that this includes the iwl_mvm_tx_reclaim()
function may be a hint; that code seems to be reusing skbs without
freeing them?

If I'm reading the code correctly, it seems the reuse leads to the same
skb being passed to ieee80211_tx_status() multiple times; the driver is
clearing info->status, but since we added the info->tx_time_est field,
that would lead to double-accounting of that SKB, which would explain
the warning?

Can someone familiar with iwlwifi confirm that this is indeed what that
code is supposed to be doing? If it is, I think it needs the patch
below; however, if I'm wrong, then clearing the field could lead to the
opposite problem (that skbs fail to be accounted at all), which would
lead to the queue being throttled because the limit gets too high and is
never brought back down...

-Toke


diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
index dc5c02fbc65a..7d822445730c 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
@@ -1848,6 +1848,7 @@ static void iwl_mvm_tx_reclaim(struct iwl_mvm *mvm, int sta_id, int tid,
iwl_trans_free_tx_cmd(mvm->trans, info->driver_data[1]);

memset(&info->status, 0, sizeof(info->status));
+ info->tx_time_est = 0;
/* Packet was transmitted successfully, failures come as single
* frames because before failing a frame the firmware transmits
* it without aggregation at least once.

2019-12-09 14:12:32

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: 5.5-rc1 oops on boot in 802.11 kernel driver

Toke Høiland-Jørgensen <[email protected]> writes:

> Kalle Valo <[email protected]> writes:
>
>> Hi Steve,
>>
>> Steve French <[email protected]> writes:
>>
>>> Noticed this crash in the Linux kernel Wifi driver on boot a few
>>> minutes ago immediately after updating to latest mainline kernel about
>>> an hour ago. I didn't see it last week and certainly not in 5.4.
>>
>> please CC linux-wireless on all wireless related problems, we don't
>> follow lkml very closely and I found your email just by chance.
>>
>> Full warning below. Steve is using iwlwifi.
>
> Right, we already got a similar report off-list, but with a different
> stack trace. I was going to try to reproduce this on my own machine
> today. However, the fact that this includes the iwl_mvm_tx_reclaim()
> function may be a hint; that code seems to be reusing skbs without
> freeing them?
>
> If I'm reading the code correctly, it seems the reuse leads to the same
> skb being passed to ieee80211_tx_status() multiple times; the driver is
> clearing info->status, but since we added the info->tx_time_est field,
> that would lead to double-accounting of that SKB, which would explain
> the warning?
>
> Can someone familiar with iwlwifi confirm that this is indeed what that
> code is supposed to be doing? If it is, I think it needs the patch
> below; however, if I'm wrong, then clearing the field could lead to the
> opposite problem (that skbs fail to be accounted at all), which would
> lead to the queue being throttled because the limit gets too high and is
> never brought back down...

Right, and now I did boot up my own laptop with the -next kernel, and
tested the patch. It definitely breaks things, so that was not the
issue. However, I don't get the WARN_ON either, so don't have any better
ideas. I guess we'll have to wait for someone who actually knows the
iwlwifi driver to take a look at this :)

-Toke