2015-04-22 20:42:37

by Jiri Kosina

[permalink] [raw]
Subject: iwlwifi getting stuck with current Linus' tree (646da63172)

Hi,

I've been running current Linus' tree and have been getting system lockups
frequently. After a few "silent" lockups, I was able to obtain a dmesg
before the machine turned dead again (wifi stopped working shortly before
that).

Before starting to debug / bisect (last known good on this machine is
4.0-rc6), I am attaching the dmesg in case someone already knows what the
issue is.

iwlwifi 0000:03:00.0: Error sending REPLY_ADD_STA: time out after 2000ms.
iwlwifi 0000:03:00.0: Current CMD queue read_ptr 235 write_ptr 237
iwlwifi 0000:03:00.0: Loaded firmware version: 8.83.5.1 build 33692
iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
iwlwifi 0000:03:00.0: Status: 0x0000004C, count: -1
iwlwifi 0000:03:00.0: 0xFFFFFFFF | ADVANCED_SYSASSERT
iwlwifi 0000:03:00.0: 0xFFFFFFFF | uPc
iwlwifi 0000:03:00.0: 0xFFFFFFFF | branchlink1
iwlwifi 0000:03:00.0: 0xFFFFFFFF | branchlink2
iwlwifi 0000:03:00.0: 0xFFFFFFFF | interruptlink1
iwlwifi 0000:03:00.0: 0xFFFFFFFF | interruptlink2
iwlwifi 0000:03:00.0: 0xFFFFFFFF | data1
iwlwifi 0000:03:00.0: 0xFFFFFFFF | data2
iwlwifi 0000:03:00.0: 0xFFFFFFFF | line
iwlwifi 0000:03:00.0: 0xFFFFFFFF | beacon time
iwlwifi 0000:03:00.0: 0xFFFFFFFF | tsf low
iwlwifi 0000:03:00.0: 0xFFFFFFFF | tsf hi
iwlwifi 0000:03:00.0: 0xFFFFFFFF | time gp1
iwlwifi 0000:03:00.0: 0xFFFFFFFF | time gp2
iwlwifi 0000:03:00.0: 0xFFFFFFFF | time gp3
iwlwifi 0000:03:00.0: 0xFFFFFFFF | uCode version
iwlwifi 0000:03:00.0: 0xFFFFFFFF | hw version
iwlwifi 0000:03:00.0: 0xFFFFFFFF | board version
iwlwifi 0000:03:00.0: 0xFFFFFFFF | hcmd
iwlwifi 0000:03:00.0: 0xFFFFFFFF | isr0
iwlwifi 0000:03:00.0: 0xFFFFFFFF | isr1
iwlwifi 0000:03:00.0: 0xFFFFFFFF | isr2
iwlwifi 0000:03:00.0: 0xFFFFFFFF | isr3
iwlwifi 0000:03:00.0: 0xFFFFFFFF | isr4
iwlwifi 0000:03:00.0: 0xFFFFFFFF | isr_pref
iwlwifi 0000:03:00.0: 0xFFFFFFFF | wait_event
iwlwifi 0000:03:00.0: 0xFFFFFFFF | l2p_control
iwlwifi 0000:03:00.0: 0xFFFFFFFF | l2p_duration
iwlwifi 0000:03:00.0: 0xFFFFFFFF | l2p_mhvalid
iwlwifi 0000:03:00.0: 0xFFFFFFFF | l2p_addr_match
iwlwifi 0000:03:00.0: 0xFFFFFFFF | lmpm_pmg_sel
iwlwifi 0000:03:00.0: 0xFFFFFFFF | timestamp
iwlwifi 0000:03:00.0: 0xFFFFFFFF | flow_handler
iwlwifi 0000:03:00.0: Log capacity -1 is bogus, limit to 256 entries
iwlwifi 0000:03:00.0: Log write index -1 is bogus, limit to 256
iwlwifi 0000:03:00.0: Start IWL Event Log Dump: display last 20 entries
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
wlan0: HW problem - can not stop rx aggregation for 00:b0:0c:4e:a5:28 tid 0
ieee80211 phy0: Hardware restart was requested
iwlwifi 0000:03:00.0: iwl_trans_wait_tx_queue_empty bad state = 0
iwlwifi 0000:03:00.0: L1 Disabled - LTR Disabled
iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
iwlwifi 0000:03:00.0: Failed to load firmware chunk!
iwlwifi 0000:03:00.0: Could not load the [0] uCode section
iwlwifi 0000:03:00.0: Failed to run INIT ucode: -110
iwlwifi 0000:03:00.0: Unable to initialize device.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 30 at net/mac80211/util.c:1800 ieee80211_reconfig+0x66a/0x2a00 [mac80211]()
Hardware became unavailable during restart.
Modules linked in: ctr ccm fuse tun iptable_mangle xt_DSCP nf_conntrack_ipv6 nf_defra
e_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack af_packet iptable_filter ip_tables x_tables iTCO_wdt iTCO_vendor_support iwldvm mac80211 kvm_intel kvm snd_hda_codec_conexant snd_hda_codec_generic iwlwifi microcode snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm cfg80211 thinkpad_acpi snd_seq pcspkr i2c_i801 lpc_ich mfd_core snd_seq_device snd_hda_core snd_timer e1000e snd ptp pps_core thermal wmi ac soundcore rfkill battery tpm_tis tpm acpi_cpufreq processor dm_mod ehci_pci uhci_hcd ehci_hcd i915 i2c_algo_bit drm_kms_helper ata_generic usbcore usb_common drm video thermal_sys button sg autofs4
CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 4.0.0-09000-g646da63 #1
Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
Workqueue: events ieee80211_restart_work [mac80211]
ffffffffc0721b26 ffff880079443c28 ffffffff81686cc3 0000000000000000
ffff880079443c78 ffff880079443c68 ffffffff8105e8fa ffffffff81c44500
ffff880076091e18 00000000ffffff92 ffff88007c215b80 ffff88007c21e500
Call Trace:
[<ffffffff81686cc3>] dump_stack+0x4c/0x65
[<ffffffff8105e8fa>] warn_slowpath_common+0x8a/0xc0
[<ffffffff8105e976>] warn_slowpath_fmt+0x46/0x50
[<ffffffffc06c2e1c>] ? trace_drv_return_int+0x18c/0x1c0 [mac80211]
[<ffffffffc06c7aba>] ieee80211_reconfig+0x66a/0x2a00 [mac80211]
[<ffffffff8168cbee>] ? mutex_unlock+0xe/0x10
[<ffffffffc0677122>] ieee80211_restart_work+0x42/0x70 [mac80211]
[<ffffffff8107ce58>] process_one_work+0x1d8/0x800
[<ffffffff8107cdcb>] ? process_one_work+0x14b/0x800
[<ffffffff8107d59b>] worker_thread+0x11b/0x470
[<ffffffff8107d480>] ? process_one_work+0x800/0x800
[<ffffffff81083b9f>] kthread+0xef/0x110
[<ffffffff81083ab0>] ? kthread_create_on_node+0x210/0x210
[<ffffffff8168fce2>] ret_from_fork+0x42/0x70
[<ffffffff81083ab0>] ? kthread_create_on_node+0x210/0x210
---[ end trace 5cececd09b834ad1 ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 30 at net/mac80211/driver-ops.h:12 ieee80211_do_stop+0xad4/0xb80 [mac80211]()
wlan0: Failed check-sdata-in-driver check, flags: 0x4
Modules linked in: ctr ccm fuse tun iptable_mangle xt_DSCP nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack af_packet iptable_filter ip_tables x_tables iTCO_wdt iTCO_vendor_support iwldvm mac80211 kvm_intel kvm snd_hda_codec_conexant snd_hda_codec_generic iwlwifi microcode snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm cfg80211 thinkpad_acpi snd_seq pcspkr i2c_i801 lpc_ich mfd_core snd_seq_device snd_hda_core snd_timer e1000e snd ptp pps_core thermal wmi ac soundcore rfkill battery tpm_tis tpm acpi_cpufreq processor dm_mod ehci_pci uhci_hcd ehci_hcd i915 i2c_algo_bit drm_kms_helper ata_generic usbcore usb_common drm video thermal_sys button sg autofs4
CPU: 0 PID: 30 Comm: kworker/0:1 Tainted: G W 4.0.0-09000-g646da63 #1
Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
Workqueue: events ieee80211_restart_work [mac80211]
ffffffffc07215fa ffff880079443a78 ffffffff81686cc3 0000000000000000
ffff880079443ac8 ffff880079443ab8 ffffffff8105e8fa 0000000000000246
ffff88006a6009c0 ffff8800760909e0 0000000000000000 ffff880076091780
Call Trace:
[<ffffffff81686cc3>] dump_stack+0x4c/0x65
[<ffffffff8105e8fa>] warn_slowpath_common+0x8a/0xc0
[<ffffffff8105e976>] warn_slowpath_fmt+0x46/0x50
[<ffffffff8108ae2d>] ? __might_sleep+0x4d/0x90
[<ffffffffc069d8e4>] ieee80211_do_stop+0xad4/0xb80 [mac80211]
[<ffffffff815a54a7>] ? dev_deactivate_many+0x257/0x310
[<ffffffffc069d9aa>] ieee80211_stop+0x1a/0x20 [mac80211]
[<ffffffff81577645>] __dev_close_many+0x95/0xe0
[<ffffffff81577718>] dev_close_many+0x88/0x120
[<ffffffff81579d46>] dev_close.part.77+0x36/0x50
[<ffffffff81579d81>] dev_close+0x21/0x30
[<ffffffffc03896ad>] cfg80211_shutdown_all_interfaces+0x4d/0xc0 [cfg80211]
[<ffffffffc06c2675>] ieee80211_handle_reconfig_failure+0x95/0xa0 [mac80211]
[<ffffffffc06c753a>] ieee80211_reconfig+0xea/0x2a00 [mac80211]
[<ffffffff8168cbee>] ? mutex_unlock+0xe/0x10
[<ffffffffc0677122>] ieee80211_restart_work+0x42/0x70 [mac80211]
[<ffffffff8107ce58>] process_one_work+0x1d8/0x800
[<ffffffff8107cdcb>] ? process_one_work+0x14b/0x800
[<ffffffff8107d59b>] worker_thread+0x11b/0x470
[<ffffffff8107d480>] ? process_one_work+0x800/0x800
[<ffffffff81083b9f>] kthread+0xef/0x110
[<ffffffff81083ab0>] ? kthread_create_on_node+0x210/0x210
[<ffffffff8168fce2>] ret_from_fork+0x42/0x70
[<ffffffff81083ab0>] ? kthread_create_on_node+0x210/0x210
---[ end trace 5cececd09b834ad2 ]---
cfg80211: Calling CRDA to update world regulatory domain
cfg80211: World regulatory domain updated:
cfg80211: DFS Master region: unset
cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
cfg80211: (5170000 KHz - 5250000 KHz @ 160000 KHz), (N/A, 2000 mBm), (N/A)
cfg80211: (5250000 KHz - 5330000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
cfg80211: (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
cfg80211: (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
cfg80211: (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
iwlwifi 0000:03:00.0: L1 Disabled - LTR Disabled
iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
iwlwifi 0000:03:00.0: Failed to load firmware chunk!
iwlwifi 0000:03:00.0: Could not load the [0] uCode section
iwlwifi 0000:03:00.0: Failed to run INIT ucode: -110
iwlwifi 0000:03:00.0: Unable to initialize device.
iwlwifi 0000:03:00.0: L1 Disabled - LTR Disabled
iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
iwlwifi 0000:03:00.0: Failed to load firmware chunk!
iwlwifi 0000:03:00.0: Could not load the [0] uCode section
iwlwifi 0000:03:00.0: Failed to run INIT ucode: -110
iwlwifi 0000:03:00.0: Unable to initialize device.
e1000e: eth0 NIC Link is Down
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
iwlwifi 0000:03:00.0: L1 Disabled - LTR Disabled
iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0

--
Jiri Kosina
SUSE Labs


2015-04-23 04:47:29

by Grumbach, Emmanuel

[permalink] [raw]
Subject: Re: iwlwifi getting stuck with current Linus' tree (646da63172)

Hi,


On Wed, 2015-04-22 at 22:42 +0200, Jiri Kosina wrote:
> Hi,
>
> I've been running current Linus' tree and have been getting system lockups
> frequently. After a few "silent" lockups, I was able to obtain a dmesg
> before the machine turned dead again (wifi stopped working shortly before
> that).
>
> Before starting to debug / bisect (last known good on this machine is
> 4.0-rc6), I am attaching the dmesg in case someone already knows what the
> issue is.
>

I briefly went over the iwlwifi commits between 4.0-rc6 and linux/master
and couldn't find anything obvious.
Note that for the device you have, the commits that touch
drivers/net/wireless/iwlwifi/mvm are not relevant.

What you are seeing is that the PCI host is disconnecting the WiFi NIC
for some weird reason. It is not the first time I see that, but
unfortunately, I have never been able to debug this. I am personally not
a HW PCI expert and I couldn't reproduce either...

I am afraid I won't save you the time of the bisection, but I am not
entirely sure that bisecting the iwlwifi driver is enough to find the
commit that broke it. You may want to bisect the pci bus driver as well.
First question is: Are you sure that 4.0-rc6 was good?

Let me know if you find something.

> iwlwifi 0000:03:00.0: Error sending REPLY_ADD_STA: time out after 2000ms.
> iwlwifi 0000:03:00.0: Current CMD queue read_ptr 235 write_ptr 237
> iwlwifi 0000:03:00.0: Loaded firmware version: 8.83.5.1 build 33692
> iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
> iwlwifi 0000:03:00.0: Status: 0x0000004C, count: -1
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | ADVANCED_SYSASSERT
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | uPc
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | branchlink1
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | branchlink2
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | interruptlink1
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | interruptlink2
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | data1
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | data2
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | line
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | beacon time
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | tsf low
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | tsf hi
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | time gp1
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | time gp2
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | time gp3
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | uCode version
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | hw version
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | board version
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | hcmd
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | isr0
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | isr1
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | isr2
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | isr3
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | isr4
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | isr_pref
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | wait_event
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | l2p_control
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | l2p_duration
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | l2p_mhvalid
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | l2p_addr_match
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | lmpm_pmg_sel
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | timestamp
> iwlwifi 0000:03:00.0: 0xFFFFFFFF | flow_handler
> iwlwifi 0000:03:00.0: Log capacity -1 is bogus, limit to 256 entries
> iwlwifi 0000:03:00.0: Log write index -1 is bogus, limit to 256
> iwlwifi 0000:03:00.0: Start IWL Event Log Dump: display last 20 entries
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> iwlwifi 0000:03:00.0: EVT_LOGT:4294967295:0xffffffff:4294967295
> wlan0: HW problem - can not stop rx aggregation for 00:b0:0c:4e:a5:28 tid 0
> ieee80211 phy0: Hardware restart was requested
> iwlwifi 0000:03:00.0: iwl_trans_wait_tx_queue_empty bad state = 0
> iwlwifi 0000:03:00.0: L1 Disabled - LTR Disabled
> iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
> iwlwifi 0000:03:00.0: Failed to load firmware chunk!
> iwlwifi 0000:03:00.0: Could not load the [0] uCode section
> iwlwifi 0000:03:00.0: Failed to run INIT ucode: -110
> iwlwifi 0000:03:00.0: Unable to initialize device.
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 30 at net/mac80211/util.c:1800 ieee80211_reconfig+0x66a/0x2a00 [mac80211]()
> Hardware became unavailable during restart.
> Modules linked in: ctr ccm fuse tun iptable_mangle xt_DSCP nf_conntrack_ipv6 nf_defra
> e_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack af_packet iptable_filter ip_tables x_tables iTCO_wdt iTCO_vendor_support iwldvm mac80211 kvm_intel kvm snd_hda_codec_conexant snd_hda_codec_generic iwlwifi microcode snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm cfg80211 thinkpad_acpi snd_seq pcspkr i2c_i801 lpc_ich mfd_core snd_seq_device snd_hda_core snd_timer e1000e snd ptp pps_core thermal wmi ac soundcore rfkill battery tpm_tis tpm acpi_cpufreq processor dm_mod ehci_pci uhci_hcd ehci_hcd i915 i2c_algo_bit drm_kms_helper ata_generic usbcore usb_common drm video thermal_sys button sg autofs4
> CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 4.0.0-09000-g646da63 #1
> Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
> Workqueue: events ieee80211_restart_work [mac80211]
> ffffffffc0721b26 ffff880079443c28 ffffffff81686cc3 0000000000000000
> ffff880079443c78 ffff880079443c68 ffffffff8105e8fa ffffffff81c44500
> ffff880076091e18 00000000ffffff92 ffff88007c215b80 ffff88007c21e500
> Call Trace:
> [<ffffffff81686cc3>] dump_stack+0x4c/0x65
> [<ffffffff8105e8fa>] warn_slowpath_common+0x8a/0xc0
> [<ffffffff8105e976>] warn_slowpath_fmt+0x46/0x50
> [<ffffffffc06c2e1c>] ? trace_drv_return_int+0x18c/0x1c0 [mac80211]
> [<ffffffffc06c7aba>] ieee80211_reconfig+0x66a/0x2a00 [mac80211]
> [<ffffffff8168cbee>] ? mutex_unlock+0xe/0x10
> [<ffffffffc0677122>] ieee80211_restart_work+0x42/0x70 [mac80211]
> [<ffffffff8107ce58>] process_one_work+0x1d8/0x800
> [<ffffffff8107cdcb>] ? process_one_work+0x14b/0x800
> [<ffffffff8107d59b>] worker_thread+0x11b/0x470
> [<ffffffff8107d480>] ? process_one_work+0x800/0x800
> [<ffffffff81083b9f>] kthread+0xef/0x110
> [<ffffffff81083ab0>] ? kthread_create_on_node+0x210/0x210
> [<ffffffff8168fce2>] ret_from_fork+0x42/0x70
> [<ffffffff81083ab0>] ? kthread_create_on_node+0x210/0x210
> ---[ end trace 5cececd09b834ad1 ]---
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 30 at net/mac80211/driver-ops.h:12 ieee80211_do_stop+0xad4/0xb80 [mac80211]()
> wlan0: Failed check-sdata-in-driver check, flags: 0x4
> Modules linked in: ctr ccm fuse tun iptable_mangle xt_DSCP nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack af_packet iptable_filter ip_tables x_tables iTCO_wdt iTCO_vendor_support iwldvm mac80211 kvm_intel kvm snd_hda_codec_conexant snd_hda_codec_generic iwlwifi microcode snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm cfg80211 thinkpad_acpi snd_seq pcspkr i2c_i801 lpc_ich mfd_core snd_seq_device snd_hda_core snd_timer e1000e snd ptp pps_core thermal wmi ac soundcore rfkill battery tpm_tis tpm acpi_cpufreq processor dm_mod ehci_pci uhci_hcd ehci_hcd i915 i2c_algo_bit drm_kms_helper ata_generic usbcore usb_common drm video thermal_sys button sg autofs4
> CPU: 0 PID: 30 Comm: kworker/0:1 Tainted: G W 4.0.0-09000-g646da63 #1
> Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
> Workqueue: events ieee80211_restart_work [mac80211]
> ffffffffc07215fa ffff880079443a78 ffffffff81686cc3 0000000000000000
> ffff880079443ac8 ffff880079443ab8 ffffffff8105e8fa 0000000000000246
> ffff88006a6009c0 ffff8800760909e0 0000000000000000 ffff880076091780
> Call Trace:
> [<ffffffff81686cc3>] dump_stack+0x4c/0x65
> [<ffffffff8105e8fa>] warn_slowpath_common+0x8a/0xc0
> [<ffffffff8105e976>] warn_slowpath_fmt+0x46/0x50
> [<ffffffff8108ae2d>] ? __might_sleep+0x4d/0x90
> [<ffffffffc069d8e4>] ieee80211_do_stop+0xad4/0xb80 [mac80211]
> [<ffffffff815a54a7>] ? dev_deactivate_many+0x257/0x310
> [<ffffffffc069d9aa>] ieee80211_stop+0x1a/0x20 [mac80211]
> [<ffffffff81577645>] __dev_close_many+0x95/0xe0
> [<ffffffff81577718>] dev_close_many+0x88/0x120
> [<ffffffff81579d46>] dev_close.part.77+0x36/0x50
> [<ffffffff81579d81>] dev_close+0x21/0x30
> [<ffffffffc03896ad>] cfg80211_shutdown_all_interfaces+0x4d/0xc0 [cfg80211]
> [<ffffffffc06c2675>] ieee80211_handle_reconfig_failure+0x95/0xa0 [mac80211]
> [<ffffffffc06c753a>] ieee80211_reconfig+0xea/0x2a00 [mac80211]
> [<ffffffff8168cbee>] ? mutex_unlock+0xe/0x10
> [<ffffffffc0677122>] ieee80211_restart_work+0x42/0x70 [mac80211]
> [<ffffffff8107ce58>] process_one_work+0x1d8/0x800
> [<ffffffff8107cdcb>] ? process_one_work+0x14b/0x800
> [<ffffffff8107d59b>] worker_thread+0x11b/0x470
> [<ffffffff8107d480>] ? process_one_work+0x800/0x800
> [<ffffffff81083b9f>] kthread+0xef/0x110
> [<ffffffff81083ab0>] ? kthread_create_on_node+0x210/0x210
> [<ffffffff8168fce2>] ret_from_fork+0x42/0x70
> [<ffffffff81083ab0>] ? kthread_create_on_node+0x210/0x210
> ---[ end trace 5cececd09b834ad2 ]---
> cfg80211: Calling CRDA to update world regulatory domain
> cfg80211: World regulatory domain updated:
> cfg80211: DFS Master region: unset
> cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
> cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
> cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
> cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
> cfg80211: (5170000 KHz - 5250000 KHz @ 160000 KHz), (N/A, 2000 mBm), (N/A)
> cfg80211: (5250000 KHz - 5330000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
> cfg80211: (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
> cfg80211: (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
> cfg80211: (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
> iwlwifi 0000:03:00.0: L1 Disabled - LTR Disabled
> iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
> iwlwifi 0000:03:00.0: Failed to load firmware chunk!
> iwlwifi 0000:03:00.0: Could not load the [0] uCode section
> iwlwifi 0000:03:00.0: Failed to run INIT ucode: -110
> iwlwifi 0000:03:00.0: Unable to initialize device.
> iwlwifi 0000:03:00.0: L1 Disabled - LTR Disabled
> iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
> iwlwifi 0000:03:00.0: Failed to load firmware chunk!
> iwlwifi 0000:03:00.0: Could not load the [0] uCode section
> iwlwifi 0000:03:00.0: Failed to run INIT ucode: -110
> iwlwifi 0000:03:00.0: Unable to initialize device.
> e1000e: eth0 NIC Link is Down
> IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> iwlwifi 0000:03:00.0: L1 Disabled - LTR Disabled
> iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
>

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-04-23 08:15:47

by Jiri Kosina

[permalink] [raw]
Subject: Re: iwlwifi getting stuck with current Linus' tree (646da63172)

On Thu, 23 Apr 2015, Grumbach, Emmanuel wrote:

> > I've been running current Linus' tree and have been getting system lockups
> > frequently. After a few "silent" lockups, I was able to obtain a dmesg
> > before the machine turned dead again (wifi stopped working shortly before
> > that).
> >
> > Before starting to debug / bisect (last known good on this machine is
> > 4.0-rc6), I am attaching the dmesg in case someone already knows what the
> > issue is.
> >
>
> I briefly went over the iwlwifi commits between 4.0-rc6 and linux/master
> and couldn't find anything obvious.
> Note that for the device you have, the commits that touch
> drivers/net/wireless/iwlwifi/mvm are not relevant.
>
> What you are seeing is that the PCI host is disconnecting the WiFi NIC
> for some weird reason. It is not the first time I see that, but
> unfortunately, I have never been able to debug this. I am personally not
> a HW PCI expert and I couldn't reproduce either...
>
> I am afraid I won't save you the time of the bisection, but I am not
> entirely sure that bisecting the iwlwifi driver is enough to find the
> commit that broke it. You may want to bisect the pci bus driver as well.

The problem is that I can't really reliably reproduce it; it happens
rather often, but not so often that I could be certainly sure that my
distinction of good and bad kernels would be accurate.

I will try it, but I expect the result to be bogus because of this,
unfortunately.

> First question is: Are you sure that 4.0-rc6 was good?

Pretty much, yes. I've been running it for quite some time on this
machine without any issues. But after updating to current HEAD two days
ago, the issue triggered like 6 or 7 times already.

Thanks,

--
Jiri Kosina
SUSE Labs

2015-04-23 09:12:58

by Grumbach, Emmanuel

[permalink] [raw]
Subject: Re: iwlwifi getting stuck with current Linus' tree (646da63172)

On Thu, 2015-04-23 at 10:15 +0200, Jiri Kosina wrote:
> On Thu, 23 Apr 2015, Grumbach, Emmanuel wrote:
>
> > > I've been running current Linus' tree and have been getting system lockups
> > > frequently. After a few "silent" lockups, I was able to obtain a dmesg
> > > before the machine turned dead again (wifi stopped working shortly before
> > > that).
> > >
> > > Before starting to debug / bisect (last known good on this machine is
> > > 4.0-rc6), I am attaching the dmesg in case someone already knows what the
> > > issue is.
> > >
> >
> > I briefly went over the iwlwifi commits between 4.0-rc6 and linux/master
> > and couldn't find anything obvious.
> > Note that for the device you have, the commits that touch
> > drivers/net/wireless/iwlwifi/mvm are not relevant.
> >
> > What you are seeing is that the PCI host is disconnecting the WiFi NIC
> > for some weird reason. It is not the first time I see that, but
> > unfortunately, I have never been able to debug this. I am personally not
> > a HW PCI expert and I couldn't reproduce either...
> >
> > I am afraid I won't save you the time of the bisection, but I am not
> > entirely sure that bisecting the iwlwifi driver is enough to find the
> > commit that broke it. You may want to bisect the pci bus driver as well.
>
> The problem is that I can't really reliably reproduce it; it happens
> rather often, but not so often that I could be certainly sure that my
> distinction of good and bad kernels would be accurate.
>
> I will try it, but I expect the result to be bogus because of this,
> unfortunately.
>

I can understand. A few users reported that this bug occurred more
reliably when moving their system, although it seems very weird to me.

> > First question is: Are you sure that 4.0-rc6 was good?
>
> Pretty much, yes. I've been running it for quite some time on this
> machine without any issues. But after updating to current HEAD two days
> ago, the issue triggered like 6 or 7 times already.
>

Ok - I will try to look at the PCI commits there although I am not sure
I'll be able to make much sense of them...

> Thanks,
>

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-04-23 09:15:05

by Jiri Kosina

[permalink] [raw]
Subject: Re: iwlwifi getting stuck with current Linus' tree (646da63172)

On Thu, 23 Apr 2015, Grumbach, Emmanuel wrote:

> > I will try it, but I expect the result to be bogus because of this,
> > unfortunately.
>
> I can understand. A few users reported that this bug occurred more
> reliably when moving their system, although it seems very weird to me.

My "feeling" was that it sometimes was related to start of audio playback,
but not 100% relieble either.

> > > First question is: Are you sure that 4.0-rc6 was good?
> >
> > Pretty much, yes. I've been running it for quite some time on this
> > machine without any issues. But after updating to current HEAD two days
> > ago, the issue triggered like 6 or 7 times already.
>
> Ok - I will try to look at the PCI commits there although I am not sure
> I'll be able to make much sense of them...

Thanks,

--
Jiri Kosina
SUSE Labs

2015-04-23 12:48:51

by Piotr Karbowski

[permalink] [raw]
Subject: Re: iwlwifi getting stuck with current Linus' tree (646da63172)

Hello,

On Thu, Apr 23, 2015 at 11:15 AM, Jiri Kosina <[email protected]> wrote:
> On Thu, 23 Apr 2015, Grumbach, Emmanuel wrote:
>
>> > I will try it, but I expect the result to be bogus because of this,
>> > unfortunately.
>>
>> I can understand. A few users reported that this bug occurred more
>> reliably when moving their system, although it seems very weird to me.
>
> My "feeling" was that it sometimes was related to start of audio playback,
> but not 100% relieble either.
>
>> > > First question is: Are you sure that 4.0-rc6 was good?
>> >
>> > Pretty much, yes. I've been running it for quite some time on this
>> > machine without any issues. But after updating to current HEAD two days
>> > ago, the issue triggered like 6 or 7 times already.
>>
>> Ok - I will try to look at the PCI commits there although I am not sure
>> I'll be able to make much sense of them...
>
> Thanks,

i've also started noticing hard system lockups with iwlwifi. What's
worth noticing is that I've started to see this on 3.19 and I also
have it on 4.0 kernel thus allow me to report it in this very thread.
I had over month uptime, untill I've updated iwl ucode, maybe this was
introduced not by kernel but firmware?. The lockups are random and
result in hard system lockup, frozen Xorg, screen artifacts/glitches,
audio playback keeps repeating last second or so of music, sysrq
cannot handle it as well as auto-reboot after kernel panic does not
kicks in.

The hardware I see this problem on is iwl 7260. The only workaround
I've found so far was to disable power save via `iw dev wlan0 set
power_save off` and not a single lockup since. What's rather
disturbing is that by 'modinfo iwlwifi' it does say that power_save is
default off, but it is not. I am sure that nothing else enables
power_save on my system because I don't even have udev, and for the
test I've also disabled wicd, then simple `modprobe iwlwifi` and `iw
dev wlan0 get power_save` reports 'on'.

I also have another system iwl 6205 and no single lockup was noted,
but it indeed have power_save disabled, so that may be the case.

-- Piotr.

2015-04-23 18:10:20

by Grumbach, Emmanuel

[permalink] [raw]
Subject: Re: iwlwifi getting stuck with current Linus' tree (646da63172)

On Thu, 2015-04-23 at 14:48 +0200, Piotr Karbowski wrote:
> Hello,
>
> On Thu, Apr 23, 2015 at 11:15 AM, Jiri Kosina <[email protected]> wrote:
> > On Thu, 23 Apr 2015, Grumbach, Emmanuel wrote:
> >
> >> > I will try it, but I expect the result to be bogus because of this,
> >> > unfortunately.
> >>
> >> I can understand. A few users reported that this bug occurred more
> >> reliably when moving their system, although it seems very weird to me.
> >
> > My "feeling" was that it sometimes was related to start of audio playback,
> > but not 100% relieble either.
> >
> >> > > First question is: Are you sure that 4.0-rc6 was good?
> >> >
> >> > Pretty much, yes. I've been running it for quite some time on this
> >> > machine without any issues. But after updating to current HEAD two days
> >> > ago, the issue triggered like 6 or 7 times already.
> >>
> >> Ok - I will try to look at the PCI commits there although I am not sure
> >> I'll be able to make much sense of them...
> >
> > Thanks,
>
> i've also started noticing hard system lockups with iwlwifi. What's
> worth noticing is that I've started to see this on 3.19 and I also
> have it on 4.0 kernel thus allow me to report it in this very thread.
> I had over month uptime, untill I've updated iwl ucode, maybe this was
> introduced not by kernel but firmware?. The lockups are random and
> result in hard system lockup, frozen Xorg, screen artifacts/glitches,
> audio playback keeps repeating last second or so of music, sysrq
> cannot handle it as well as auto-reboot after kernel panic does not
> kicks in.
>

I don't see how the ucode can cause crashes, but a new ucode can make
the driver behave differently (new APIs being available) and that can
cause crashes. Please share any logs you may have.

> The hardware I see this problem on is iwl 7260. The only workaround
> I've found so far was to disable power save via `iw dev wlan0 set
> power_save off` and not a single lockup since. What's rather
> disturbing is that by 'modinfo iwlwifi' it does say that power_save is
> default off, but it is not. I am sure that nothing else enables
> power_save on my system because I don't even have udev, and for the
> test I've also disabled wicd, then simple `modprobe iwlwifi` and `iw
> dev wlan0 get power_save` reports 'on'.

Yeah - I know. This is confusing. This module parameter doesn't affect
7260: it uses iwlmvm module which its own setting. iwlwifi's power_save
parameter serves older devices such as 6205 which you seem to have as
well.
In 6205's case, power save is disabled by default. But all this seems
very far away from the original thread.

>
> I also have another system iwl 6205 and no single lockup was noted,
> but it indeed have power_save disabled, so that may be the case.
>
> -- Piotr.

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-05-03 21:42:24

by Jiri Kosina

[permalink] [raw]
Subject: Re: iwlwifi getting stuck with current Linus' tree (646da63172)

Hi,

so over a past few days, I tried to perform a bisect, but failed as
expected. The issue is not reliably enough reproducible for me, so I ended
up with a completely bogus commit.

*However*, now that I have been following the causes and symptoms more
closely (due to the bisect attempt), I have to confirm that the issue
happens much more often when the machine is question is physically being
moved when associated.

--
Jiri Kosina
SUSE Labs

2015-05-04 04:32:21

by Grumbach, Emmanuel

[permalink] [raw]
Subject: Re: iwlwifi getting stuck with current Linus' tree (646da63172)

On Sun, 2015-05-03 at 23:42 +0200, Jiri Kosina wrote:
> Hi,
>
> so over a past few days, I tried to perform a bisect, but failed as
> expected. The issue is not reliably enough reproducible for me, so I ended
> up with a completely bogus commit.
>
> *However*, now that I have been following the causes and symptoms more
> closely (due to the bisect attempt), I have to confirm that the issue
> happens much more often when the machine is question is physically being
> moved when associated.
>

I am doing this all the time and I don't hit your problem. Without any
logs I don't see how I can help here. Please try to catch logs of the
crash. Thanks.

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-05-04 06:55:13

by Jiri Kosina

[permalink] [raw]
Subject: Re: iwlwifi getting stuck with current Linus' tree (646da63172)

On Mon, 4 May 2015, Grumbach, Emmanuel wrote:

> > so over a past few days, I tried to perform a bisect, but failed as
> > expected. The issue is not reliably enough reproducible for me, so I ended
> > up with a completely bogus commit.
> >
> > *However*, now that I have been following the causes and symptoms more
> > closely (due to the bisect attempt), I have to confirm that the issue
> > happens much more often when the machine is question is physically being
> > moved when associated.
> >
>
> I am doing this all the time and I don't hit your problem. Without any
> logs I don't see how I can help here. Please try to catch logs of the
> crash. Thanks.

I have provided them in the very first e-mail to this thread; see

https://lkml.org/lkml/2015/4/22/601

If there is any way how to obtain more verbose / useful logs, please let
me know, I'll be happy to do that.

--
Jiri Kosina
SUSE Labs

2015-05-04 07:02:21

by Grumbach, Emmanuel

[permalink] [raw]
Subject: Re: iwlwifi getting stuck with current Linus' tree (646da63172)

On Mon, 2015-05-04 at 08:55 +0200, Jiri Kosina wrote:
> On Mon, 4 May 2015, Grumbach, Emmanuel wrote:
>
> > > so over a past few days, I tried to perform a bisect, but failed as
> > > expected. The issue is not reliably enough reproducible for me, so I ended
> > > up with a completely bogus commit.
> > >
> > > *However*, now that I have been following the causes and symptoms more
> > > closely (due to the bisect attempt), I have to confirm that the issue
> > > happens much more often when the machine is question is physically being
> > > moved when associated.
> > >
> >
> > I am doing this all the time and I don't hit your problem. Without any
> > logs I don't see how I can help here. Please try to catch logs of the
> > crash. Thanks.
>
> I have provided them in the very first e-mail to this thread; see
>
> https://lkml.org/lkml/2015/4/22/601
>
> If there is any way how to obtain more verbose / useful logs, please let
> me know, I'll be happy to do that.
>


Sorry, my bad. I lost context here... So I really don't know what to
say. I can't see any iwlwifi commit that could be causing this, OTOH,
bisection didn't bring any results. You can try to unregister the device
from PCI and then to rescan. It worked for some people. Other than that,
I can't do much. Did you update your BIOS before you started to see this
failure?
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-05-07 08:54:57

by Jiri Kosina

[permalink] [raw]
Subject: Re: iwlwifi getting stuck with current Linus' tree (646da63172)

On Mon, 4 May 2015, Grumbach, Emmanuel wrote:

> Sorry, my bad. I lost context here... So I really don't know what to
> say. I can't see any iwlwifi commit that could be causing this, OTOH,
> bisection didn't bring any results. You can try to unregister the device
> from PCI and then to rescan. It worked for some people. Other than that,
> I can't do much. Did you update your BIOS before you started to see this
> failure?

Well, this kept getting worse and worse. Newer kernels were much "better"
provoking the HW crashes, but a few days ago I started seeing it even with
very old kernels, which used to be rock stable.

I first ruled out a firmware update to be the cause.

So I dismanlted the notebook and replaced the wifi chip with another one
(also driven by iwlwifi), and no failure after 2 days so far.

So it might turn out that this particular one was a red herring. If it
happens again even with the replaced chip, I'll report back. Sorry for the
noise.

--
Jiri Kosina
SUSE Labs