2022-02-17 10:47:18

by Chris Murphy

[permalink] [raw]
Subject: Re: netconsole fail, iwlwifi, WARNING: at net/mac80211/tx.c:3638 ieee80211_tx_dequeue

So this is quite a bit more verbose with kernel
5.17.0-0.rc4.96.fc36.x86_64+debug.

Is netconsole expected to work from a host with wifi?

Snippet below, complete trace is at:
https://bugzilla.redhat.com/attachment.cgi?id=1861614



[ 4970.929723] kernel: WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock
order detected
[ 4970.929726] kernel: 5.17.0-0.rc4.96.fc36.x86_64+debug #1 Tainted: G
W --------- ---
[ 4970.929728] kernel: -----------------------------------------------------
[ 4970.929729] kernel: modprobe/8230 [HC0[0]:SC0[2]:HE0:SE0] is trying
to acquire:
[ 4970.929734] kernel: ffff953c499b9070 (&fq->lock){+.-.}-{2:2}, at:
ieee80211_xmit_fast+0x412/0xb90 [mac80211]
[ 4970.929823] kernel:
and this task is already holding:
[ 4970.929824] kernel: ffffffffc1504618
(target_list_lock){....}-{2:2}, at: write_msg+0x48/0xf0 [netconsole]
[ 4970.929834] kernel: which would create a new lock dependency:
[ 4970.929835] kernel: (target_list_lock){....}-{2:2} ->
(&fq->lock){+.-.}-{2:2}
[ 4970.929840] kernel:
but this new dependency connects a HARDIRQ-irq-safe lock:
[ 4970.929841] kernel: (&ec->lock){-.-.}-{2:2}
[ 4970.929842] kernel:
... which became HARDIRQ-irq-safe at:
[ 4970.929844] kernel: lock_acquire+0xd0/0x2d0
[ 4970.929850] kernel: _raw_spin_lock_irqsave+0x4f/0x90
[ 4970.929855] kernel: acpi_ec_gpe_handler+0x1b/0x40
[ 4970.929860] kernel: acpi_ev_detect_gpe+0x15a/0x1a0
[ 4970.929864] kernel: acpi_ev_gpe_detect+0x10d/0x153
[ 4970.929866] kernel: acpi_ev_sci_xrupt_handler+0x36/0x6a
[ 4970.929869] kernel: acpi_irq+0x15/0x40
[ 4970.929873] kernel: __handle_irq_event_percpu+0x90/0x330
[ 4970.929879] kernel: handle_irq_event+0x34/0x70
[ 4970.929883] kernel: handle_fasteoi_irq+0x90/0x1e0
[ 4970.929886] kernel: __common_interrupt+0x6e/0x150
[ 4970.929889] kernel: common_interrupt+0x5c/0xd0
[ 4970.929893] kernel: asm_common_interrupt+0x1e/0x40
[ 4970.929897] kernel: __do_softirq+0xcb/0x599
[ 4970.929901] kernel: __irq_exit_rcu+0xf3/0x170
[ 4970.929905] kernel: irq_exit_rcu+0xa/0x20
[ 4970.929908] kernel: sysvec_apic_timer_interrupt+0x9e/0xc0
[ 4970.929912] kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 4970.929916] kernel: _raw_spin_unlock_irqrestore+0x37/0x60
[ 4970.929919] kernel: prepare_to_wait_event+0x72/0x190
[ 4970.929924] kernel: ec_guard+0x154/0x1f0
[ 4970.929929] kernel: acpi_ec_transaction+0x17e/0x400
[ 4970.929933] kernel: acpi_ec_space_handler+0x117/0x270
[ 4970.929935] kernel: acpi_ev_address_space_dispatch+0x304/0x3ac
[ 4970.929938] kernel: acpi_ex_access_region+0x449/0x4e8
[ 4970.929943] kernel: acpi_ex_field_datum_io+0x181/0x413
[ 4970.929947] kernel: acpi_ex_extract_from_field+0x152/0x2e2
[ 4970.929951] kernel: acpi_ex_read_data_from_field+0x303/0x34b
[ 4970.929955] kernel: acpi_ex_resolve_node_to_value+0x366/0x480
[ 4970.929959] kernel: acpi_ex_resolve_to_value+0x407/0x49b
[ 4970.929963] kernel: acpi_ds_evaluate_name_path+0x9a/0x13c
[ 4970.929965] kernel: acpi_ds_exec_end_op+0x111/0x7b1
[ 4970.929968] kernel: acpi_ps_parse_loop+0x7ab/0x86a
[ 4970.929972] kernel: acpi_ps_parse_aml+0x2db/0x57e
[ 4970.929976] kernel: acpi_ps_execute_method+0x237/0x2ef
[ 4970.929980] kernel: acpi_ns_evaluate+0x341/0x4e3
[ 4970.929983] kernel: acpi_ut_evaluate_object+0xb1/0x24d
[ 4970.929986] kernel: acpi_ut_execute_STA+0x45/0xf4
[ 4970.929989] kernel: acpi_ns_get_device_callback+0x74/0x160
[ 4970.929993] kernel: acpi_ns_walk_namespace+0x137/0x29e
[ 4970.929996] kernel: acpi_get_devices+0xd1/0x10c
[ 4970.930000] kernel: pnpacpi_init+0x4b/0x6a
[ 4970.930005] kernel: do_one_initcall+0x66/0x350
[ 4970.930009] kernel: kernel_init_freeable+0x28e/0x2cf
[ 4970.930013] kernel: kernel_init+0x16/0x130
[ 4970.930015] kernel: ret_from_fork+0x1f/0x30
[ 4970.930019] kernel:
to a HARDIRQ-irq-unsafe lock:
[ 4970.930020] kernel: (&fq->lock){+.-.}-{2:2}
[ 4970.930022] kernel:
... which became HARDIRQ-irq-unsafe at:
[ 4970.930023] kernel: ...
[ 4970.930024] kernel: lock_acquire+0xd0/0x2d0
[ 4970.930027] kernel: _raw_spin_lock_bh+0x38/0x80
[ 4970.930030] kernel: ieee80211_get_txq_stats+0x49/0x190 [mac80211]
[ 4970.930097] kernel: rdev_get_txq_stats+0x50/0x130 [cfg80211]
[ 4970.930164] kernel: nl80211_send_iface+0x38c/0x9d0 [cfg80211]
[ 4970.930221] kernel: nl80211_notify_iface+0x48/0xa0 [cfg80211]
[ 4970.930279] kernel: cfg80211_register_netdevice+0x7c/0xf0 [cfg80211]
[ 4970.930328] kernel: ieee80211_if_add+0x4eb/0x820 [mac80211]
[ 4970.930403] kernel: ieee80211_register_hw+0xc93/0xcb0 [mac80211]
[ 4970.930459] kernel: iwl_mvm_mac_setup_register+0xb41/0xc90 [iwlmvm]
[ 4970.930482] kernel: iwl_mvm_start_post_nvm+0x4f/0xd0 [iwlmvm]
[ 4970.930501] kernel: iwl_op_mode_mvm_start+0xb0e/0xb80 [iwlmvm]
[ 4970.930521] kernel: _iwl_op_mode_start+0x98/0xc0 [iwlwifi]
[ 4970.930546] kernel: iwl_opmode_register+0x6a/0xd0 [iwlwifi]
[ 4970.930568] kernel: tcc_get_cur_state+0x24/0x40 [intel_tcc_cooling]
[ 4970.930572] kernel: do_one_initcall+0x66/0x350
[ 4970.930576] kernel: do_init_module+0x4a/0x260
[ 4970.930580] kernel: __do_sys_init_module+0x13a/0x190
[ 4970.930583] kernel: do_syscall_64+0x37/0x80
[ 4970.930587] kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 4970.930591] kernel:
other info that might help us debug this:
[ 4970.930592] kernel: Chain exists of:


2022-03-23 23:40:57

by Johannes Berg

[permalink] [raw]
Subject: Re: netconsole fail, iwlwifi, WARNING: at net/mac80211/tx.c:3638 ieee80211_tx_dequeue

On Wed, 2022-02-16 at 20:44 -0700, Chris Murphy wrote:
> So this is quite a bit more verbose with kernel
> 5.17.0-0.rc4.96.fc36.x86_64+debug.
>
> Is netconsole expected to work from a host with wifi?

Let's say it seems pretty audacious to try? There's a huge stack between
the hardware and the network (MAC service) interface...

We might sometimes even need to do memory allocations to send out a
frame, or touch other hardware (e.g. crypto engines if crypto is not in
wifi HW) etc.

We also didn't really expect to be called from interrupts-disabled
contexts here.

> [ 4970.929723] kernel: WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> [ 4970.929726] kernel: 5.17.0-0.rc4.96.fc36.x86_64+debug #1 Tainted: G W --------- ---
> [ 4970.929728] kernel: -----------------------------------------------------
> [ 4970.929729] kernel: modprobe/8230 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
> [ 4970.929734] kernel: ffff953c499b9070 (&fq->lock){+.-.}-{2:2}, at: ieee80211_xmit_fast+0x412/0xb90 [mac80211]
> [ 4970.929823] kernel:
> and this task is already holding:
> [ 4970.929824] kernel: ffffffffc1504618 (target_list_lock){....}-{2:2}, at: write_msg+0x48/0xf0 [netconsole]
> [ 4970.929834] kernel: which would create a new lock dependency:
> [ 4970.929835] kernel: (target_list_lock){....}-{2:2} -> (&fq->lock){+.-.}-{2:2}
> [ 4970.929840] kernel:
> but this new dependency connects a HARDIRQ-irq-safe lock:
> [ 4970.929841] kernel: (&ec->lock){-.-.}-{2:2}
> [ 4970.929842] kernel:
> ... which became HARDIRQ-irq-safe at:
>
[...]
> [ 4970.930019] kernel:
> to a HARDIRQ-irq-unsafe lock:
> [ 4970.930020] kernel: (&fq->lock){+.-.}-{2:2}
> [ 4970.930022] kernel:
> ... which became HARDIRQ-irq-unsafe at:
> [ 4970.930023] kernel: ...
> [ 4970.930024] kernel: lock_acquire+0xd0/0x2d0
> [ 4970.930027] kernel: _raw_spin_lock_bh+0x38/0x80
> [ 4970.930030] kernel: ieee80211_get_txq_stats+0x49/0x190 [mac80211]
[...]

which is pretty much what it's telling us here, I suppose, just in a bit
more roundabout way.

Unfortunately, I don't think we can really support that - there's so
much code below ... We'd have to make *everything* IRQ safe here?

johannes