2017-07-31 11:18:25

by Jiri Kosina

[permalink] [raw]
Subject: x86/thermal: AB-BA dependency between mvm->mutex and tz->lock

Hi,

booting current Linus' tree, I'm seeing lockdep splat (see the end of this
mail).

Apparently, there is AB-BA between tz->lock and mvm->mutex through the CPU
hotplug lock.

The obivous depency is: thermal_zone_get_temp() acquires tz->lock, and
then calls iwl_mvm_tzone_get_temp() (through tz->ops->get_temp()
callback), which acquires mvm->mutex

The less obvious dependency is primarily caused by iwl_op_mode_mvm_start()
allocating workqueue (#2 stacktrace) while holding mvm->mutex (which is
broken, because that mutex is being taken also from CPU hotplug callback
path, hence the AB-BA).

======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc2-00110-g0b5477d #347 Not tainted
------------------------------------------------------
modprobe/881 is trying to acquire lock:
(&mvm->mutex){+.+.+.}, at: [<ffffffffc0c504d2>] iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]

but task is already holding lock:
(&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #5 (&tz->lock){+.+.+.}:
lock_acquire+0xbd/0x220
__mutex_lock+0x6e/0x900
mutex_lock_nested+0x1b/0x20
thermal_zone_get_temp+0x41/0x70
thermal_zone_device_update+0x3c/0x280
thermal_zone_device_register+0x3b8/0x610
pkg_thermal_cpu_online+0x20b/0x284 [x86_pkg_temp_thermal]
cpuhp_invoke_callback+0xac/0x900
cpuhp_thread_fun+0x79/0x160
smpboot_thread_fn+0x156/0x220
kthread+0x114/0x150
ret_from_fork+0x2a/0x40

-> #4 (cpuhp_state){+.+.+.}:
lock_acquire+0xbd/0x220
cpuhp_issue_call+0xea/0x170
__cpuhp_setup_state_cpuslocked+0x12a/0x190
__cpuhp_setup_state+0x46/0xc0
page_writeback_init+0x43/0x67
pagecache_init+0x39/0x3c
start_kernel+0x45a/0x4ae
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x13d/0x14c
verify_cpu+0x0/0xf1

-> #3 (cpuhp_state_mutex){+.+.+.}:
lock_acquire+0xbd/0x220
__mutex_lock+0x6e/0x900
mutex_lock_nested+0x1b/0x20
__cpuhp_setup_state_cpuslocked+0x4f/0x190
__cpuhp_setup_state+0x46/0xc0
page_alloc_init+0x28/0x30
start_kernel+0x186/0x4ae
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x13d/0x14c
verify_cpu+0x0/0xf1

-> #2 (cpu_hotplug_lock.rw_sem){++++++}:
lock_acquire+0xbd/0x220
cpus_read_lock+0x46/0x90
apply_workqueue_attrs+0x17/0x50
__alloc_workqueue_key+0x195/0x4d0
_iwl_pcie_rx_init+0x384/0x390 [iwlwifi]
iwl_pcie_rx_init+0x1e/0x380 [iwlwifi]
iwl_trans_pcie_start_fw+0x295/0x6f0 [iwlwifi]
iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm]
iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm]
iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm]
_iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
iwl_opmode_register+0xaa/0xd0 [iwlwifi]
iwl_mvm_init+0x37/0x1000 [iwlmvm]
do_one_initcall+0x51/0x1a9
do_init_module+0x60/0x20e
load_module+0x203f/0x2b50
SYSC_finit_module+0x96/0xd0
SyS_finit_module+0xe/0x10
entry_SYSCALL_64_fastpath+0x23/0xc2

-> #1 (&trans_pcie->mutex){+.+.+.}:
lock_acquire+0xbd/0x220
__mutex_lock+0x6e/0x900
mutex_lock_nested+0x1b/0x20
iwl_trans_pcie_start_fw+0x130/0x6f0 [iwlwifi]
iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm]
iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm]
iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm]
_iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
iwl_opmode_register+0xaa/0xd0 [iwlwifi]
iwl_mvm_init+0x37/0x1000 [iwlmvm]
do_one_initcall+0x51/0x1a9
do_init_module+0x60/0x20e
load_module+0x203f/0x2b50
SYSC_finit_module+0x96/0xd0
SyS_finit_module+0xe/0x10
entry_SYSCALL_64_fastpath+0x23/0xc2

-> #0 (&mvm->mutex){+.+.+.}:
__lock_acquire+0x13e1/0x1400
lock_acquire+0xbd/0x220
__mutex_lock+0x6e/0x900
mutex_lock_nested+0x1b/0x20
iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
thermal_zone_get_temp+0x51/0x70
thermal_zone_device_update+0x3c/0x280
thermal_zone_device_register+0x3b8/0x610
iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm]
iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm]
_iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
iwl_opmode_register+0xaa/0xd0 [iwlwifi]
iwl_mvm_init+0x37/0x1000 [iwlmvm]
do_one_initcall+0x51/0x1a9
do_init_module+0x60/0x20e
load_module+0x203f/0x2b50
SYSC_finit_module+0x96/0xd0
SyS_finit_module+0xe/0x10
entry_SYSCALL_64_fastpath+0x23/0xc2

other info that might help us debug this:

Chain exists of:
&mvm->mutex --> cpuhp_state --> &tz->lock

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&tz->lock);
lock(cpuhp_state);
lock(&tz->lock);
lock(&mvm->mutex);

*** DEADLOCK ***

2 locks held by modprobe/881:
#0: (&iwlwifi_opmode_table_mtx){+.+.+.}, at: [<ffffffffc0aa2df4>] iwl_opmode_register+0x24/0xd0 [iwlwifi]
#1: (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70

stack backtrace:
CPU: 3 PID: 881 Comm: modprobe Not tainted 4.13.0-rc2-00110-g0b5477d #347
Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017
Call Trace:
dump_stack+0x85/0xc9
print_circular_bug+0x1f9/0x207
__lock_acquire+0x13e1/0x1400
lock_acquire+0xbd/0x220
? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
__mutex_lock+0x6e/0x900
? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
? thermal_zone_get_temp+0x41/0x70
? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
? thermal_zone_get_temp+0x41/0x70
? find_held_lock+0x39/0xb0
mutex_lock_nested+0x1b/0x20
iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
thermal_zone_get_temp+0x51/0x70
thermal_zone_device_update+0x3c/0x280
thermal_zone_device_register+0x3b8/0x610
iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm]
iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm]
_iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
iwl_opmode_register+0xaa/0xd0 [iwlwifi]
iwl_mvm_init+0x37/0x1000 [iwlmvm]
? 0xffffffffc0c87000
do_one_initcall+0x51/0x1a9
? rcu_read_lock_sched_held+0x98/0xa0
? kmem_cache_alloc_trace+0x2a5/0x340
do_init_module+0x60/0x20e
load_module+0x203f/0x2b50
? __symbol_put+0x50/0x50
SYSC_finit_module+0x96/0xd0
SyS_finit_module+0xe/0x10
entry_SYSCALL_64_fastpath+0x23/0xc2
RIP: 0033:0x7f2ef067cc89
RSP: 002b:00007ffea2ea3d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f2ef067cc89
RDX: 0000000000000000 RSI: 000000000041af06 RDI: 0000000000000001
RBP: 0000000000000005 R08: 0000000000000000 R09: 000000000096b230
R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffea2ea2d80
R13: 00007ffea2ea2d60 R14: 0000000000000005 R15: 000000000096f3c0
thermal thermal_zone3: failed to read out thermal zone (-5)


--
Jiri Kosina
SUSE Labs


2017-08-03 09:10:21

by Jiri Kosina

[permalink] [raw]
Subject: Re: x86/thermal: AB-BA dependency between mvm->mutex and tz->lock

On Mon, 31 Jul 2017, Jiri Kosina wrote:

> Hi,
>
> booting current Linus' tree, I'm seeing lockdep splat (see the end of this
> mail).
>
> Apparently, there is AB-BA between tz->lock and mvm->mutex through the CPU
> hotplug lock.
>
> The obivous depency is: thermal_zone_get_temp() acquires tz->lock, and
> then calls iwl_mvm_tzone_get_temp() (through tz->ops->get_temp()
> callback), which acquires mvm->mutex
>
> The less obvious dependency is primarily caused by iwl_op_mode_mvm_start()
> allocating workqueue (#2 stacktrace) while holding mvm->mutex (which is
> broken, because that mutex is being taken also from CPU hotplug callback
> path, hence the AB-BA).

As the "central" part of the dependency is being added by iwlwifi driver
(_iwl_pcie_rx_init() allocating workqueue while holding
trans_pcie->mutex), I'm adding iwlwifi folks as well to CC.

>
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.13.0-rc2-00110-g0b5477d #347 Not tainted
> ------------------------------------------------------
> modprobe/881 is trying to acquire lock:
> (&mvm->mutex){+.+.+.}, at: [<ffffffffc0c504d2>] iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
>
> but task is already holding lock:
> (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #5 (&tz->lock){+.+.+.}:
> lock_acquire+0xbd/0x220
> __mutex_lock+0x6e/0x900
> mutex_lock_nested+0x1b/0x20
> thermal_zone_get_temp+0x41/0x70
> thermal_zone_device_update+0x3c/0x280
> thermal_zone_device_register+0x3b8/0x610
> pkg_thermal_cpu_online+0x20b/0x284 [x86_pkg_temp_thermal]
> cpuhp_invoke_callback+0xac/0x900
> cpuhp_thread_fun+0x79/0x160
> smpboot_thread_fn+0x156/0x220
> kthread+0x114/0x150
> ret_from_fork+0x2a/0x40
>
> -> #4 (cpuhp_state){+.+.+.}:
> lock_acquire+0xbd/0x220
> cpuhp_issue_call+0xea/0x170
> __cpuhp_setup_state_cpuslocked+0x12a/0x190
> __cpuhp_setup_state+0x46/0xc0
> page_writeback_init+0x43/0x67
> pagecache_init+0x39/0x3c
> start_kernel+0x45a/0x4ae
> x86_64_start_reservations+0x24/0x26
> x86_64_start_kernel+0x13d/0x14c
> verify_cpu+0x0/0xf1
>
> -> #3 (cpuhp_state_mutex){+.+.+.}:
> lock_acquire+0xbd/0x220
> __mutex_lock+0x6e/0x900
> mutex_lock_nested+0x1b/0x20
> __cpuhp_setup_state_cpuslocked+0x4f/0x190
> __cpuhp_setup_state+0x46/0xc0
> page_alloc_init+0x28/0x30
> start_kernel+0x186/0x4ae
> x86_64_start_reservations+0x24/0x26
> x86_64_start_kernel+0x13d/0x14c
> verify_cpu+0x0/0xf1
>
> -> #2 (cpu_hotplug_lock.rw_sem){++++++}:
> lock_acquire+0xbd/0x220
> cpus_read_lock+0x46/0x90
> apply_workqueue_attrs+0x17/0x50
> __alloc_workqueue_key+0x195/0x4d0
> _iwl_pcie_rx_init+0x384/0x390 [iwlwifi]
> iwl_pcie_rx_init+0x1e/0x380 [iwlwifi]
> iwl_trans_pcie_start_fw+0x295/0x6f0 [iwlwifi]
> iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm]
> iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm]
> iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm]
> _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
> iwl_opmode_register+0xaa/0xd0 [iwlwifi]
> iwl_mvm_init+0x37/0x1000 [iwlmvm]
> do_one_initcall+0x51/0x1a9
> do_init_module+0x60/0x20e
> load_module+0x203f/0x2b50
> SYSC_finit_module+0x96/0xd0
> SyS_finit_module+0xe/0x10
> entry_SYSCALL_64_fastpath+0x23/0xc2
>
> -> #1 (&trans_pcie->mutex){+.+.+.}:
> lock_acquire+0xbd/0x220
> __mutex_lock+0x6e/0x900
> mutex_lock_nested+0x1b/0x20
> iwl_trans_pcie_start_fw+0x130/0x6f0 [iwlwifi]
> iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm]
> iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm]
> iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm]
> _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
> iwl_opmode_register+0xaa/0xd0 [iwlwifi]
> iwl_mvm_init+0x37/0x1000 [iwlmvm]
> do_one_initcall+0x51/0x1a9
> do_init_module+0x60/0x20e
> load_module+0x203f/0x2b50
> SYSC_finit_module+0x96/0xd0
> SyS_finit_module+0xe/0x10
> entry_SYSCALL_64_fastpath+0x23/0xc2
>
> -> #0 (&mvm->mutex){+.+.+.}:
> __lock_acquire+0x13e1/0x1400
> lock_acquire+0xbd/0x220
> __mutex_lock+0x6e/0x900
> mutex_lock_nested+0x1b/0x20
> iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> thermal_zone_get_temp+0x51/0x70
> thermal_zone_device_update+0x3c/0x280
> thermal_zone_device_register+0x3b8/0x610
> iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm]
> iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm]
> _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
> iwl_opmode_register+0xaa/0xd0 [iwlwifi]
> iwl_mvm_init+0x37/0x1000 [iwlmvm]
> do_one_initcall+0x51/0x1a9
> do_init_module+0x60/0x20e
> load_module+0x203f/0x2b50
> SYSC_finit_module+0x96/0xd0
> SyS_finit_module+0xe/0x10
> entry_SYSCALL_64_fastpath+0x23/0xc2
>
> other info that might help us debug this:
>
> Chain exists of:
> &mvm->mutex --> cpuhp_state --> &tz->lock
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&tz->lock);
> lock(cpuhp_state);
> lock(&tz->lock);
> lock(&mvm->mutex);
>
> *** DEADLOCK ***
>
> 2 locks held by modprobe/881:
> #0: (&iwlwifi_opmode_table_mtx){+.+.+.}, at: [<ffffffffc0aa2df4>] iwl_opmode_register+0x24/0xd0 [iwlwifi]
> #1: (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70
>
> stack backtrace:
> CPU: 3 PID: 881 Comm: modprobe Not tainted 4.13.0-rc2-00110-g0b5477d #347
> Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017
> Call Trace:
> dump_stack+0x85/0xc9
> print_circular_bug+0x1f9/0x207
> __lock_acquire+0x13e1/0x1400
> lock_acquire+0xbd/0x220
> ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> __mutex_lock+0x6e/0x900
> ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> ? thermal_zone_get_temp+0x41/0x70
> ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> ? thermal_zone_get_temp+0x41/0x70
> ? find_held_lock+0x39/0xb0
> mutex_lock_nested+0x1b/0x20
> iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> thermal_zone_get_temp+0x51/0x70
> thermal_zone_device_update+0x3c/0x280
> thermal_zone_device_register+0x3b8/0x610
> iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm]
> iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm]
> _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
> iwl_opmode_register+0xaa/0xd0 [iwlwifi]
> iwl_mvm_init+0x37/0x1000 [iwlmvm]
> ? 0xffffffffc0c87000
> do_one_initcall+0x51/0x1a9
> ? rcu_read_lock_sched_held+0x98/0xa0
> ? kmem_cache_alloc_trace+0x2a5/0x340
> do_init_module+0x60/0x20e
> load_module+0x203f/0x2b50
> ? __symbol_put+0x50/0x50
> SYSC_finit_module+0x96/0xd0
> SyS_finit_module+0xe/0x10
> entry_SYSCALL_64_fastpath+0x23/0xc2
> RIP: 0033:0x7f2ef067cc89
> RSP: 002b:00007ffea2ea3d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f2ef067cc89
> RDX: 0000000000000000 RSI: 000000000041af06 RDI: 0000000000000001
> RBP: 0000000000000005 R08: 0000000000000000 R09: 000000000096b230
> R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffea2ea2d80
> R13: 00007ffea2ea2d60 R14: 0000000000000005 R15: 000000000096f3c0
> thermal thermal_zone3: failed to read out thermal zone (-5)
>
>
> --
> Jiri Kosina
> SUSE Labs
>

--
Jiri Kosina
SUSE Labs

2017-08-03 09:43:33

by Luciano Coelho

[permalink] [raw]
Subject: Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock

On Thu, 2017-08-03 at 11:10 +0200, Jiri Kosina wrote:
> On Mon, 31 Jul 2017, Jiri Kosina wrote:
>
> > Hi,
> >
> > booting current Linus' tree, I'm seeing lockdep splat (see the end of this
> > mail).
> >
> > Apparently, there is AB-BA between tz->lock and mvm->mutex through the CPU
> > hotplug lock.
> >
> > The obivous depency is: thermal_zone_get_temp() acquires tz->lock, and
> > then calls iwl_mvm_tzone_get_temp() (through tz->ops->get_temp()
> > callback), which acquires mvm->mutex
> >
> > The less obvious dependency is primarily caused by iwl_op_mode_mvm_start()
> > allocating workqueue (#2 stacktrace) while holding mvm->mutex (which is
> > broken, because that mutex is being taken also from CPU hotplug callback
> > path, hence the AB-BA).
>
> As the "central" part of the dependency is being added by iwlwifi driver
> (_iwl_pcie_rx_init() allocating workqueue while holding
> trans_pcie->mutex), I'm adding iwlwifi folks as well to CC.
>
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.13.0-rc2-00110-g0b5477d #347 Not tainted
> > ------------------------------------------------------
> > modprobe/881 is trying to acquire lock:
> > (&mvm->mutex){+.+.+.}, at: [<ffffffffc0c504d2>] iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> >
> > but task is already holding lock:
> > (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #5 (&tz->lock){+.+.+.}:
> > lock_acquire+0xbd/0x220
> > __mutex_lock+0x6e/0x900
> > mutex_lock_nested+0x1b/0x20
> > thermal_zone_get_temp+0x41/0x70
> > thermal_zone_device_update+0x3c/0x280
> > thermal_zone_device_register+0x3b8/0x610
> > pkg_thermal_cpu_online+0x20b/0x284 [x86_pkg_temp_thermal]
> > cpuhp_invoke_callback+0xac/0x900
> > cpuhp_thread_fun+0x79/0x160
> > smpboot_thread_fn+0x156/0x220
> > kthread+0x114/0x150
> > ret_from_fork+0x2a/0x40
> >
> > -> #4 (cpuhp_state){+.+.+.}:
> > lock_acquire+0xbd/0x220
> > cpuhp_issue_call+0xea/0x170
> > __cpuhp_setup_state_cpuslocked+0x12a/0x190
> > __cpuhp_setup_state+0x46/0xc0
> > page_writeback_init+0x43/0x67
> > pagecache_init+0x39/0x3c
> > start_kernel+0x45a/0x4ae
> > x86_64_start_reservations+0x24/0x26
> > x86_64_start_kernel+0x13d/0x14c
> > verify_cpu+0x0/0xf1
> >
> > -> #3 (cpuhp_state_mutex){+.+.+.}:
> > lock_acquire+0xbd/0x220
> > __mutex_lock+0x6e/0x900
> > mutex_lock_nested+0x1b/0x20
> > __cpuhp_setup_state_cpuslocked+0x4f/0x190
> > __cpuhp_setup_state+0x46/0xc0
> > page_alloc_init+0x28/0x30
> > start_kernel+0x186/0x4ae
> > x86_64_start_reservations+0x24/0x26
> > x86_64_start_kernel+0x13d/0x14c
> > verify_cpu+0x0/0xf1
> >
> > -> #2 (cpu_hotplug_lock.rw_sem){++++++}:
> > lock_acquire+0xbd/0x220
> > cpus_read_lock+0x46/0x90
> > apply_workqueue_attrs+0x17/0x50
> > __alloc_workqueue_key+0x195/0x4d0
> > _iwl_pcie_rx_init+0x384/0x390 [iwlwifi]
> > iwl_pcie_rx_init+0x1e/0x380 [iwlwifi]
> > iwl_trans_pcie_start_fw+0x295/0x6f0 [iwlwifi]
> > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm]
> > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm]
> > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm]
> > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
> > iwl_opmode_register+0xaa/0xd0 [iwlwifi]
> > iwl_mvm_init+0x37/0x1000 [iwlmvm]
> > do_one_initcall+0x51/0x1a9
> > do_init_module+0x60/0x20e
> > load_module+0x203f/0x2b50
> > SYSC_finit_module+0x96/0xd0
> > SyS_finit_module+0xe/0x10
> > entry_SYSCALL_64_fastpath+0x23/0xc2
> >
> > -> #1 (&trans_pcie->mutex){+.+.+.}:
> > lock_acquire+0xbd/0x220
> > __mutex_lock+0x6e/0x900
> > mutex_lock_nested+0x1b/0x20
> > iwl_trans_pcie_start_fw+0x130/0x6f0 [iwlwifi]
> > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm]
> > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm]
> > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm]
> > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
> > iwl_opmode_register+0xaa/0xd0 [iwlwifi]
> > iwl_mvm_init+0x37/0x1000 [iwlmvm]
> > do_one_initcall+0x51/0x1a9
> > do_init_module+0x60/0x20e
> > load_module+0x203f/0x2b50
> > SYSC_finit_module+0x96/0xd0
> > SyS_finit_module+0xe/0x10
> > entry_SYSCALL_64_fastpath+0x23/0xc2
> >
> > -> #0 (&mvm->mutex){+.+.+.}:
> > __lock_acquire+0x13e1/0x1400
> > lock_acquire+0xbd/0x220
> > __mutex_lock+0x6e/0x900
> > mutex_lock_nested+0x1b/0x20
> > iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> > thermal_zone_get_temp+0x51/0x70
> > thermal_zone_device_update+0x3c/0x280
> > thermal_zone_device_register+0x3b8/0x610
> > iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm]
> > iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm]
> > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
> > iwl_opmode_register+0xaa/0xd0 [iwlwifi]
> > iwl_mvm_init+0x37/0x1000 [iwlmvm]
> > do_one_initcall+0x51/0x1a9
> > do_init_module+0x60/0x20e
> > load_module+0x203f/0x2b50
> > SYSC_finit_module+0x96/0xd0
> > SyS_finit_module+0xe/0x10
> > entry_SYSCALL_64_fastpath+0x23/0xc2
> >
> > other info that might help us debug this:
> >
> > Chain exists of:
> > &mvm->mutex --> cpuhp_state --> &tz->lock
> >
> > Possible unsafe locking scenario:
> >
> >   CPU0   CPU1
> > ---- ----
> > lock(&tz->lock);
> > lock(cpuhp_state);
> > lock(&tz->lock);
> > lock(&mvm->mutex);
> >
> > *** DEADLOCK ***
> >
> > 2 locks held by modprobe/881:
> > #0: (&iwlwifi_opmode_table_mtx){+.+.+.}, at: [<ffffffffc0aa2df4>] iwl_opmode_register+0x24/0xd0 [iwlwifi]
> > #1: (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70
> >
> > stack backtrace:
> > CPU: 3 PID: 881 Comm: modprobe Not tainted 4.13.0-rc2-00110-g0b5477d #347
> > Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017
> > Call Trace:
> > dump_stack+0x85/0xc9
> > print_circular_bug+0x1f9/0x207
> > __lock_acquire+0x13e1/0x1400
> > lock_acquire+0xbd/0x220
> > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> > __mutex_lock+0x6e/0x900
> > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> > ? thermal_zone_get_temp+0x41/0x70
> > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> > ? thermal_zone_get_temp+0x41/0x70
> > ? find_held_lock+0x39/0xb0
> > mutex_lock_nested+0x1b/0x20
> > iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
> > thermal_zone_get_temp+0x51/0x70
> > thermal_zone_device_update+0x3c/0x280
> > thermal_zone_device_register+0x3b8/0x610
> > iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm]
> > iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm]
> > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
> > iwl_opmode_register+0xaa/0xd0 [iwlwifi]
> > iwl_mvm_init+0x37/0x1000 [iwlmvm]
> > ? 0xffffffffc0c87000
> > do_one_initcall+0x51/0x1a9
> > ? rcu_read_lock_sched_held+0x98/0xa0
> > ? kmem_cache_alloc_trace+0x2a5/0x340
> > do_init_module+0x60/0x20e
> > load_module+0x203f/0x2b50
> > ? __symbol_put+0x50/0x50
> > SYSC_finit_module+0x96/0xd0
> > SyS_finit_module+0xe/0x10
> > entry_SYSCALL_64_fastpath+0x23/0xc2
> > RIP: 0033:0x7f2ef067cc89
> > RSP: 002b:00007ffea2ea3d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f2ef067cc89
> > RDX: 0000000000000000 RSI: 000000000041af06 RDI: 0000000000000001
> > RBP: 0000000000000005 R08: 0000000000000000 R09: 000000000096b230
> > R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffea2ea2d80
> > R13: 00007ffea2ea2d60 R14: 0000000000000005 R15: 000000000096f3c0
> > thermal thermal_zone3: failed to read out thermal zone (-5)

CCing David Weinehall who also just reported this to me.

We'll check this ASAP. Thanks for reporting!

--
Luca.

2017-08-03 10:02:33

by Kalle Valo

[permalink] [raw]
Subject: Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock

"Coelho, Luciano" <[email protected]> writes:

> On Thu, 2017-08-03 at 11:10 +0200, Jiri Kosina wrote:
>> On Mon, 31 Jul 2017, Jiri Kosina wrote:
>>
>> > Hi,
>> >
>> > booting current Linus' tree, I'm seeing lockdep splat (see the end of this
>> > mail).
>> >
>> > Apparently, there is AB-BA between tz->lock and mvm->mutex through the CPU
>> > hotplug lock.
>> >
>> > The obivous depency is: thermal_zone_get_temp() acquires tz->lock, and
>> > then calls iwl_mvm_tzone_get_temp() (through tz->ops->get_temp()
>> > callback), which acquires mvm->mutex
>> >
>> > The less obvious dependency is primarily caused by iwl_op_mode_mvm_start()
>> > allocating workqueue (#2 stacktrace) while holding mvm->mutex (which is
>> > broken, because that mutex is being taken also from CPU hotplug callback
>> > path, hence the AB-BA).
>>
>> As the "central" part of the dependency is being added by iwlwifi driver
>> (_iwl_pcie_rx_init() allocating workqueue while holding
>> trans_pcie->mutex), I'm adding iwlwifi folks as well to CC.
>>
>> >
>> > ======================================================
>> > WARNING: possible circular locking dependency detected
>> > 4.13.0-rc2-00110-g0b5477d #347 Not tainted
>> > ------------------------------------------------------
>> > modprobe/881 is trying to acquire lock:
>> > (&mvm->mutex){+.+.+.}, at: [<ffffffffc0c504d2>] iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
>> >
>> > but task is already holding lock:
>> > (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70
>> >
>> > which lock already depends on the new lock.
>> >
>> >
>> > the existing dependency chain (in reverse order) is:
>> >
>> > -> #5 (&tz->lock){+.+.+.}:
>> > lock_acquire+0xbd/0x220
>> > __mutex_lock+0x6e/0x900
>> > mutex_lock_nested+0x1b/0x20
>> > thermal_zone_get_temp+0x41/0x70
>> > thermal_zone_device_update+0x3c/0x280
>> > thermal_zone_device_register+0x3b8/0x610
>> > pkg_thermal_cpu_online+0x20b/0x284 [x86_pkg_temp_thermal]
>> > cpuhp_invoke_callback+0xac/0x900
>> > cpuhp_thread_fun+0x79/0x160
>> > smpboot_thread_fn+0x156/0x220
>> > kthread+0x114/0x150
>> > ret_from_fork+0x2a/0x40
>> >
>> > -> #4 (cpuhp_state){+.+.+.}:
>> > lock_acquire+0xbd/0x220
>> > cpuhp_issue_call+0xea/0x170
>> > __cpuhp_setup_state_cpuslocked+0x12a/0x190
>> > __cpuhp_setup_state+0x46/0xc0
>> > page_writeback_init+0x43/0x67
>> > pagecache_init+0x39/0x3c
>> > start_kernel+0x45a/0x4ae
>> > x86_64_start_reservations+0x24/0x26
>> > x86_64_start_kernel+0x13d/0x14c
>> > verify_cpu+0x0/0xf1
>> >
>> > -> #3 (cpuhp_state_mutex){+.+.+.}:
>> > lock_acquire+0xbd/0x220
>> > __mutex_lock+0x6e/0x900
>> > mutex_lock_nested+0x1b/0x20
>> > __cpuhp_setup_state_cpuslocked+0x4f/0x190
>> > __cpuhp_setup_state+0x46/0xc0
>> > page_alloc_init+0x28/0x30
>> > start_kernel+0x186/0x4ae
>> > x86_64_start_reservations+0x24/0x26
>> > x86_64_start_kernel+0x13d/0x14c
>> > verify_cpu+0x0/0xf1
>> >
>> > -> #2 (cpu_hotplug_lock.rw_sem){++++++}:
>> > lock_acquire+0xbd/0x220
>> > cpus_read_lock+0x46/0x90
>> > apply_workqueue_attrs+0x17/0x50
>> > __alloc_workqueue_key+0x195/0x4d0
>> > _iwl_pcie_rx_init+0x384/0x390 [iwlwifi]
>> > iwl_pcie_rx_init+0x1e/0x380 [iwlwifi]
>> > iwl_trans_pcie_start_fw+0x295/0x6f0 [iwlwifi]
>> > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm]
>> > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm]
>> > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm]
>> > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
>> > iwl_opmode_register+0xaa/0xd0 [iwlwifi]
>> > iwl_mvm_init+0x37/0x1000 [iwlmvm]
>> > do_one_initcall+0x51/0x1a9
>> > do_init_module+0x60/0x20e
>> > load_module+0x203f/0x2b50
>> > SYSC_finit_module+0x96/0xd0
>> > SyS_finit_module+0xe/0x10
>> > entry_SYSCALL_64_fastpath+0x23/0xc2
>> >
>> > -> #1 (&trans_pcie->mutex){+.+.+.}:
>> > lock_acquire+0xbd/0x220
>> > __mutex_lock+0x6e/0x900
>> > mutex_lock_nested+0x1b/0x20
>> > iwl_trans_pcie_start_fw+0x130/0x6f0 [iwlwifi]
>> > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm]
>> > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm]
>> > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm]
>> > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
>> > iwl_opmode_register+0xaa/0xd0 [iwlwifi]
>> > iwl_mvm_init+0x37/0x1000 [iwlmvm]
>> > do_one_initcall+0x51/0x1a9
>> > do_init_module+0x60/0x20e
>> > load_module+0x203f/0x2b50
>> > SYSC_finit_module+0x96/0xd0
>> > SyS_finit_module+0xe/0x10
>> > entry_SYSCALL_64_fastpath+0x23/0xc2
>> >
>> > -> #0 (&mvm->mutex){+.+.+.}:
>> > __lock_acquire+0x13e1/0x1400
>> > lock_acquire+0xbd/0x220
>> > __mutex_lock+0x6e/0x900
>> > mutex_lock_nested+0x1b/0x20
>> > iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
>> > thermal_zone_get_temp+0x51/0x70
>> > thermal_zone_device_update+0x3c/0x280
>> > thermal_zone_device_register+0x3b8/0x610
>> > iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm]
>> > iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm]
>> > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
>> > iwl_opmode_register+0xaa/0xd0 [iwlwifi]
>> > iwl_mvm_init+0x37/0x1000 [iwlmvm]
>> > do_one_initcall+0x51/0x1a9
>> > do_init_module+0x60/0x20e
>> > load_module+0x203f/0x2b50
>> > SYSC_finit_module+0x96/0xd0
>> > SyS_finit_module+0xe/0x10
>> > entry_SYSCALL_64_fastpath+0x23/0xc2
>> >
>> > other info that might help us debug this:
>> >
>> > Chain exists of:
>> > &mvm->mutex --> cpuhp_state --> &tz->lock
>> >
>> > Possible unsafe locking scenario:
>> >
>> >   CPU0   CPU1
>> > ---- ----
>> > lock(&tz->lock);
>> > lock(cpuhp_state);
>> > lock(&tz->lock);
>> > lock(&mvm->mutex);
>> >
>> > *** DEADLOCK ***
>> >
>> > 2 locks held by modprobe/881:
>> > #0: (&iwlwifi_opmode_table_mtx){+.+.+.}, at: [<ffffffffc0aa2df4>] iwl_opmode_register+0x24/0xd0 [iwlwifi]
>> > #1: (&tz->lock){+.+.+.}, at: [<ffffffff926a9351>] thermal_zone_get_temp+0x41/0x70
>> >
>> > stack backtrace:
>> > CPU: 3 PID: 881 Comm: modprobe Not tainted 4.13.0-rc2-00110-g0b5477d #347
>> > Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017
>> > Call Trace:
>> > dump_stack+0x85/0xc9
>> > print_circular_bug+0x1f9/0x207
>> > __lock_acquire+0x13e1/0x1400
>> > lock_acquire+0xbd/0x220
>> > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
>> > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
>> > __mutex_lock+0x6e/0x900
>> > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
>> > ? thermal_zone_get_temp+0x41/0x70
>> > ? iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
>> > ? thermal_zone_get_temp+0x41/0x70
>> > ? find_held_lock+0x39/0xb0
>> > mutex_lock_nested+0x1b/0x20
>> > iwl_mvm_tzone_get_temp+0x32/0x80 [iwlmvm]
>> > thermal_zone_get_temp+0x51/0x70
>> > thermal_zone_device_update+0x3c/0x280
>> > thermal_zone_device_register+0x3b8/0x610
>> > iwl_mvm_thermal_initialize+0x1d1/0x3a0 [iwlmvm]
>> > iwl_op_mode_mvm_start+0xa1d/0xd30 [iwlmvm]
>> > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
>> > iwl_opmode_register+0xaa/0xd0 [iwlwifi]
>> > iwl_mvm_init+0x37/0x1000 [iwlmvm]
>> > ? 0xffffffffc0c87000
>> > do_one_initcall+0x51/0x1a9
>> > ? rcu_read_lock_sched_held+0x98/0xa0
>> > ? kmem_cache_alloc_trace+0x2a5/0x340
>> > do_init_module+0x60/0x20e
>> > load_module+0x203f/0x2b50
>> > ? __symbol_put+0x50/0x50
>> > SYSC_finit_module+0x96/0xd0
>> > SyS_finit_module+0xe/0x10
>> > entry_SYSCALL_64_fastpath+0x23/0xc2
>> > RIP: 0033:0x7f2ef067cc89
>> > RSP: 002b:00007ffea2ea3d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>> > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f2ef067cc89
>> > RDX: 0000000000000000 RSI: 000000000041af06 RDI: 0000000000000001
>> > RBP: 0000000000000005 R08: 0000000000000000 R09: 000000000096b230
>> > R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffea2ea2d80
>> > R13: 00007ffea2ea2d60 R14: 0000000000000005 R15: 000000000096f3c0
>> > thermal thermal_zone3: failed to read out thermal zone (-5)
>
> CCing David Weinehall who also just reported this to me.
>
> We'll check this ASAP. Thanks for reporting!

Adding linux-wireless also to the loop.

--
Kalle Valo

2017-08-03 11:30:37

by Luciano Coelho

[permalink] [raw]
Subject: Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock

On Thu, 2017-08-03 at 13:02 +0300, Kalle Valo wrote:
> "Coelho, Luciano" <[email protected]> writes:
>
> > On Thu, 2017-08-03 at 11:10 +0200, Jiri Kosina wrote:
> > > On Mon, 31 Jul 2017, Jiri Kosina wrote:
> > >
> > > > Hi,
> > > >
> > > > booting current Linus' tree, I'm seeing lockdep splat (see the end of this
> > > > mail).
> > > >
> > > > Apparently, there is AB-BA between tz->lock and mvm->mutex through the CPU
> > > > hotplug lock.
> > > >
> > > > The obivous depency is: thermal_zone_get_temp() acquires tz->lock, and
> > > > then calls iwl_mvm_tzone_get_temp() (through tz->ops->get_temp()
> > > > callback), which acquires mvm->mutex
> > > >
> > > > The less obvious dependency is primarily caused by iwl_op_mode_mvm_start()
> > > > allocating workqueue (#2 stacktrace) while holding mvm->mutex (which is
> > > > broken, because that mutex is being taken also from CPU hotplug callback
> > > > path, hence the AB-BA).
> > >
> > > As the "central" part of the dependency is being added by iwlwifi driver
> > > (_iwl_pcie_rx_init() allocating workqueue while holding
> > > trans_pcie->mutex), I'm adding iwlwifi folks as well to CC.

[...]

> > > > -> #2 (cpu_hotplug_lock.rw_sem){++++++}:
> > > > lock_acquire+0xbd/0x220
> > > > cpus_read_lock+0x46/0x90
> > > > apply_workqueue_attrs+0x17/0x50
> > > > __alloc_workqueue_key+0x195/0x4d0
> > > > _iwl_pcie_rx_init+0x384/0x390 [iwlwifi]
> > > > iwl_pcie_rx_init+0x1e/0x380 [iwlwifi]
> > > > iwl_trans_pcie_start_fw+0x295/0x6f0 [iwlwifi]
> > > > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm]
> > > > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm]
> > > > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm]
> > > > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
> > > > iwl_opmode_register+0xaa/0xd0 [iwlwifi]
> > > > iwl_mvm_init+0x37/0x1000 [iwlmvm]
> > > > do_one_initcall+0x51/0x1a9
> > > > do_init_module+0x60/0x20e
> > > > load_module+0x203f/0x2b50
> > > > SYSC_finit_module+0x96/0xd0
> > > > SyS_finit_module+0xe/0x10
> > > > entry_SYSCALL_64_fastpath+0x23/0xc2

Okay, so as I understand it the problem has been there for a long time,
but the splat is only coming up now because of Thomas' patch that adds
the lockdep map[1], right?

I see the workqueue allocation you mentioned. I'll try to move this
allocation out of the mutex and see how it goes.

[1] http://lkml.kernel.org/r/[email protected]

--
Cheers,
Luca.

2017-08-03 11:34:30

by Jiri Kosina

[permalink] [raw]
Subject: Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock

On Thu, 3 Aug 2017, Coelho, Luciano wrote:

> Okay, so as I understand it the problem has been there for a long time,
> but the splat is only coming up now because of Thomas' patch that adds
> the lockdep map[1], right?

Yeah, sorry, forgot to mention that pre-49dfe2a67797 kernels wouldn't
produce this, as there would not be aware of the fact that
cpus_read_lock() is actually semantically a lock.

> I see the workqueue allocation you mentioned. I'll try to move this
> allocation out of the mutex and see how it goes.

I have been briefly looking into this as well -- it'll basically have to
be moved out of the trans_pcie->mutex context, but

(a) I'm not sure whether that's actually safe
(b) iwl_pcie_rx_reuse_rbd() (which is where corresponding work is being
queued) is not a proper context either (it's atomic context)

Thanks,

--
Jiri Kosina
SUSE Labs

2017-08-03 13:09:34

by Jiri Kosina

[permalink] [raw]
Subject: Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock

On Thu, 3 Aug 2017, Jiri Kosina wrote:

> > I see the workqueue allocation you mentioned. I'll try to move this
> > allocation out of the mutex and see how it goes.
>
> I have been briefly looking into this as well -- it'll basically have to
> be moved out of the trans_pcie->mutex context, but
>
> (a) I'm not sure whether that's actually safe
> (b) iwl_pcie_rx_reuse_rbd() (which is where corresponding work is being
> queued) is not a proper context either (it's atomic context)

Actually moving it out of trans_pcie->mutex is likely not to be enough,
the dependency would still be there, just the graph will have one vertex
less, with the dependency going directly from mvm mutex to
cpu_hotplug_lock.

--
Jiri Kosina
SUSE Labs

2017-08-17 13:38:09

by Jiri Kosina

[permalink] [raw]
Subject: Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock

Hi,

anything new on this front please?

The splat (and therefore deadlock potential) is still there with current
Linus' tree.

Thanks,

--
Jiri Kosina
SUSE Labs

2017-08-17 14:02:54

by Luciano Coelho

[permalink] [raw]
Subject: Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock

On Thu, 2017-08-17 at 15:38 +0200, Jiri Kosina wrote:
> Hi,
>
> anything new on this front please?
>
> The splat (and therefore deadlock potential) is still there with current
> Linus' tree.

Sorry, haven't had more time to spend on it. I'll do it this evening.

But, just to clarify, the deadlock potential has been there for a while,
right? The only difference is that now we get the splat.

Not saying we shouldn't fix it though. ;)

--
Luca.

2017-08-22 07:32:44

by Luca Coelho

[permalink] [raw]
Subject: [PATCH] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

From: Luca Coelho <[email protected]>

Work queues cannot be allocated in when a mutex is held because the
mutex may be in use and that would make it sleep. Doing so generates
the following splat with 4.13+:

[ 19.513298] ======================================================
[ 19.513429] WARNING: possible circular locking dependency detected
[ 19.513557] 4.13.0-rc5+ #6 Not tainted
[ 19.513638] ------------------------------------------------------
[ 19.513767] cpuhp/0/12 is trying to acquire lock:
[ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0
[ 19.514047]
[ 19.514047] but task is already holding lock:
[ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210
[ 19.514338]
[ 19.514338] which lock already depends on the new lock.

This lock dependency already existed with previous kernel versions,
but it was not detected until commit ... was introduced.

Reported-by: David Weinehall <[email protected]>
Reported-by: Jiri Kosina <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>
---
drivers/net/wireless/intel/iwlwifi/pcie/internal.h | 2 ++
drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 10 +---------
drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 9 +++++++++
3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
index fa315d84e98e..a1ea9ef97ed9 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
@@ -787,6 +787,8 @@ int iwl_pci_fw_enter_d0i3(struct iwl_trans *trans);

void iwl_pcie_enable_rx_wake(struct iwl_trans *trans, bool enable);

+void iwl_pcie_rx_allocator_work(struct work_struct *data);
+
/* common functions that are used by gen2 transport */
void iwl_pcie_apm_config(struct iwl_trans *trans);
int iwl_pcie_prepare_card_hw(struct iwl_trans *trans);
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
index 351c4423125a..942736d3fa75 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
@@ -597,7 +597,7 @@ static void iwl_pcie_rx_allocator_get(struct iwl_trans *trans,
rxq->free_count += RX_CLAIM_REQ_ALLOC;
}

-static void iwl_pcie_rx_allocator_work(struct work_struct *data)
+void iwl_pcie_rx_allocator_work(struct work_struct *data)
{
struct iwl_rb_allocator *rba_p =
container_of(data, struct iwl_rb_allocator, rx_alloc);
@@ -900,10 +900,6 @@ static int _iwl_pcie_rx_init(struct iwl_trans *trans)
return err;
}
def_rxq = trans_pcie->rxq;
- if (!rba->alloc_wq)
- rba->alloc_wq = alloc_workqueue("rb_allocator",
- WQ_HIGHPRI | WQ_UNBOUND, 1);
- INIT_WORK(&rba->rx_alloc, iwl_pcie_rx_allocator_work);

spin_lock(&rba->lock);
atomic_set(&rba->req_pending, 0);
@@ -1017,10 +1013,6 @@ void iwl_pcie_rx_free(struct iwl_trans *trans)
}

cancel_work_sync(&rba->rx_alloc);
- if (rba->alloc_wq) {
- destroy_workqueue(rba->alloc_wq);
- rba->alloc_wq = NULL;
- }

iwl_pcie_free_rbs_pool(trans);

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index f95eec52508e..3927bbf04f72 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -1786,6 +1786,11 @@ void iwl_trans_pcie_free(struct iwl_trans *trans)
iwl_pcie_tx_free(trans);
iwl_pcie_rx_free(trans);

+ if (trans_pcie->rba.alloc_wq) {
+ destroy_workqueue(trans_pcie->rba.alloc_wq);
+ trans_pcie->rba.alloc_wq = NULL;
+ }
+
if (trans_pcie->msix_enabled) {
for (i = 0; i < trans_pcie->alloc_vecs; i++) {
irq_set_affinity_hint(
@@ -3169,6 +3174,10 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev,
trans_pcie->inta_mask = CSR_INI_SET_MASK;
}

+ trans_pcie->rba.alloc_wq = alloc_workqueue("rb_allocator",
+ WQ_HIGHPRI | WQ_UNBOUND, 1);
+ INIT_WORK(&trans_pcie->rba.rx_alloc, iwl_pcie_rx_allocator_work);
+
#ifdef CONFIG_IWLWIFI_PCIE_RTPM
trans->runtime_pm_mode = IWL_PLAT_PM_MODE_D0I3;
#else
--
2.14.1

2017-08-22 07:37:49

by Luca Coelho

[permalink] [raw]
Subject: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

From: Luca Coelho <[email protected]>

Work queues cannot be allocated when a mutex is held because the mutex
may be in use and that would make it sleep. Doing so generates the
following splat with 4.13+:

[ 19.513298] ======================================================
[ 19.513429] WARNING: possible circular locking dependency detected
[ 19.513557] 4.13.0-rc5+ #6 Not tainted
[ 19.513638] ------------------------------------------------------
[ 19.513767] cpuhp/0/12 is trying to acquire lock:
[ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0
[ 19.514047]
[ 19.514047] but task is already holding lock:
[ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210
[ 19.514338]
[ 19.514338] which lock already depends on the new lock.

This lock dependency already existed with previous kernel versions,
but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link
lock stacks for hotplug callbacks") was introduced.

Reported-by: David Weinehall <[email protected]>
Reported-by: Jiri Kosina <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>
---
In v2:
- updated the commit message to a new version, with a grammar fix
and the actual commit that exposed the problem;

drivers/net/wireless/intel/iwlwifi/pcie/internal.h | 2 ++
drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 10 +---------
drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 9 +++++++++
3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
index fa315d84e98e..a1ea9ef97ed9 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
@@ -787,6 +787,8 @@ int iwl_pci_fw_enter_d0i3(struct iwl_trans *trans);

void iwl_pcie_enable_rx_wake(struct iwl_trans *trans, bool enable);

+void iwl_pcie_rx_allocator_work(struct work_struct *data);
+
/* common functions that are used by gen2 transport */
void iwl_pcie_apm_config(struct iwl_trans *trans);
int iwl_pcie_prepare_card_hw(struct iwl_trans *trans);
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
index 351c4423125a..942736d3fa75 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
@@ -597,7 +597,7 @@ static void iwl_pcie_rx_allocator_get(struct iwl_trans *trans,
rxq->free_count += RX_CLAIM_REQ_ALLOC;
}

-static void iwl_pcie_rx_allocator_work(struct work_struct *data)
+void iwl_pcie_rx_allocator_work(struct work_struct *data)
{
struct iwl_rb_allocator *rba_p =
container_of(data, struct iwl_rb_allocator, rx_alloc);
@@ -900,10 +900,6 @@ static int _iwl_pcie_rx_init(struct iwl_trans *trans)
return err;
}
def_rxq = trans_pcie->rxq;
- if (!rba->alloc_wq)
- rba->alloc_wq = alloc_workqueue("rb_allocator",
- WQ_HIGHPRI | WQ_UNBOUND, 1);
- INIT_WORK(&rba->rx_alloc, iwl_pcie_rx_allocator_work);

spin_lock(&rba->lock);
atomic_set(&rba->req_pending, 0);
@@ -1017,10 +1013,6 @@ void iwl_pcie_rx_free(struct iwl_trans *trans)
}

cancel_work_sync(&rba->rx_alloc);
- if (rba->alloc_wq) {
- destroy_workqueue(rba->alloc_wq);
- rba->alloc_wq = NULL;
- }

iwl_pcie_free_rbs_pool(trans);

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index f95eec52508e..3927bbf04f72 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -1786,6 +1786,11 @@ void iwl_trans_pcie_free(struct iwl_trans *trans)
iwl_pcie_tx_free(trans);
iwl_pcie_rx_free(trans);

+ if (trans_pcie->rba.alloc_wq) {
+ destroy_workqueue(trans_pcie->rba.alloc_wq);
+ trans_pcie->rba.alloc_wq = NULL;
+ }
+
if (trans_pcie->msix_enabled) {
for (i = 0; i < trans_pcie->alloc_vecs; i++) {
irq_set_affinity_hint(
@@ -3169,6 +3174,10 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev,
trans_pcie->inta_mask = CSR_INI_SET_MASK;
}

+ trans_pcie->rba.alloc_wq = alloc_workqueue("rb_allocator",
+ WQ_HIGHPRI | WQ_UNBOUND, 1);
+ INIT_WORK(&trans_pcie->rba.rx_alloc, iwl_pcie_rx_allocator_work);
+
#ifdef CONFIG_IWLWIFI_PCIE_RTPM
trans->runtime_pm_mode = IWL_PLAT_PM_MODE_D0I3;
#else
--
2.14.1

2017-08-24 06:00:43

by Luca Coelho

[permalink] [raw]
Subject: Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

On Tue, 2017-08-22 at 10:37 +0300, Luca Coelho wrote:
> From: Luca Coelho <[email protected]>
>
> Work queues cannot be allocated when a mutex is held because the mutex
> may be in use and that would make it sleep. Doing so generates the
> following splat with 4.13+:
>
> [ 19.513298] ======================================================
> [ 19.513429] WARNING: possible circular locking dependency detected
> [ 19.513557] 4.13.0-rc5+ #6 Not tainted
> [ 19.513638] ------------------------------------------------------
> [ 19.513767] cpuhp/0/12 is trying to acquire lock:
> [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0
> [ 19.514047]
> [ 19.514047] but task is already holding lock:
> [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210
> [ 19.514338]
> [ 19.514338] which lock already depends on the new lock.
>
> This lock dependency already existed with previous kernel versions,
> but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link
> lock stacks for hotplug callbacks") was introduced.
>
> Reported-by: David Weinehall <[email protected]>
> Reported-by: Jiri Kosina <[email protected]>
> Signed-off-by: Luca Coelho <[email protected]>

Jiri, did you have a chance to try this out? I'm about to ask Kalle to
merge this so it gets in in time for 4.13-rc7.

--
Cheers,
Luca.

2017-08-24 13:50:26

by Kalle Valo

[permalink] [raw]
Subject: Re: [v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

Luciano Coelho <[email protected]> wrote:

> From: Luca Coelho <[email protected]>
>
> Work queues cannot be allocated when a mutex is held because the mutex
> may be in use and that would make it sleep. Doing so generates the
> following splat with 4.13+:
>
> [ 19.513298] ======================================================
> [ 19.513429] WARNING: possible circular locking dependency detected
> [ 19.513557] 4.13.0-rc5+ #6 Not tainted
> [ 19.513638] ------------------------------------------------------
> [ 19.513767] cpuhp/0/12 is trying to acquire lock:
> [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0
> [ 19.514047]
> [ 19.514047] but task is already holding lock:
> [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210
> [ 19.514338]
> [ 19.514338] which lock already depends on the new lock.
>
> This lock dependency already existed with previous kernel versions,
> but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link
> lock stacks for hotplug callbacks") was introduced.
>
> Reported-by: David Weinehall <[email protected]>
> Reported-by: Jiri Kosina <[email protected]>
> Signed-off-by: Luca Coelho <[email protected]>

Patch applied to wireless-drivers.git, thanks.

10a54d8196d1 iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

--
https://patchwork.kernel.org/patch/9914349/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2017-08-24 19:56:31

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

On Thu, 24 Aug 2017, Luca Coelho wrote:

> > Work queues cannot be allocated when a mutex is held because the mutex
> > may be in use and that would make it sleep. Doing so generates the
> > following splat with 4.13+:
> >
> > [ 19.513298] ======================================================
> > [ 19.513429] WARNING: possible circular locking dependency detected
> > [ 19.513557] 4.13.0-rc5+ #6 Not tainted
> > [ 19.513638] ------------------------------------------------------
> > [ 19.513767] cpuhp/0/12 is trying to acquire lock:
> > [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0
> > [ 19.514047]
> > [ 19.514047] but task is already holding lock:
> > [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210
> > [ 19.514338]
> > [ 19.514338] which lock already depends on the new lock.
> >
> > This lock dependency already existed with previous kernel versions,
> > but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link
> > lock stacks for hotplug callbacks") was introduced.
> >
> > Reported-by: David Weinehall <[email protected]>
> > Reported-by: Jiri Kosina <[email protected]>
> > Signed-off-by: Luca Coelho <[email protected]>
>
> Jiri, did you have a chance to try this out? I'm about to ask Kalle to
> merge this so it gets in in time for 4.13-rc7.

Sorry, I am almost completely offline for one more week (vacation), and
will not have access to the affected system before that. But this indeed
looks like a correct fix to me, so feel free to add

Acked-by: Jiri Kosina <[email protected]>

I'll be able to provide my Tested-by: eventually only in ~10 days.

Thanks,

--
Jiri Kosina
SUSE Labs

2017-08-24 20:00:24

by Luca Coelho

[permalink] [raw]
Subject: Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

On Thu, 2017-08-24 at 21:56 +0200, Jiri Kosina wrote:
> On Thu, 24 Aug 2017, Luca Coelho wrote:
>
> > > Work queues cannot be allocated when a mutex is held because the mutex
> > > may be in use and that would make it sleep. Doing so generates the
> > > following splat with 4.13+:
> > >
> > > [ 19.513298] ======================================================
> > > [ 19.513429] WARNING: possible circular locking dependency detected
> > > [ 19.513557] 4.13.0-rc5+ #6 Not tainted
> > > [ 19.513638] ------------------------------------------------------
> > > [ 19.513767] cpuhp/0/12 is trying to acquire lock:
> > > [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0
> > > [ 19.514047]
> > > [ 19.514047] but task is already holding lock:
> > > [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210
> > > [ 19.514338]
> > > [ 19.514338] which lock already depends on the new lock.
> > >
> > > This lock dependency already existed with previous kernel versions,
> > > but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link
> > > lock stacks for hotplug callbacks") was introduced.
> > >
> > > Reported-by: David Weinehall <[email protected]>
> > > Reported-by: Jiri Kosina <[email protected]>
> > > Signed-off-by: Luca Coelho <[email protected]>
> >
> > Jiri, did you have a chance to try this out? I'm about to ask Kalle to
> > merge this so it gets in in time for 4.13-rc7.
>
> Sorry, I am almost completely offline for one more week (vacation), and
> will not have access to the affected system before that.

Sounds good! Enjoy! ;)


> But this indeed
> looks like a correct fix to me, so feel free to add
>
> Acked-by: Jiri Kosina <[email protected]>
>
> I'll be able to provide my Tested-by: eventually only in ~10 days.


Kalle already picked it up in wireless-drivers and this should make it's
way to 4.13-rc7 (we hope).

In any case, thanks for reporting and the help debugging it.

--
Cheers,
Luca.

2017-08-29 08:19:15

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

Luca Coelho <[email protected]> writes:

> On Thu, 2017-08-24 at 21:56 +0200, Jiri Kosina wrote:
>> On Thu, 24 Aug 2017, Luca Coelho wrote:
>>
>> > > Work queues cannot be allocated when a mutex is held because the mutex
>> > > may be in use and that would make it sleep. Doing so generates the
>> > > following splat with 4.13+:
>> > >
>> > > [ 19.513298] ======================================================
>> > > [ 19.513429] WARNING: possible circular locking dependency detected
>> > > [ 19.513557] 4.13.0-rc5+ #6 Not tainted
>> > > [ 19.513638] ------------------------------------------------------
>> > > [ 19.513767] cpuhp/0/12 is trying to acquire lock:
>> > > [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>]
>> > > thermal_zone_get_temp+0x5b/0xb0
>> > > [ 19.514047]
>> > > [ 19.514047] but task is already holding lock:
>> > > [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>]
>> > > cpuhp_thread_fun+0x3a/0x210
>> > > [ 19.514338]
>> > > [ 19.514338] which lock already depends on the new lock.
>> > >
>> > > This lock dependency already existed with previous kernel versions,
>> > > but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link
>> > > lock stacks for hotplug callbacks") was introduced.
>> > >
>> > > Reported-by: David Weinehall <[email protected]>
>> > > Reported-by: Jiri Kosina <[email protected]>
>> > > Signed-off-by: Luca Coelho <[email protected]>
>> >
>> > Jiri, did you have a chance to try this out? I'm about to ask Kalle to
>> > merge this so it gets in in time for 4.13-rc7.
>>
>> Sorry, I am almost completely offline for one more week (vacation), and
>> will not have access to the affected system before that.
>
> Sounds good! Enjoy! ;)
>
>
>> But this indeed
>> looks like a correct fix to me, so feel free to add
>>
>> Acked-by: Jiri Kosina <[email protected]>
>>
>> I'll be able to provide my Tested-by: eventually only in ~10 days.
>
>
> Kalle already picked it up in wireless-drivers and this should make it's
> way to 4.13-rc7 (we hope).

It (10a54d8196d1) didn't make it to -rc7 but it's in net tree now and
should make it to the next release on Sunday (either -rc8 or the final).

--
Kalle Valo

2017-08-30 14:57:19

by David Weinehall

[permalink] [raw]
Subject: Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

On Tue, Aug 22, 2017 at 10:37:29AM +0300, Luca Coelho wrote:
> From: Luca Coelho <[email protected]>
>
> Work queues cannot be allocated when a mutex is held because the mutex
> may be in use and that would make it sleep. Doing so generates the
> following splat with 4.13+:
>
> [ 19.513298] ======================================================
> [ 19.513429] WARNING: possible circular locking dependency detected
> [ 19.513557] 4.13.0-rc5+ #6 Not tainted
> [ 19.513638] ------------------------------------------------------
> [ 19.513767] cpuhp/0/12 is trying to acquire lock:
> [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0
> [ 19.514047]
> [ 19.514047] but task is already holding lock:
> [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210
> [ 19.514338]
> [ 19.514338] which lock already depends on the new lock.
>
> This lock dependency already existed with previous kernel versions,
> but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link
> lock stacks for hotplug callbacks") was introduced.
>
> Reported-by: David Weinehall <[email protected]>
> Reported-by: Jiri Kosina <[email protected]>
> Signed-off-by: Luca Coelho <[email protected]>

With this patch I no longer get the lockdep warning,
and the driver seems to work just as well as before.

Thanks!

Tested-by: David Weinehall <[email protected]>

> ---
> In v2:
> - updated the commit message to a new version, with a grammar fix
> and the actual commit that exposed the problem;
>
> drivers/net/wireless/intel/iwlwifi/pcie/internal.h | 2 ++
> drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 10 +---------
> drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 9 +++++++++
> 3 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
> index fa315d84e98e..a1ea9ef97ed9 100644
> --- a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
> +++ b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
> @@ -787,6 +787,8 @@ int iwl_pci_fw_enter_d0i3(struct iwl_trans *trans);
>
> void iwl_pcie_enable_rx_wake(struct iwl_trans *trans, bool enable);
>
> +void iwl_pcie_rx_allocator_work(struct work_struct *data);
> +
> /* common functions that are used by gen2 transport */
> void iwl_pcie_apm_config(struct iwl_trans *trans);
> int iwl_pcie_prepare_card_hw(struct iwl_trans *trans);
> diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
> index 351c4423125a..942736d3fa75 100644
> --- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
> @@ -597,7 +597,7 @@ static void iwl_pcie_rx_allocator_get(struct iwl_trans *trans,
> rxq->free_count += RX_CLAIM_REQ_ALLOC;
> }
>
> -static void iwl_pcie_rx_allocator_work(struct work_struct *data)
> +void iwl_pcie_rx_allocator_work(struct work_struct *data)
> {
> struct iwl_rb_allocator *rba_p =
> container_of(data, struct iwl_rb_allocator, rx_alloc);
> @@ -900,10 +900,6 @@ static int _iwl_pcie_rx_init(struct iwl_trans *trans)
> return err;
> }
> def_rxq = trans_pcie->rxq;
> - if (!rba->alloc_wq)
> - rba->alloc_wq = alloc_workqueue("rb_allocator",
> - WQ_HIGHPRI | WQ_UNBOUND, 1);
> - INIT_WORK(&rba->rx_alloc, iwl_pcie_rx_allocator_work);
>
> spin_lock(&rba->lock);
> atomic_set(&rba->req_pending, 0);
> @@ -1017,10 +1013,6 @@ void iwl_pcie_rx_free(struct iwl_trans *trans)
> }
>
> cancel_work_sync(&rba->rx_alloc);
> - if (rba->alloc_wq) {
> - destroy_workqueue(rba->alloc_wq);
> - rba->alloc_wq = NULL;
> - }
>
> iwl_pcie_free_rbs_pool(trans);
>
> diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
> index f95eec52508e..3927bbf04f72 100644
> --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
> +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
> @@ -1786,6 +1786,11 @@ void iwl_trans_pcie_free(struct iwl_trans *trans)
> iwl_pcie_tx_free(trans);
> iwl_pcie_rx_free(trans);
>
> + if (trans_pcie->rba.alloc_wq) {
> + destroy_workqueue(trans_pcie->rba.alloc_wq);
> + trans_pcie->rba.alloc_wq = NULL;
> + }
> +
> if (trans_pcie->msix_enabled) {
> for (i = 0; i < trans_pcie->alloc_vecs; i++) {
> irq_set_affinity_hint(
> @@ -3169,6 +3174,10 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev,
> trans_pcie->inta_mask = CSR_INI_SET_MASK;
> }
>
> + trans_pcie->rba.alloc_wq = alloc_workqueue("rb_allocator",
> + WQ_HIGHPRI | WQ_UNBOUND, 1);
> + INIT_WORK(&trans_pcie->rba.rx_alloc, iwl_pcie_rx_allocator_work);
> +
> #ifdef CONFIG_IWLWIFI_PCIE_RTPM
> trans->runtime_pm_mode = IWL_PLAT_PM_MODE_D0I3;
> #else
> --
> 2.14.1
>

2017-08-31 06:11:50

by Luca Coelho

[permalink] [raw]
Subject: Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

On Wed, 2017-08-30 at 17:57 +0300, David Weinehall wrote:
> On Tue, Aug 22, 2017 at 10:37:29AM +0300, Luca Coelho wrote:
> > From: Luca Coelho <[email protected]>
> >
> > Work queues cannot be allocated when a mutex is held because the mutex
> > may be in use and that would make it sleep. Doing so generates the
> > following splat with 4.13+:
> >
> > [ 19.513298] ======================================================
> > [ 19.513429] WARNING: possible circular locking dependency detected
> > [ 19.513557] 4.13.0-rc5+ #6 Not tainted
> > [ 19.513638] ------------------------------------------------------
> > [ 19.513767] cpuhp/0/12 is trying to acquire lock:
> > [ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0
> > [ 19.514047]
> > [ 19.514047] but task is already holding lock:
> > [ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210
> > [ 19.514338]
> > [ 19.514338] which lock already depends on the new lock.
> >
> > This lock dependency already existed with previous kernel versions,
> > but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link
> > lock stacks for hotplug callbacks") was introduced.
> >
> > Reported-by: David Weinehall <[email protected]>
> > Reported-by: Jiri Kosina <[email protected]>
> > Signed-off-by: Luca Coelho <[email protected]>
>
> With this patch I no longer get the lockdep warning,
> and the driver seems to work just as well as before.

Great! Thanks for reporting and testing, David!


> Thanks!
>
> Tested-by: David Weinehall <[email protected]>

Thanks for this too, but unfortunately it's too late to add it, since
the patch is already in net tree.

--
Cheers,
Luca.

2017-09-04 11:43:09

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

On Thu, 24 Aug 2017, Luca Coelho wrote:

> > looks like a correct fix to me, so feel free to add
> >
> > Acked-by: Jiri Kosina <[email protected]>
> >
> > I'll be able to provide my Tested-by: eventually only in ~10 days.
>
>
> Kalle already picked it up in wireless-drivers and this should make it's
> way to 4.13-rc7 (we hope).
>
> In any case, thanks for reporting and the help debugging it.

I know it's pretty late for this to be added to the commit, but I don't
want this to be left in the void, so for the sake of completness:

Tested-by: Jiri Kosina <[email protected]>

Thanks,

--
Jiri Kosina
SUSE Labs

2017-09-04 11:45:06

by Luca Coelho

[permalink] [raw]
Subject: Re: [PATCH v2] iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

On Mon, 2017-09-04 at 13:43 +0200, Jiri Kosina wrote:
> On Thu, 24 Aug 2017, Luca Coelho wrote:
>
> > > looks like a correct fix to me, so feel free to add
> > >
> > > Acked-by: Jiri Kosina <[email protected]>
> > >
> > > I'll be able to provide my Tested-by: eventually only in ~10 days.
> >
> >
> > Kalle already picked it up in wireless-drivers and this should make it's
> > way to 4.13-rc7 (we hope).
> >
> > In any case, thanks for reporting and the help debugging it.
>
> I know it's pretty late for this to be added to the commit, but I don't
> want this to be left in the void, so for the sake of completness:
>
> Tested-by: Jiri Kosina <[email protected]>

Thanks, Jiri, for reporting, debugging and testing!

--
Cheers,
Luca.