2009-04-05 09:53:35

by Ming Lei

Subject: pm-hibernate : possible circular locking dependency detected

Kernel version: one simple usb-serial patch against commit
6bb597507f9839b13498781e481f5458aea33620.

Thanks.

[ 2276.031486] PM: Saving platform NVS memory
[ 2276.032410] Disabling non-boot CPUs ...
[ 2276.032879]
[ 2276.032880] =======================================================
[ 2276.032882] [ INFO: possible circular locking dependency detected ]
[ 2276.032885] 2.6.29-08003-g8b58977 #84
[ 2276.032886] -------------------------------------------------------
[ 2276.032888] pm-hibernate/5061 is trying to acquire lock:
[ 2276.032891] (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}, at:
[<ffffffff803db46e>] lock_policy_rwsem_write+0x48/0x79
[ 2276.032900]
[ 2276.032900] but task is already holding lock:
[ 2276.032902] (&cpu_hotplug.lock){+.+.+.}, at: [<ffffffff80240d38>]
cpu_hotplug_begin+0x22/0x50
[ 2276.032908]
[ 2276.032909] which lock already depends on the new lock.
[ 2276.032910]
[ 2276.032911]
[ 2276.032912] the existing dependency chain (in reverse order) is:
[ 2276.032913]
[ 2276.032914] -> #6 (&cpu_hotplug.lock){+.+.+.}:
[ 2276.032918] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
[ 2276.032922] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.032926] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362
[ 2276.032930] [<ffffffff80240d38>] cpu_hotplug_begin+0x22/0x50
[ 2276.032933] [<ffffffff804728b0>] _cpu_up+0x58/0x12f
[ 2276.032937] [<ffffffff804729ea>] cpu_up+0x63/0x78
[ 2276.032940] [<ffffffff806585cb>] kernel_init+0xbd/0x189
[ 2276.032944] [<ffffffff8020c1fa>] child_rip+0xa/0x20
[ 2276.032948] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.032960]
[ 2276.032961] -> #5 (cpu_add_remove_lock){+.+.+.}:
[ 2276.032964] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
[ 2276.032968] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.032971] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362
[ 2276.032974] [<ffffffff80240d02>] cpu_maps_update_begin+0x12/0x14
[ 2276.032978] [<ffffffff80250526>] __create_workqueue_key+0xdc/0x1a2
[ 2276.032982] [<ffffffff80276eaf>] stop_machine_create+0x3a/0x9b
[ 2276.032987] [<ffffffff8026d959>] load_module+0x58/0x18c8
[ 2276.032990] [<ffffffff8026f320>] sys_init_module+0x57/0x1cd
[ 2276.032993] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b
[ 2276.032998] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.033001]
[ 2276.033001] -> #4 (setup_lock){+.+.+.}:
[ 2276.033005] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
[ 2276.033008] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.033012] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362
[ 2276.033015] [<ffffffff80276e87>] stop_machine_create+0x12/0x9b
[ 2276.033018] [<ffffffff80240d7c>] disable_nonboot_cpus+0x16/0x120
[ 2276.033022] [<ffffffff80271994>] hibernation_snapshot+0x99/0x1ac
[ 2276.033025] [<ffffffff80271b52>] hibernate+0xab/0x1cf
[ 2276.033028] [<ffffffff802705f0>] state_store+0x59/0xd7
[ 2276.033032] [<ffffffff8034788f>] kobj_attr_store+0x17/0x19
[ 2276.033036] [<ffffffff8030f7d0>] sysfs_write_file+0xe4/0x119
[ 2276.033040] [<ffffffff802c0d14>] vfs_write+0xae/0x137
[ 2276.033044] [<ffffffff802c0e61>] sys_write+0x47/0x6e
[ 2276.033047] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b
[ 2276.033050] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.033053]
[ 2276.033054] -> #3 (dpm_list_mtx){+.+.+.}:
[ 2276.033057] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
[ 2276.033061] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.033065] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362
[ 2276.033068] [<ffffffff803c4339>] device_pm_add+0x46/0xed
[ 2276.033073] [<ffffffff803bdeee>] device_add+0x488/0x61a
[ 2276.033077] [<ffffffff803be099>] device_register+0x19/0x1d
[ 2276.033080] [<ffffffff8036031a>] pcie_port_device_register+0x450/0x4b6
[ 2276.033085] [<ffffffff80469999>] pcie_portdrv_probe+0x69/0x82
[ 2276.033089] [<ffffffff8035bf77>] local_pci_probe+0x12/0x16
[ 2276.033093] [<ffffffff8024fdf8>] do_work_for_cpu+0x13/0x1b
[ 2276.033097] [<ffffffff80250038>] worker_thread+0x1b2/0x2c9
[ 2276.033100] [<ffffffff80253d40>] kthread+0x49/0x76
[ 2276.033104] [<ffffffff8020c1fa>] child_rip+0xa/0x20
[ 2276.033107] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.033111]
[ 2276.033111] -> #2 (&wfc.work){+.+.+.}:
[ 2276.033114] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
[ 2276.033118] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.033122] [<ffffffff80250032>] worker_thread+0x1ac/0x2c9
[ 2276.033125] [<ffffffff80253d40>] kthread+0x49/0x76
[ 2276.033128] [<ffffffff8020c1fa>] child_rip+0xa/0x20
[ 2276.033131] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.033134]
[ 2276.033134] -> #1 (work_on_cpu){+.+.+.}:
[ 2276.033138] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
[ 2276.033141] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.033145] [<ffffffff80250885>] flush_work+0x60/0xeb
[ 2276.033148] [<ffffffff80250c49>] work_on_cpu+0x73/0x84
[ 2276.033151] [<ffffffffa02b4399>] get_cur_val+0xb6/0xe3 [acpi_cpufreq]
[ 2276.033157] [<ffffffffa02b443a>]
get_cur_freq_on_cpu+0x74/0xc5 [acpi_cpufreq]
[ 2276.033161] [<ffffffffa02b4b69>]
acpi_cpufreq_cpu_init+0x381/0x4c4 [acpi_cpufreq]
[ 2276.033166] [<ffffffff803db697>] cpufreq_add_dev+0x1bb/0x676
[ 2276.033169] [<ffffffff803bea90>] sysdev_driver_register+0xc6/0x120
[ 2276.033173] [<ffffffff803da5c1>] cpufreq_register_driver+0xbc/0x1ae
[ 2276.033177] [<ffffffffa00c60c5>] 0xffffffffa00c60c5
[ 2276.033180] [<ffffffff80209076>] do_one_initcall+0x70/0x185
[ 2276.033183] [<ffffffff8026f374>] sys_init_module+0xab/0x1cd
[ 2276.033187] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b
[ 2276.033190] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.033197]
[ 2276.033197] -> #0 (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}:
[ 2276.033201] [<ffffffff80265296>] __lock_acquire+0x111f/0x178c
[ 2276.033204] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.033208] [<ffffffff80476d6c>] down_write+0x43/0x77
[ 2276.033211] [<ffffffff803db46e>] lock_policy_rwsem_write+0x48/0x79
[ 2276.033215] [<ffffffff80474678>] cpufreq_cpu_callback+0x52/0x7a
[ 2276.033218] [<ffffffff80257ebb>] notifier_call_chain+0x33/0x5b
[ 2276.033222] [<ffffffff80257f42>] __raw_notifier_call_chain+0x9/0xb
[ 2276.033226] [<ffffffff80465097>] _cpu_down+0x86/0x298
[ 2276.033229] [<ffffffff80240dde>] disable_nonboot_cpus+0x78/0x120
[ 2276.033233] [<ffffffff80271994>] hibernation_snapshot+0x99/0x1ac
[ 2276.033236] [<ffffffff80271b52>] hibernate+0xab/0x1cf
[ 2276.033239] [<ffffffff802705f0>] state_store+0x59/0xd7
[ 2276.033242] [<ffffffff8034788f>] kobj_attr_store+0x17/0x19
[ 2276.033246] [<ffffffff8030f7d0>] sysfs_write_file+0xe4/0x119
[ 2276.033249] [<ffffffff802c0d14>] vfs_write+0xae/0x137
[ 2276.033252] [<ffffffff802c0e61>] sys_write+0x47/0x6e
[ 2276.033255] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b
[ 2276.033259] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.033262]
[ 2276.033262] other info that might help us debug this:
[ 2276.033263]
[ 2276.033265] 5 locks held by pm-hibernate/5061:
[ 2276.033267] #0: (&buffer->mutex){+.+.+.}, at:
[<ffffffff8030f724>] sysfs_write_file+0x38/0x119
[ 2276.033273] #1: (pm_mutex){+.+.+.}, at: [<ffffffff80271abe>]
hibernate+0x17/0x1cf
[ 2276.033278] #2: (dpm_list_mtx){+.+.+.}, at: [<ffffffff803c35c5>]
device_pm_lock+0x12/0x14
[ 2276.033284] #3: (cpu_add_remove_lock){+.+.+.}, at:
[<ffffffff80240d99>] disable_nonboot_cpus+0x33/0x120
[ 2276.033289] #4: (&cpu_hotplug.lock){+.+.+.}, at:
[<ffffffff80240d38>] cpu_hotplug_begin+0x22/0x50
[ 2276.033294]
[ 2276.033295] stack backtrace:
[ 2276.033297] Pid: 5061, comm: pm-hibernate Not tainted
2.6.29-08003-g8b58977 #84
[ 2276.033300] Call Trace:
[ 2276.033304] [<ffffffff80263cbf>] print_circular_bug_tail+0xc5/0xd0
[ 2276.033307] [<ffffffff80265296>] __lock_acquire+0x111f/0x178c
[ 2276.033312] [<ffffffff802382b0>] ? sub_preempt_count+0xdc/0xf1
[ 2276.033315] [<ffffffff80263470>] ? trace_hardirqs_on+0xd/0xf
[ 2276.033319] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.033322] [<ffffffff803db46e>] ? lock_policy_rwsem_write+0x48/0x79
[ 2276.033326] [<ffffffff80476d6c>] down_write+0x43/0x77
[ 2276.033329] [<ffffffff803db46e>] ? lock_policy_rwsem_write+0x48/0x79
[ 2276.033333] [<ffffffff803db46e>] lock_policy_rwsem_write+0x48/0x79
[ 2276.033337] [<ffffffff80474678>] cpufreq_cpu_callback+0x52/0x7a
[ 2276.033340] [<ffffffff80257ebb>] notifier_call_chain+0x33/0x5b
[ 2276.033344] [<ffffffff80257f42>] __raw_notifier_call_chain+0x9/0xb
[ 2276.033347] [<ffffffff80465097>] _cpu_down+0x86/0x298
[ 2276.033350] [<ffffffff80240dde>] disable_nonboot_cpus+0x78/0x120
[ 2276.033350] [<ffffffff80271994>] hibernation_snapshot+0x99/0x1ac
[ 2276.033350] [<ffffffff80271b52>] hibernate+0xab/0x1cf
[ 2276.033350] [<ffffffff802705f0>] state_store+0x59/0xd7
[ 2276.033350] [<ffffffff8034788f>] kobj_attr_store+0x17/0x19
[ 2276.033350] [<ffffffff8030f7d0>] sysfs_write_file+0xe4/0x119
[ 2276.033350] [<ffffffff802c0d14>] vfs_write+0xae/0x137
[ 2276.033350] [<ffffffff802c0e61>] sys_write+0x47/0x6e
[ 2276.033350] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b
[ 2276.034478] Broke affinity for irq 21
[ 2276.034478] Broke affinity for irq 22
[ 2276.034478] Broke affinity for irq 29
[ 2276.035812] CPU 1 is now offline
[ 2276.035814] lockdep: fixing up alternatives.
[ 2276.035816] SMP alternatives: switching to UP code
[ 2276.158329] CPU0 attaching NULL sched-domain.
[ 2276.158334] CPU1 attaching NULL sched-domain.
[ 2276.158361] CPU0 attaching NULL sched-domain.
[ 2276.158841] CPU1 is down
[ 2276.158867] Checking wake-up interrupts
[ 2276.158883] Suspending System Devices
[ 2276.158885] Suspending type 'oprofile':
[ 2276.158886] oprofile0
[ 2276.158888] Suspending type 'edac':
[ 2276.158889] Suspending type 'irqrouter':
[ 2276.158891] irqrouter0
[ 2276.158892] Suspending type 'clocksource':
[ 2276.158894] clocksource0
[ 2276.158895] Suspending type 'timekeeping':
[ 2276.158897] timekeeping0
[ 2276.158929] Extended CMOS year: 2000
[ 2276.158937] Suspending type 'ioapic':
[ 2276.158938] ioapic0
[ 2276.159007] Suspending type 'lapic':
[ 2276.159007] lapic0
[ 2276.159007] Suspending type 'machinecheck':
[ 2276.159007] machinecheck0
[ 2276.159007] Suspending type 'i8237':
[ 2276.159007] i82370
[ 2276.159007] Suspending type 'i8259':
[ 2276.159007] i82590
[ 2276.159007] Suspending type 'node':
[ 2276.159007] node0
[ 2276.159007] Suspending type 'cpu':
[ 2276.159007] cpu0
[ 2276.159007] cpu1
[ 2276.159007] PM: Creating hibernation image:
[ 2276.159007] PM: Need to copy 112986 pages
[ 2276.159007] PM: Restoring platform NVS memory
[ 2276.159007] Resuming System Devices
[ 2276.159007] Resuming type 'cpu':
[ 2276.159007] cpu0
[ 2276.159007] cpu1
[ 2276.159007] Resuming type 'node':
[ 2276.159007] node0
[ 2276.159007] Resuming type 'i8259':
[ 2276.159007] i82590
[ 2276.159007] Resuming type 'i8237':
[ 2276.159007] i82370
[ 2276.159007] Resuming type 'machinecheck':
[ 2276.159007] machinecheck0
[ 2276.159007] CPU0: Thermal LVT vector (0xfa) already installed
[ 2276.159007] Resuming type 'lapic':
[ 2276.159007] lapic0
[ 2276.159007] Resuming type 'ioapic':
[ 2276.159007] ioapic0
[ 2276.159007] Resuming type 'timekeeping':
[ 2276.159007] timekeeping0
[ 2276.159007] Extended CMOS year: 2000
[ 2276.159007] Resuming type 'clocksource':
[ 2276.159007] clocksource0
[ 2276.159007] Resuming type 'irqrouter':
[ 2276.159007] irqrouter0
[ 2276.159007] Resuming type 'edac':
[ 2276.159007] Resuming type 'oprofile':
[ 2276.159007] oprofile0



--
Lei Ming


2009-04-05 10:13:13

by Rafael J. Wysocki

Subject: Re: pm-hibernate : possible circular locking dependency detected

On Sunday 05 April 2009, Ming Lei wrote:
> kernel version : one simple usb-serial patch against commit
> 6bb597507f9839b13498781e481f5458aea33620.
>
> Thanks.

Hmm, CPU hotplug again, it seems.

I'm not sure who's the maintainer at the moment. Andrew, is that Gautham?

Rafael



2009-04-05 13:46:25

by Ingo Molnar

Subject: Re: pm-hibernate : possible circular locking dependency detected


* Rafael J. Wysocki <[email protected]> wrote:

> On Sunday 05 April 2009, Ming Lei wrote:
> > kernel version : one simple usb-serial patch against commit
> > 6bb597507f9839b13498781e481f5458aea33620.
> >
> > Thanks.
>
> Hmm, CPU hotplug again, it seems.
>
> I'm not sure who's the maintainer at the moment. Andrew, is that
> Gautham?

CPU hotplug tends to land on the scheduler people's desk normally.

But i'm not sure that's the real thing here - key appears to be this
work_on_cpu() worklet by the cpufreq code:

> > [ 2276.033134] -> #1 (work_on_cpu){+.+.+.}:
> > [ 2276.033138] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
> > [ 2276.033141] [<ffffffff80265996>] lock_acquire+0x93/0xbf
> > [ 2276.033145] [<ffffffff80250885>] flush_work+0x60/0xeb
> > [ 2276.033148] [<ffffffff80250c49>] work_on_cpu+0x73/0x84
> > [ 2276.033151] [<ffffffffa02b4399>] get_cur_val+0xb6/0xe3 [acpi_cpufreq]
> > [ 2276.033157] [<ffffffffa02b443a>]
> > get_cur_freq_on_cpu+0x74/0xc5 [acpi_cpufreq]
> > [ 2276.033161] [<ffffffffa02b4b69>]
> > acpi_cpufreq_cpu_init+0x381/0x4c4 [acpi_cpufreq]
> > [ 2276.033166] [<ffffffff803db697>] cpufreq_add_dev+0x1bb/0x676

Connecting things the wrong way around. Have not looked deeply
though - Peter, Rusty, what's your take?

Ingo


by Gautham R Shenoy

Subject: Re: pm-hibernate : possible circular locking dependency detected

On Sun, Apr 05, 2009 at 03:44:54PM +0200, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[email protected]> wrote:
>
> > On Sunday 05 April 2009, Ming Lei wrote:
> > > kernel version : one simple usb-serial patch against commit
> > > 6bb597507f9839b13498781e481f5458aea33620.
> > >
> > > Thanks.
> >
> > Hmm, CPU hotplug again, it seems.
> >
> > I'm not sure who's the maintainer at the moment. Andrew, is that
> > Gautham?
>
> CPU hotplug tends to land on the scheduler people's desk normally.
>
> But i'm not sure that's the real thing here - key appears to be this
> work_on_cpu() worklet by the cpufreq code:

Actually, there are two dependency chains here which can lead to a deadlock.
The one we're seeing here is the longer of the two.

If the relevant locks are numbered as follows:
[1]: cpu_policy_rwsem
[2]: work_on_cpu
[3]: cpu_hotplug.lock
[4]: dpm_list_mtx


The individual callpaths are:

1) do_dbs_timer() [1] --> dbs_check_cpu() --> __cpufreq_driver_getavg()
   --> get_measured_perf() --> work_on_cpu() [2]

2) pci_device_probe() --> .. --> pci_call_probe() [3] --> work_on_cpu() [2]
   --> local_pci_probe() --> .. --> device_pm_add() [4]

3) hibernate() --> hibernation_snapshot() --> create_image()
   --> device_pm_lock() [4] --> disable_nonboot_cpus()
   --> _cpu_down() [3] --> cpufreq_cpu_callback() [1]


The two chains which can deadlock are

a) [1] --> [2] --> [4] --> [3] --> [1] (The one in this log)
and
b) [3] --> [2] --> [4] --> [3]
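
For reference, work_on_cpu appears as lock [2] in these chains because it
queues a worklet on the target CPU and then waits for it. Roughly,
paraphrasing the 2.6.29 kernel/workqueue.c implementation:

	static struct workqueue_struct *work_on_cpu_wq;	/* dedicated workqueue */

	struct work_for_cpu {
		struct work_struct work;
		long (*fn)(void *);
		void *arg;
		long ret;
	};

	static void do_work_for_cpu(struct work_struct *w)
	{
		struct work_for_cpu *wfc = container_of(w, struct work_for_cpu, work);

		wfc->ret = wfc->fn(wfc->arg);
	}

	long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
	{
		struct work_for_cpu wfc;

		INIT_WORK(&wfc.work, do_work_for_cpu);
		wfc.fn = fn;
		wfc.arg = arg;
		queue_work_on(cpu, work_on_cpu_wq, &wfc.work);
		/* The wait below is the "work_on_cpu" dependency lockdep
		 * tracks: whatever the caller holds here orders before
		 * whatever fn() takes on the target CPU. */
		flush_work(&wfc.work);

		return wfc.ret;
	}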

Ingo,
The do_dbs_timer() function of the ondemand governor is run from a per-cpu
workqueue. Hence it is already running on the CPU whose perf counters
we're interested in.

Does it make sense to introduce a get_this_measured_perf() API
for users who are already running on the relevant CPU?
And have get_measured_perf(cpu) for other users (currently there are
none)?

Thus, do_dbs_timer() can avoid calling work_on_cpu(), thereby preventing
deadlock a) from occurring.
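
A minimal sketch of what such a split might look like; every name below is
made up for illustration, it is not an existing kernel API:

	struct aperf_mperf {
		u64 aperf, mperf;
	};

	/* Reads the counters on whichever CPU this happens to run on. */
	static long read_perf_ctrs(void *arg)
	{
		struct aperf_mperf *p = arg;

		rdmsrl(MSR_IA32_APERF, p->aperf);
		rdmsrl(MSR_IA32_MPERF, p->mperf);
		return 0;
	}

	/* For callers already pinned to the CPU of interest, such as
	 * do_dbs_timer(): no work_on_cpu(), hence no [2] dependency. */
	static void get_this_measured_perf(struct aperf_mperf *p)
	{
		read_perf_ctrs(p);
	}

	/* For callers on some other CPU (currently there are none). */
	static void get_measured_perf(unsigned int cpu, struct aperf_mperf *p)
	{
		work_on_cpu(cpu, read_perf_ctrs, p);
	}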

Rafael,
Sorry, I am not well versed with the hibernation code. But does the
following make sense:

create_image()
{
	device_pm_lock();
	device_power_down(PMSG_FREEZE);
	platform_pre_snapshot(platform_mode);

	device_pm_unlock();
	disable_nonboot_cpus();
	device_pm_lock();
	.
	.
	.
	.
}




--
Thanks and Regards
gautham

2009-04-06 13:30:06

by Rafael J. Wysocki

Subject: Re: pm-hibernate : possible circular locking dependency detected

On Monday 06 April 2009, Gautham R Shenoy wrote:
> On Sun, Apr 05, 2009 at 03:44:54PM +0200, Ingo Molnar wrote:
> >
> > * Rafael J. Wysocki <[email protected]> wrote:
> >
> > > On Sunday 05 April 2009, Ming Lei wrote:
> > > > kernel version : one simple usb-serial patch against commit
> > > > 6bb597507f9839b13498781e481f5458aea33620.
> > > >
> > > > Thanks.
> > >
> > > Hmm, CPU hotplug again, it seems.
> > >
> > > I'm not sure who's the maintainer at the moment. Andrew, is that
> > > Gautham?
> >
> > CPU hotplug tends to land on the scheduler people's desk normally.
> >
> > But i'm not sure that's the real thing here - key appears to be this
> > work_on_cpu() worklet by the cpufreq code:
>
> Actually, there are two dependency chains here which can lead to a deadlock.
> The one we're seeing here is the longer of the two.
>
> If the relevant locks are numbered as follows:
> [1]: cpu_policy_rwsem
> [2]: work_on_cpu
> [3]: cpu_hotplug.lock
> [4]: dpm_list_mtx
>
>
> The individual callpaths are:
>
> 1) do_dbs_timer() [1] --> dbs_check_cpu() --> __cpufreq_driver_getavg()
>    --> get_measured_perf() --> work_on_cpu() [2]
>
> 2) pci_device_probe() --> .. --> pci_call_probe() [3] --> work_on_cpu() [2]
>    --> local_pci_probe() --> .. --> device_pm_add() [4]

This should block on [4] held by hibernate(). That's why it calls
device_pm_lock() after all.

> 3) hibernate() --> hibernation_snapshot() --> create_image()
>    --> device_pm_lock() [4] --> disable_nonboot_cpus()
>    --> _cpu_down() [3] --> cpufreq_cpu_callback() [1]
>
>
> The two chains which can deadlock are
>
> a) [1] --> [2] --> [4] --> [3] --> [1] (The one in this log)
> and
> b) [3] --> [2] --> [4] --> [3]

What exactly is the b) scenario?

> Ingo,
> The do_dbs_timer() function of the ondemand governor is run from a per-cpu
> workqueue. Hence it is already running on the CPU whose perf counters
> we're interested in.
>
> Does it make sense to introduce a get_this_measured_perf() API
> for users who are already running on the relevant CPU?
> And have get_measured_perf(cpu) for other users (currently there are
> none)?
>
> Thus, do_dbs_timer() can avoid calling work_on_cpu(), thereby preventing
> deadlock a) from occurring.
>
> Rafael,
> Sorry, I am not well versed with the hibernation code. But does the
> following make sense:

Not really ->

> create_image()
> {
> 	device_pm_lock();
> 	device_power_down(PMSG_FREEZE);
> 	platform_pre_snapshot(platform_mode);
>
> 	device_pm_unlock();

-> because dpm_list is under control of the hibernation code at this point
and it should remain locked.

> 	disable_nonboot_cpus();

disable_nonboot_cpus() must not take dpm_list_mtx itself.

> 	device_pm_lock();
> 	.
> 	.
> 	.
> 	.
> }

Thanks,
Rafael

by Gautham R Shenoy

Subject: Re: pm-hibernate : possible circular locking dependency detected

On Mon, Apr 06, 2009 at 03:29:43PM +0200, Rafael J. Wysocki wrote:
> On Monday 06 April 2009, Gautham R Shenoy wrote:
> > On Sun, Apr 05, 2009 at 03:44:54PM +0200, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > > On Sunday 05 April 2009, Ming Lei wrote:
> > > > > kernel version : one simple usb-serial patch against commit
> > > > > 6bb597507f9839b13498781e481f5458aea33620.
> > > > >
> > > > > Thanks.
> > > >
> > > > Hmm, CPU hotplug again, it seems.
> > > >
> > > > I'm not sure who's the maintainer at the moment. Andrew, is that
> > > > Gautham?
> > >
> > > CPU hotplug tends to land on the scheduler people's desk normally.
> > >
> > > But i'm not sure that's the real thing here - key appears to be this
> > > work_on_cpu() worklet by the cpufreq code:
> >
> > Actually, there are two dependency chains here which can lead to a deadlock.
> > The one we're seeing here is the longer of the two.
> >
> > If the relevant locks are numbered as follows:
> > [1]: cpu_policy_rwsem
> > [2]: work_on_cpu
> > [3]: cpu_hotplug.lock
> > [4]: dpm_list_mtx
> >
> >
> > The individual callpaths are:
> >
> > 1) do_dbs_timer()[1] --> dbs_check_cpu() --> __cpufreq_driver_getavg()
> >                                                           |
> >       work_on_cpu()[2] <-- get_measured_perf() <----------|
> >
> >
> > 2) pci_device_probe() --> .. --> pci_call_probe() [3] --> work_on_cpu()[2]
> >                                                                  |
> >    [4] device_pm_add() <-- .. <-- local_pci_probe() <------------|
>
> This should block on [4] held by hibernate(). That's why it calls
> device_pm_lock() after all.

Agreed. But it does so holding the cpu_hotplug.lock at pci_call_probe().
See below.

>
> > 3) hibernate() --> hibernation_snapshot() --> create_image()
> >                                                       |
> >    disable_nonboot_cpus() <-- [4] device_pm_lock() <--|
> >        |
> >        |--> _cpu_down() [3] --> cpufreq_cpu_callback() [1]
> >
> >
> > The two chains which can deadlock are
> >
> > a) [1] --> [2] --> [4] --> [3] --> [1] (The one in this log)
> > and
> > b) [3] --> [2] --> [4] --> [3]
>
> What exactly is the b) scenario?

pci_call_probe() calls work_on_cpu() within
get_online_cpus()/put_online_cpus(), the cpu hotplug read path.
Thus we have a cpu_hotplug.lock --> work_on_cpu dependency here.

This work_on_cpu() calls local_pci_probe() which, in one of the
registration paths, calls pcie_portdrv_probe(). This eventually
ends up calling device_pm_add(), which takes the
dpm_list_mtx. Thus we have a work_on_cpu --> dpm_list_mtx
dependency here. This is reflected in the lockdep log for dpm_list_mtx:

> > [ 2276.033054] -> #3 (dpm_list_mtx){+.+.+.}:
> > [ 2276.033057] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
> > [ 2276.033061] [<ffffffff80265996>] lock_acquire+0x93/0xbf
> > [ 2276.033065] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362
> > [ 2276.033068] [<ffffffff803c4339>] device_pm_add+0x46/0xed
> > [ 2276.033073] [<ffffffff803bdeee>] device_add+0x488/0x61a
> > [ 2276.033077] [<ffffffff803be099>] device_register+0x19/0x1d
> > [ 2276.033080] [<ffffffff8036031a>] pcie_port_device_register+0x450/0x4b6
> > [ 2276.033085] [<ffffffff80469999>] pcie_portdrv_probe+0x69/0x82
> > [ 2276.033089] [<ffffffff8035bf77>] local_pci_probe+0x12/0x16
> > [ 2276.033093] [<ffffffff8024fdf8>] do_work_for_cpu+0x13/0x1b
> > [ 2276.033097] [<ffffffff80250038>] worker_thread+0x1b2/0x2c9
> > [ 2276.033100] [<ffffffff80253d40>] kthread+0x49/0x76
> > [ 2276.033104] [<ffffffff8020c1fa>] child_rip+0xa/0x20
> > [ 2276.033107] [<ffffffffffffffff>] 0xffffffffffffffff

The dependency chain on this device_registration path would be
cpu_hotplug.lock --> work_on_cpu --> dpm_list_mtx.

On the hibernate path, we hold the dpm_list_mtx and call
disable_nonboot_cpus() in create_image().
disable_nonboot_cpus() calls _cpu_down() which again takes the
cpu_hotplug.lock, this time the write-path. Thus we have a
dpm_list_mtx --> cpu_hotplug.lock dependency here.

These two dependency chains, being in reverse order, can cause a
deadlock, right? Or am I reading something wrong here?

>
> >
> > Rafael,
> > Sorry, I am not well versed with the hibernation code. But does the
> > following make sense:
>
> Not really ->
>
> > create_image()
> > {
> >         device_pm_lock();
> >         device_power_down(PMSG_FREEZE);
> >         platform_pre_snapshot(platform_mode);
> >
> >         device_pm_unlock();
>
> -> because dpm_list is under control of the hibernation code at this point
> and it should remain locked.
>
> >         disable_nonboot_cpus()
>
> disable_nonboot_cpus() must not take dpm_list_mtx itself.
>
> >         device_pm_lock();
> >         .
> >         .
> >         .
> >         .
> > }
>
> Thanks,
> Rafael

--
Thanks and Regards
gautham
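
A note on why work_on_cpu [2] shows up as a "lock" in these chains at
all: lockdep models waiting for a work item as acquiring a pseudo-lock,
roughly as in the simplified sketch below. The map name and both helpers
are hypothetical; the real annotations live in the workqueue code.

    #include <linux/workqueue.h>
    #include <linux/lockdep.h>

    /* Hypothetical pseudo-lock standing in for the work's lockdep_map. */
    static struct lockdep_map work_on_cpu_map =
            STATIC_LOCKDEP_MAP_INIT("work_on_cpu", &work_on_cpu_map);

    static void run_the_work(struct work_struct *work)
    {
            /* Callback side: locks taken inside the work (for example
             * dpm_list_mtx in local_pci_probe()) are ordered after the
             * pseudo-lock. */
            lock_map_acquire(&work_on_cpu_map);
            work->func(work);
            lock_map_release(&work_on_cpu_map);
    }

    static void wait_for_the_work(void)
    {
            /* Waiter side: locks already held here (for example
             * cpu_hotplug.lock in pci_call_probe()) are ordered before
             * the same pseudo-lock. That is how the
             * [3] --> [2] --> [4] links arise. */
            lock_map_acquire(&work_on_cpu_map);
            lock_map_release(&work_on_cpu_map);
    }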

2009-04-06 14:37:43

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] pm-hibernate : possible circular locking dependency detected

On Mon, 6 Apr 2009, Rafael J. Wysocki wrote:

> On Monday 06 April 2009, Gautham R Shenoy wrote:
> > On Sun, Apr 05, 2009 at 03:44:54PM +0200, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > > On Sunday 05 April 2009, Ming Lei wrote:
> > > > > kernel version : one simple usb-serial patch against commit
> > > > > 6bb597507f9839b13498781e481f5458aea33620.
> > > > >
> > > > > Thanks.
> > > >
> > > > Hmm, CPU hotplug again, it seems.
> > > >
> > > > I'm not sure who's the maintainer at the moment. Andrew, is that
> > > > Gautham?
> > >
> > > CPU hotplug tends to land on the scheduler people's desk normally.
> > >
> > > But i'm not sure that's the real thing here - key appears to be this
> > > work_on_cpu() worklet by the cpufreq code:
> >
> > Actually, there are two dependency chains here which can lead to a deadlock.
> > The one we're seeing here is the longer of the two.
> >
> > If the relevant locks are numbered as follows:
> > [1]: cpu_policy_rwsem
> > [2]: work_on_cpu
> > [3]: cpu_hotplug.lock
> > [4]: dpm_list_mtx
> >
> >
> > The individual callpaths are:
> >
> > 1) do_dbs_timer()[1] --> dbs_check_cpu() --> __cpufreq_driver_getavg()
> >                                                           |
> >       work_on_cpu()[2] <-- get_measured_perf() <----------|
> >
> >
> > 2) pci_device_probe() --> .. --> pci_call_probe() [3] --> work_on_cpu()[2]
> >                                                                  |
> >    [4] device_pm_add() <-- .. <-- local_pci_probe() <------------|
>
> This should block on [4] held by hibernate(). That's why it calls
> device_pm_lock() after all.
>
> > 3) hibernate() --> hibernation_snapshot() --> create_image()
> >                                                       |
> >    disable_nonboot_cpus() <-- [4] device_pm_lock() <--|
> >        |
> >        |--> _cpu_down() [3] --> cpufreq_cpu_callback() [1]
> >
> >
> > The two chains which can deadlock are
> >
> > a) [1] --> [2] --> [4] --> [3] --> [1] (The one in this log)
> > and
> > b) [3] --> [2] --> [4] --> [3]
>
> What exactly is the b) scenario?

If I understand correctly it isn't really a deadlock scenario, but it
is a lockdep violation. The violation is:

The pci_device_probe() path 2) proves that dpm_list_mtx [4] can
be acquired while cpu_hotplug.lock [3] is held;

The hibernate() path 3) proves that cpu_hotplug.lock [3] can be
acquired while dpm_list_mtx [4] is held.

The two pathways cannot run simultaneously (and hence cannot deadlock)
because the prepare() stage of hibernation is supposed to stop all
device probing. But lockdep will still report a problem.

Alan Stern

Subject: Re: [linux-pm] pm-hibernate : possible circular locking dependency detected

On Mon, Apr 06, 2009 at 10:37:10AM -0400, Alan Stern wrote:
> On Mon, 6 Apr 2009, Rafael J. Wysocki wrote:
>
> > On Monday 06 April 2009, Gautham R Shenoy wrote:
> > > On Sun, Apr 05, 2009 at 03:44:54PM +0200, Ingo Molnar wrote:
> > > >
> > > > * Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > > On Sunday 05 April 2009, Ming Lei wrote:
> > > > > > kernel version : one simple usb-serial patch against commit
> > > > > > 6bb597507f9839b13498781e481f5458aea33620.
> > > > > >
> > > > > > Thanks.
> > > > >
> > > > > Hmm, CPU hotplug again, it seems.
> > > > >
> > > > > I'm not sure who's the maintainer at the moment. Andrew, is that
> > > > > Gautham?
> > > >
> > > > CPU hotplug tends to land on the scheduler people's desk normally.
> > > >
> > > > But i'm not sure that's the real thing here - key appears to be this
> > > > work_on_cpu() worklet by the cpufreq code:
> > >
> > > Actually, there are two dependency chains here which can lead to a deadlock.
> > > The one we're seeing here is the longer of the two.
> > >
> > > If the relevant locks are numbered as follows:
> > > [1]: cpu_policy_rwsem
> > > [2]: work_on_cpu
> > > [3]: cpu_hotplug.lock
> > > [4]: dpm_list_mtx
> > >
> > >
> > > The individual callpaths are:
> > >
> > > 1) do_dbs_timer()[1] --> dbs_check_cpu() --> __cpufreq_driver_getavg()
> > >                                                           |
> > >       work_on_cpu()[2] <-- get_measured_perf() <----------|
> > >
> > >
> > > 2) pci_device_probe() --> .. --> pci_call_probe() [3] --> work_on_cpu()[2]
> > >                                                                  |
> > >    [4] device_pm_add() <-- .. <-- local_pci_probe() <------------|
> >
> > This should block on [4] held by hibernate(). That's why it calls
> > device_pm_lock() after all.
> >
> > > 3) hibernate() --> hibernation_snapshot() --> create_image()
> > >                                                       |
> > >    disable_nonboot_cpus() <-- [4] device_pm_lock() <--|
> > >        |
> > >        |--> _cpu_down() [3] --> cpufreq_cpu_callback() [1]
> > >
> > >
> > > The two chains which can deadlock are
> > >
> > > a) [1] --> [2] --> [4] --> [3] --> [1] (The one in this log)
> > > and
> > > b) [3] --> [2] --> [4] --> [3]
> >
> > What exactly is the b) scenario?
>
> If I understand correctly it isn't really a deadlock scenario, but it
> is a lockdep violation. The violation is:
>
> The pci_device_probe() path 2) proves that dpm_list_mtx [4] can
> be acquired while cpu_hotplug.lock [3] is held;
>
> The hibernate() path 3) proves that cpu_hotplug.lock [3] can be
> acquired while dpm_list_mtx [4] is held.
>
> The two pathways cannot run simultaneously (and hence cannot deadlock)
> because the prepare() stage of hibernation is supposed to stop all
> device probing. But lockdep will still report a problem.

Thanks for clarifying this, Alan. I guess it boils down to teaching
lockdep about this false positive.

>
> Alan Stern

--
Thanks and Regards
gautham

2009-04-06 15:25:32

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: pm-hibernate : possible circular locking dependency detected

On Monday 06 April 2009, Gautham R Shenoy wrote:
> On Mon, Apr 06, 2009 at 03:29:43PM +0200, Rafael J. Wysocki wrote:
> > On Monday 06 April 2009, Gautham R Shenoy wrote:
> > > On Sun, Apr 05, 2009 at 03:44:54PM +0200, Ingo Molnar wrote:
> > > >
> > > > * Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > > On Sunday 05 April 2009, Ming Lei wrote:
> > > > > > kernel version : one simple usb-serial patch against commit
> > > > > > 6bb597507f9839b13498781e481f5458aea33620.
> > > > > >
> > > > > > Thanks.
> > > > >
> > > > > Hmm, CPU hotplug again, it seems.
> > > > >
> > > > > I'm not sure who's the maintainer at the moment. Andrew, is that
> > > > > Gautham?
> > > >
> > > > CPU hotplug tends to land on the scheduler people's desk normally.
> > > >
> > > > But i'm not sure that's the real thing here - key appears to be this
> > > > work_on_cpu() worklet by the cpufreq code:
> > >
> > > Actually, there are two dependency chains here which can lead to a deadlock.
> > > The one we're seeing here is the longer of the two.
> > >
> > > If the relevant locks are numbered as follows:
> > > [1]: cpu_policy_rwsem
> > > [2]: work_on_cpu
> > > [3]: cpu_hotplug.lock
> > > [4]: dpm_list_mtx
> > >
> > >
> > > The individual callpaths are:
> > >
> > > 1) do_dbs_timer()[1] --> dbs_check_cpu() --> __cpufreq_driver_getavg()
> > >                                                           |
> > >       work_on_cpu()[2] <-- get_measured_perf() <----------|
> > >
> > >
> > > 2) pci_device_probe() --> .. --> pci_call_probe() [3] --> work_on_cpu()[2]
> > >                                                                  |
> > >    [4] device_pm_add() <-- .. <-- local_pci_probe() <------------|
> >
> > This should block on [4] held by hibernate(). That's why it calls
> > device_pm_lock() after all.
>
> Agreed. But it does so holding the cpu_hotplug.lock at pci_call_probe().
> See below.
>
> >
> > > 3) hibernate() --> hibernation_snapshot() --> create_image()
> > >                                                       |
> > >    disable_nonboot_cpus() <-- [4] device_pm_lock() <--|
> > >        |
> > >        |--> _cpu_down() [3] --> cpufreq_cpu_callback() [1]
> > >
> > >
> > > The two chains which can deadlock are
> > >
> > > a) [1] --> [2] --> [4] --> [3] --> [1] (The one in this log)
> > > and
> > > b) [3] --> [2] --> [4] --> [3]
> >
> > What exactly is the b) scenario?
>
> pci_call_probe() calls work_on_cpu() within
> get_online_cpus()/put_online_cpus(), the cpu hotplug read path.
> Thus we have a cpu_hotplug.lock --> work_on_cpu dependency here.
>
> This work_on_cpu() calls local_pci_probe() which, in one of the
> registration paths, calls pcie_portdrv_probe(). This eventually
> ends up calling device_pm_add(), which takes the
> dpm_list_mtx. Thus we have a work_on_cpu --> dpm_list_mtx
> dependency here. This is reflected in the lockdep log for dpm_list_mtx:
>
> > > [ 2276.033054] -> #3 (dpm_list_mtx){+.+.+.}:
> > > [ 2276.033057] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
> > > [ 2276.033061] [<ffffffff80265996>] lock_acquire+0x93/0xbf
> > > [ 2276.033065] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362
> > > [ 2276.033068] [<ffffffff803c4339>] device_pm_add+0x46/0xed
> > > [ 2276.033073] [<ffffffff803bdeee>] device_add+0x488/0x61a
> > > [ 2276.033077] [<ffffffff803be099>] device_register+0x19/0x1d
> > > [ 2276.033080] [<ffffffff8036031a>] pcie_port_device_register+0x450/0x4b6
> > > [ 2276.033085] [<ffffffff80469999>] pcie_portdrv_probe+0x69/0x82
> > > [ 2276.033089] [<ffffffff8035bf77>] local_pci_probe+0x12/0x16
> > > [ 2276.033093] [<ffffffff8024fdf8>] do_work_for_cpu+0x13/0x1b
> > > [ 2276.033097] [<ffffffff80250038>] worker_thread+0x1b2/0x2c9
> > > [ 2276.033100] [<ffffffff80253d40>] kthread+0x49/0x76
> > > [ 2276.033104] [<ffffffff8020c1fa>] child_rip+0xa/0x20
> > > [ 2276.033107] [<ffffffffffffffff>] 0xffffffffffffffff
>
> The dependency chain on this device_registration path would be
> cpu_hotplug.lock --> work_on_cpu --> dpm_list_mtx.
>
> On the hibernate path, we hold the dpm_list_mtx and call
> disable_nonboot_cpus() in create_image().
> disable_nonboot_cpus() calls _cpu_down() which again takes the
> cpu_hotplug.lock, this time the write-path. Thus we have a
> dpm_list_mtx --> cpu_hotplug.lock dependency here.
>
> These two dependency chains, being in reverse order, can cause a
> deadlock, right? Or am I reading something wrong here?

Please read Alan Stern's reply to my message.

In short, the two code paths that cause the lockdep violation
cannot run simultaneously, because the prepare() stage of hibernation blocks
device probing altogether. So, this is a lockdep violation, but not a deadlock
scenario.

Thanks,
Rafael

2009-04-06 18:42:37

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] pm-hibernate : possible circular locking dependency detected

On Mon, 6 Apr 2009, Gautham R Shenoy wrote:

> > If I understand correctly it isn't really a deadlock scenario, but it
> > is a lockdep violation. The violation is:
> >
> > The pci_device_probe() path 2) proves that dpm_list_mtx [4] can
> > be acquired while cpu_hotplug.lock [3] is held;
> >
> > The hibernate() path 3) proves that cpu_hotplug.lock [3] can be
> > acquired while dpm_list_mtx [4] is held.
> >
> > The two pathways cannot run simultaneously (and hence cannot deadlock)
> > because the prepare() stage of hibernation is supposed to stop all
> > device probing. But lockdep will still report a problem.
>
> Thanks for clarifying this, Alan. I guess it boils down to teaching
> lockdep about this false positive.

Or else changing the code somehow to avoid the violation completely.
But I have no idea how... And AFAIK, teaching lockdep about special
cases like this is not so easy to do.

Alan Stern

2009-04-06 19:58:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] pm-hibernate : possible circular locking dependency detected

On Monday 06 April 2009, Alan Stern wrote:
> On Mon, 6 Apr 2009, Gautham R Shenoy wrote:
>
> > > If I understand correctly it isn't really a deadlock scenario, but it
> > > is a lockdep violation. The violation is:
> > >
> > > The pci_device_probe() path 2) proves that dpm_list_mtx [4] can
> > > be acquired while cpu_hotplug.lock [3] is held;
> > >
> > > The hibernate() path 3) proves that cpu_hotplug.lock [3] can be
> > > acquired while dpm_list_mtx [4] is held.
> > >
> > > The two pathways cannot run simultaneously (and hence cannot deadlock)
> > > because the prepare() stage of hibernation is supposed to stop all
> > > device probing. But lockdep will still report a problem.
> >
> > Thanks for clarifying this, Alan. I guess it boils down to teaching
> > lockdep about this false positive.
>
> Or else changing the code somehow to avoid the violation completely.
> But I have no idea how... And AFAIK, teaching lockdep about special
> cases like this is not so easy to do.

Yeah, I was just about to ask about that. Peter, how can we do it?

Rafael

2009-04-07 04:26:49

by Rusty Russell

[permalink] [raw]
Subject: Re: pm-hibernate : possible circular locking dependency detected

On Sunday 05 April 2009 23:14:54 Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[email protected]> wrote:
>
> > On Sunday 05 April 2009, Ming Lei wrote:
> > > kernel version : one simple usb-serial patch against commit
> > > 6bb597507f9839b13498781e481f5458aea33620.
> > >
> > > Thanks.
> >
> > Hmm, CPU hotplug again, it seems.
> >
> > I'm not sure who's the maintainer at the moment. Andrew, is that
> > Gautham?
>
> CPU hotplug tends to land on the scheduler people's desk normally.
>
> But i'm not sure that's the real thing here - key appears to be this
> work_on_cpu() worklet by the cpufreq code:

Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread
patch which I just put out the pull request for.

Cheers,
Rusty.

2009-04-07 07:05:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: pm-hibernate : possible circular locking dependency detected

On Tue, 2009-04-07 at 13:56 +0930, Rusty Russell wrote:
> On Sunday 05 April 2009 23:14:54 Ingo Molnar wrote:
> >
> > * Rafael J. Wysocki <[email protected]> wrote:
> >
> > > On Sunday 05 April 2009, Ming Lei wrote:
> > > > kernel version : one simple usb-serial patch against commit
> > > > 6bb597507f9839b13498781e481f5458aea33620.
> > > >
> > > > Thanks.
> > >
> > > Hmm, CPU hotplug again, it seems.
> > >
> > > I'm not sure who's the maintainer at the moment. Andrew, is that
> > > Gautham?
> >
> > CPU hotplug tends to land on the scheduler people's desk normally.
> >
> > But i'm not sure that's the real thing here - key appears to be this
> > work_on_cpu() worklet by the cpufreq code:
>
> Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread
> patch which I just put out the pull request for.

Would it make sense to teach it about a short-circuit like:

work_on_cpu() {
        if (cpumask_weight(current->cpus_allowed) == 1 &&
            smp_processor_id() == cpu)
                return do_work_right_here();

        queue_the_bugger();
}

?
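
Spelled out against the work_on_cpu() signature of that era, the
proposed short-circuit might look roughly like this. This is a sketch of
the suggestion above, not a tested patch; error handling and the actual
queuing path are elided.

    /* Sketch: if the caller is already pinned to @cpu (for example, it
     * runs on a cpu-affine workqueue, as the ondemand governor does),
     * call fn() directly instead of bouncing through another work item. */
    long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    {
            if (cpumask_weight(&current->cpus_allowed) == 1 &&
                smp_processor_id() == cpu)
                    return fn(arg);

            /* ... otherwise queue the work on @cpu and wait, as before ... */
    }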

2009-04-07 08:43:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [linux-pm] pm-hibernate : possible circular locking dependency detected

On Mon, 2009-04-06 at 21:58 +0200, Rafael J. Wysocki wrote:
> On Monday 06 April 2009, Alan Stern wrote:
> > On Mon, 6 Apr 2009, Gautham R Shenoy wrote:
> >
> > > > If I understand correctly it isn't really a deadlock scenario, but it
> > > > is a lockdep violation. The violation is:
> > > >
> > > > The pci_device_probe() path 2) proves that dpm_list_mtx [4] can
> > > > be acquired while cpu_hotplug.lock [3] is held;
> > > >
> > > > The hibernate() path 3) proves that cpu_hotplug.lock [3] can be
> > > > acquired while dpm_list_mtx [4] is held.
> > > >
> > > > The two pathways cannot run simultaneously (and hence cannot deadlock)
> > > > because the prepare() stage of hibernation is supposed to stop all
> > > > device probing. But lockdep will still report a problem.
> > >
> > > Thanks for clarifying this, Alan. I guess it boils down to teaching
> > > lockdep about this false positive.
> >
> > Or else changing the code somehow to avoid the violation completely.
> > But I have no idea how... And AFAIK, teaching lockdep about special
> > cases like this is not so easy to do.
>
> Yeah, I was just about to ask about that. Peter, how can we do it?

I think it would come down to modeling that blocking of probes as a lock
or something -- because that's basically what it is.

So on the regular probe path, take a read lock of this lock, and on the
suspend path take it as write or something.
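
A minimal sketch of that idea follows. The rwsem, its name, and
really_probe_the_device() are hypothetical, not existing kernel API:
probing takes the lock for reading, the hibernate prepare() stage takes
it for writing, and the exclusion that is currently only implicit
becomes visible to lockdep.

    #include <linux/rwsem.h>
    #include <linux/device.h>

    /* Hypothetical lock modeling "device probing is blocked". */
    static DECLARE_RWSEM(probe_rwsem);

    /* Regular probe path: many probes may run concurrently. */
    int some_bus_probe(struct device *dev)
    {
            int ret;

            down_read(&probe_rwsem);
            ret = really_probe_the_device(dev);  /* hypothetical helper */
            up_read(&probe_rwsem);
            return ret;
    }

    /* Hibernate prepare() stage: exclude all probing, and let lockdep
     * see the exclusion as an ordinary lock dependency. */
    void block_device_probing(void)
    {
            down_write(&probe_rwsem);
    }

    void unblock_device_probing(void)
    {
            up_write(&probe_rwsem);
    }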

2009-04-08 03:17:51

by Rusty Russell

[permalink] [raw]
Subject: Re: pm-hibernate : possible circular locking dependency detected

On Tuesday 07 April 2009 16:35:53 Peter Zijlstra wrote:
> On Tue, 2009-04-07 at 13:56 +0930, Rusty Russell wrote:
> > Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread
> > patch which I just put out the pull request for.
>
> Would it make sense to teach it about a short-circuit like:
>
> work_on_cpu() {
>         if (cpumask_weight(current->cpus_allowed) == 1 &&
>             smp_processor_id() == cpu)
>                 return do_work_right_here();

Does that happen much? I guess put a counter in and see?

Rusty.

2009-04-08 12:27:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: pm-hibernate : possible circular locking dependency detected


* Rusty Russell <[email protected]> wrote:

> On Tuesday 07 April 2009 16:35:53 Peter Zijlstra wrote:
> > On Tue, 2009-04-07 at 13:56 +0930, Rusty Russell wrote:
> > > Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread
> > > patch which I just put out the pull request for.
> >
> > Would it make sense to teach it about a short-circuit like:
> >
> > work_on_cpu() {
> >         if (cpumask_weight(current->cpus_allowed) == 1 &&
> >             smp_processor_id() == cpu)
> >                 return do_work_right_here();
>
> Does that happen much? I guess put a counter in and see?

a temporary tracepoint or trace_printk() for the workqueue tracer
would also tell us this, without any long-term overhead (it would be
easy to remove afterwards).

Ingo

2009-04-08 12:46:58

by Peter Zijlstra

[permalink] [raw]
Subject: Re: pm-hibernate : possible circular locking dependency detected

On Wed, 2009-04-08 at 12:47 +0930, Rusty Russell wrote:
> On Tuesday 07 April 2009 16:35:53 Peter Zijlstra wrote:
> > On Tue, 2009-04-07 at 13:56 +0930, Rusty Russell wrote:
> > > Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread
> > > patch which I just put out the pull request for.
> >
> > Would it make sense to teach it about a short-circuit like:
> >
> > work_on_cpu() {
> >         if (cpumask_weight(current->cpus_allowed) == 1 &&
> >             smp_processor_id() == cpu)
> >                 return do_work_right_here();
>
> Does that happen much? I guess put a counter in and see?

Ego spotted the case where cpufreq calls it from a cpu-affine
workqueue; it seems to me that in that case the short-cut is
desirable, and currently it's needed for correctness too, as it will
otherwise generate this circular lock thingy.

2009-04-08 23:45:32

by Rusty Russell

[permalink] [raw]
Subject: Re: pm-hibernate : possible circular locking dependency detected

On Wednesday 08 April 2009 22:18:26 Peter Zijlstra wrote:
> On Wed, 2009-04-08 at 12:47 +0930, Rusty Russell wrote:
> > On Tuesday 07 April 2009 16:35:53 Peter Zijlstra wrote:
> > > On Tue, 2009-04-07 at 13:56 +0930, Rusty Russell wrote:
> > > > Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread
> > > > patch which I just put out the pull request for.
> > >
> > > Would it make sense to teach it about a short-circuit like:
> > >
> > > work_on_cpu() {
> > >         if (cpumask_weight(current->cpus_allowed) == 1 &&
> > >             smp_processor_id() == cpu)
> > >                 return do_work_right_here();
> >
> > Does that happen much? I guess put a counter in and see?
>
> Ego spotted the case where cpufreq calls it from a cpu-affine
> workqueue; it seems to me that in that case the short-cut is
> desirable, and currently it's needed for correctness too, as it will
> otherwise generate this circular lock thingy.

Well, the correctness issue is fixed by Andrew's work_on_cpu-via-new-thread
patch (hmm, which Linus hasn't taken, re-xmitting).

So it's really a pure optimization.

Cheers,
Rusty.

2009-04-09 04:18:55

by Ingo Molnar

[permalink] [raw]
Subject: Re: pm-hibernate : possible circular locking dependency detected


* Rusty Russell <[email protected]> wrote:

> On Wednesday 08 April 2009 22:18:26 Peter Zijlstra wrote:
> > On Wed, 2009-04-08 at 12:47 +0930, Rusty Russell wrote:
> > > On Tuesday 07 April 2009 16:35:53 Peter Zijlstra wrote:
> > > > On Tue, 2009-04-07 at 13:56 +0930, Rusty Russell wrote:
> > > > > Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread
> > > > > patch which I just put out the pull request for.
> > > >
> > > > Would it make sense to teach it about a short-circuit like:
> > > >
> > > > work_on_cpu() {
> > > >         if (cpumask_weight(current->cpus_allowed) == 1 &&
> > > >             smp_processor_id() == cpu)
> > > >                 return do_work_right_here();
> > >
> > > Does that happen much? I guess put a counter in and see?
> >
> > Ego spotted the case where cpufreq calls it from a cpu-affine
> > workqueue; it seems to me that in that case the short-cut is
> > desirable, and currently it's needed for correctness too, as it will
> > otherwise generate this circular lock thingy.
>
> Well, the correctness issue is fixed by Andrew's
> work_on_cpu-via-new-thread patch (hmm, which Linus hasn't taken,
> re-xmitting).

That's now upstream as per:

6b44003: work_on_cpu(): rewrite it to create a kernel thread on demand

So re-checking whether the warning still triggers with latest -git
would be nice.

Ingo
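
For reference, here is what the commit's title describes, reconstructed
as a sketch (details may differ from the actual change): each
work_on_cpu() call creates a fresh kernel thread bound to the target
CPU, so callers no longer funnel through a shared workqueue and the
work_on_cpu link in the chains above disappears.

    #include <linux/kthread.h>
    #include <linux/completion.h>
    #include <linux/err.h>

    struct work_for_cpu {
            struct completion done;
            long (*fn)(void *);
            void *arg;
            long ret;
    };

    static int do_work_for_cpu(void *data)
    {
            struct work_for_cpu *wfc = data;

            wfc->ret = wfc->fn(wfc->arg);
            complete(&wfc->done);
            return 0;
    }

    long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    {
            struct task_struct *t;
            struct work_for_cpu wfc = { .fn = fn, .arg = arg };

            init_completion(&wfc.done);
            t = kthread_create(do_work_for_cpu, &wfc, "work_for_cpu");
            if (IS_ERR(t))
                    return PTR_ERR(t);
            /* Bind the fresh thread to @cpu, run it, wait for the result. */
            kthread_bind(t, cpu);
            wake_up_process(t);
            wait_for_completion(&wfc.done);
            return wfc.ret;
    }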

2009-04-09 14:45:46

by Ming Lei

[permalink] [raw]
Subject: Re: pm-hibernate : possible circular locking dependency detected

2009/4/9 Ingo Molnar <[email protected]>:
>
> That's now upstream as per:
>
> 6b44003: work_on_cpu(): rewrite it to create a kernel thread on demand
>
> So re-checking whether the warning still triggers with latest -git
> would be nice.

I no longer see this warning during pm-hibernate on 2.6.30-rc1, which
does __not__ include 6b44003.

>
> Ingo
>



--
Lei Ming