2010-11-02 21:50:03

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] One more power management fix for 2.6.37

On Fri, Oct 29, 2010 at 5:58 PM, Rafael J. Wysocki <[email protected]> wrote:
>
> Please pull one more power management fix for 2.6.37 from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git pm-fixes
>
> It fixes a regression in the core I/O runtime PM code.

I think we have more. It may be the driver core, though. So I added
GregKH to the recipients too...

On resume-from-ram with basically current -git (-rc1 + four patches):

...
ata1.01: configured for MWDMA2
ata1: EH complete
PM: resume of devices complete after 3240.438 msecs
------------[ cut here ]------------
WARNING: at lib/kref.c:34 kref_get+0x23/0x2c()
Hardware name: HP Compaq 2510p Notebook PC
Modules linked in: iwlagn [last unloaded: scsi_wait_scan]
Pid: 7985, comm: pm-suspend Not tainted 2.6.37-rc1-00004-geb8abb9 #11
Call Trace:
[<ffffffff81036082>] warn_slowpath_common+0x80/0x98
[<ffffffff810360af>] warn_slowpath_null+0x15/0x17
[<ffffffff8120001f>] kref_get+0x23/0x2c
[<ffffffff811fee1b>] kobject_get+0x1a/0x21
[<ffffffff812d84bb>] get_device+0x14/0x1a
[<ffffffff812dfcd5>] dpm_resume_end+0x230/0x37c
[<ffffffff81060a09>] suspend_devices_and_enter+0x158/0x188
[<ffffffff81060b04>] enter_state+0xcb/0xcf
[<ffffffff810602cf>] state_store+0xa7/0xc6
[<ffffffff811fec2b>] kobj_attr_store+0x17/0x19
[<ffffffff810f75dc>] sysfs_write_file+0xf2/0x12e
[<ffffffff810ab99c>] vfs_write+0xb0/0x12f
[<ffffffff810abbf8>] sys_write+0x45/0x6c
[<ffffffff81001fab>] system_call_fastpath+0x16/0x1b
---[ end trace af18256edd598c9c ]---

Any ideas? I incuded the "ata1:..." lines, but the timestamps are actually

...
[11627.776490] ata1: EH complete
[11629.384719] PM: resume of devices complete after 3240.438 msecs
[11629.400284] ------------[ cut here ]------------
...

so it's a second and a half after that ata1 resume EH complete
message, and a bit after it says that it's completed all device
resumes.

This oops is then followed by a lot of other oopses,most of which
didn't get captured because the box hung afterwards. But the next oops
was in kmem_cache_alloc(), so I think it's because the device
refcounts were bad and had caused slab corruption when being freed too
early or something. So I think the other oopses are all a result of
this kref problem.

Hmm?

Linus


2010-11-03 02:57:18

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [GIT PULL] One more power management fix for 2.6.37

On Tuesday, November 02, 2010, Linus Torvalds wrote:
> On Fri, Oct 29, 2010 at 5:58 PM, Rafael J. Wysocki <[email protected]> wrote:
> >
> > Please pull one more power management fix for 2.6.37 from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git pm-fixes
> >
> > It fixes a regression in the core I/O runtime PM code.
>
> I think we have more. It may be the driver core, though. So I added
> GregKH to the recipients too...
>
> On resume-from-ram with basically current -git (-rc1 + four patches):
>
> ...
> ata1.01: configured for MWDMA2
> ata1: EH complete
> PM: resume of devices complete after 3240.438 msecs
> ------------[ cut here ]------------
> WARNING: at lib/kref.c:34 kref_get+0x23/0x2c()
> Hardware name: HP Compaq 2510p Notebook PC
> Modules linked in: iwlagn [last unloaded: scsi_wait_scan]
> Pid: 7985, comm: pm-suspend Not tainted 2.6.37-rc1-00004-geb8abb9 #11
> Call Trace:
> [<ffffffff81036082>] warn_slowpath_common+0x80/0x98
> [<ffffffff810360af>] warn_slowpath_null+0x15/0x17
> [<ffffffff8120001f>] kref_get+0x23/0x2c
> [<ffffffff811fee1b>] kobject_get+0x1a/0x21
> [<ffffffff812d84bb>] get_device+0x14/0x1a
> [<ffffffff812dfcd5>] dpm_resume_end+0x230/0x37c
> [<ffffffff81060a09>] suspend_devices_and_enter+0x158/0x188
> [<ffffffff81060b04>] enter_state+0xcb/0xcf
> [<ffffffff810602cf>] state_store+0xa7/0xc6
> [<ffffffff811fec2b>] kobj_attr_store+0x17/0x19
> [<ffffffff810f75dc>] sysfs_write_file+0xf2/0x12e
> [<ffffffff810ab99c>] vfs_write+0xb0/0x12f
> [<ffffffff810abbf8>] sys_write+0x45/0x6c
> [<ffffffff81001fab>] system_call_fastpath+0x16/0x1b
> ---[ end trace af18256edd598c9c ]---
>
> Any ideas? I incuded the "ata1:..." lines, but the timestamps are actually

Not at the moment. I don't think this failure is related to the runtime PM code,
though.

> ...
> [11627.776490] ata1: EH complete
> [11629.384719] PM: resume of devices complete after 3240.438 msecs
> [11629.400284] ------------[ cut here ]------------
> ...
>
> so it's a second and a half after that ata1 resume EH complete
> message, and a bit after it says that it's completed all device
> resumes.
>
> This oops is then followed by a lot of other oopses,most of which
> didn't get captured because the box hung afterwards. But the next oops
> was in kmem_cache_alloc(), so I think it's because the device
> refcounts were bad and had caused slab corruption when being freed too
> early or something. So I think the other oopses are all a result of
> this kref problem.
>
> Hmm?

Can you boot with initcall_debug and try to suspend, please? That should tell
us what device this actually happens to.

I don't even think it's necessary to suspend, it should be sufficient to do

# echo devices > /sys/power/pm_test
# echo mem > /sys/power/state

Let's see if that reproduces the problem.

Thanks,
Rafael

2010-11-03 17:41:11

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] One more power management fix for 2.6.37

On Tue, Nov 2, 2010 at 10:56 PM, Rafael J. Wysocki <[email protected]> wrote:
>
> Can you boot with initcall_debug and try to suspend, please? ?That should tell
> us what device this actually happens to.

It's so far only happened that once. I'm running now with
DEBUG_PAGEALLOC, DEBUG_SLUB_ON, and with initcall_debug in case it
triggers again..

> I don't even think it's necessary to suspend, it should be sufficient to do
>
> # echo devices > /sys/power/pm_test
> # echo mem > /sys/power/state
>
> Let's see if that reproduces the problem.

I'll try that a few times too.

Linus

2010-11-03 18:37:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] One more power management fix for 2.6.37

On Wed, Nov 3, 2010 at 1:40 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Nov 2, 2010 at 10:56 PM, Rafael J. Wysocki <[email protected]> wrote:
>>
>> Can you boot with initcall_debug and try to suspend, please? ?That should tell
>> us what device this actually happens to.
>
> It's so far only happened that once. I'm running now with
> DEBUG_PAGEALLOC, DEBUG_SLUB_ON, and with initcall_debug in case it
> triggers again..

Well, _that_ particular problem didn't trigger again, but I've seen
another one instead...

There's apparently an ordering problem with dpm_list_mtx and
socket->skt_mutex. Lockdep details appended.

Dominik, Rafael? What's the proper locking order here, and how do we fix this?

Linus

---
[ 1033.118734] calling pcmcia_socket0+ @ 4009, parent: 0000:02:06.0
[ 1033.118762]
[ 1033.118764] =======================================================
[ 1033.118770] [ INFO: possible circular locking dependency detected ]
[ 1033.118777] 2.6.37-rc1-00005-gd88c092 #13
[ 1033.118781] -------------------------------------------------------
[ 1033.118786] bash/4009 is trying to acquire lock:
[ 1033.118791] (&socket->skt_mutex){+.+.+.}, at: [<ffffffff81391aa9>]
__pcmcia_pm_op+0x29/0x48
[ 1033.118811]
[ 1033.118813] but task is already holding lock:
[ 1033.118818] (dpm_list_mtx){+.+...}, at: [<ffffffff8130f102>]
dpm_suspend_noirq+0x29/0x14f
[ 1033.118833]
[ 1033.118835] which lock already depends on the new lock.
[ 1033.118838]
[ 1033.118842]
[ 1033.118844] the existing dependency chain (in reverse order) is:
[ 1033.118849]
[ 1033.118851] -> #1 (dpm_list_mtx){+.+...}:
[ 1033.118860] [<ffffffff81063cf8>] lock_acquire+0x86/0x9e
[ 1033.118871] [<ffffffff815c16c6>] mutex_lock_nested+0x5e/0x305
[ 1033.118884] [<ffffffff8130fab1>] device_pm_add+0x1a/0xb3
[ 1033.118893] [<ffffffff81308e25>] device_add+0x323/0x4d3
[ 1033.118902] [<ffffffff81308fee>] device_register+0x19/0x1e
[ 1033.118910] [<ffffffff8139392a>] pcmcia_device_add+0x2eb/0x392
[ 1033.118920] [<ffffffff81393a77>] pcmcia_card_add+0xa6/0xc3
[ 1033.118929] [<ffffffff81393ad8>] pcmcia_bus_add+0x44/0x4b
[ 1033.118937] [<ffffffff813921a1>] socket_insert+0xcc/0xe9
[ 1033.118946] [<ffffffff81392926>] pccardd+0x1b1/0x316
[ 1033.118954] [<ffffffff8105185e>] kthread+0x7d/0x85
[ 1033.118963] [<ffffffff81002e94>] kernel_thread_helper+0x4/0x10
[ 1033.118974]
[ 1033.118976] -> #0 (&socket->skt_mutex){+.+.+.}:
[ 1033.118985] [<ffffffff8106316e>] __lock_acquire+0x111e/0x17f2
[ 1033.118994] [<ffffffff81063cf8>] lock_acquire+0x86/0x9e
[ 1033.119002] [<ffffffff815c16c6>] mutex_lock_nested+0x5e/0x305
[ 1033.119012] [<ffffffff81391aa9>] __pcmcia_pm_op+0x29/0x48
[ 1033.119020] [<ffffffff81391aea>]
pcmcia_socket_dev_suspend_noirq+0x10/0x12
[ 1033.119030] [<ffffffff8130eaf6>] pm_noirq_op+0x88/0xfd
[ 1033.119038] [<ffffffff8130f12e>] dpm_suspend_noirq+0x55/0x14f
[ 1033.119046] [<ffffffff8106f4ca>] suspend_devices_and_enter+0x99/0x188
[ 1033.119057] [<ffffffff8106f684>] enter_state+0xcb/0xcf
[ 1033.119064] [<ffffffff8106ed6b>] state_store+0xa7/0xc6
[ 1033.119074] [<ffffffff81229edf>] kobj_attr_store+0x17/0x19
[ 1033.119085] [<ffffffff81112a8b>] sysfs_write_file+0x103/0x13f
[ 1033.119098] [<ffffffff810c00fd>] vfs_write+0xb0/0x12f
[ 1033.119107] [<ffffffff810c0359>] sys_write+0x45/0x6c
[ 1033.119116] [<ffffffff8100206b>] system_call_fastpath+0x16/0x1b
[ 1033.119127]
[ 1033.119129] other info that might help us debug this:
[ 1033.119132]
[ 1033.119137] 4 locks held by bash/4009:
[ 1033.119141] #0: (&buffer->mutex){+.+.+.}, at:
[<ffffffff811129bf>] sysfs_write_file+0x37/0x13f
[ 1033.119156] #1: (s_active#107){.+.+.+}, at: [<ffffffff81112a6a>]
sysfs_write_file+0xe2/0x13f
[ 1033.119173] #2: (pm_mutex){+.+.+.}, at: [<ffffffff8106f5e6>]
enter_state+0x2d/0xcf
[ 1033.119186] #3: (dpm_list_mtx){+.+...}, at: [<ffffffff8130f102>]
dpm_suspend_noirq+0x29/0x14f
[ 1033.119200]
[ 1033.119202] stack backtrace:
[ 1033.119210] Pid: 4009, comm: bash Not tainted 2.6.37-rc1-00005-gd88c092 #13
[ 1033.119215] Call Trace:
[ 1033.119226] [<ffffffff815c31ba>] ? _raw_spin_unlock_irqrestore+0x46/0x64
[ 1033.119235] [<ffffffff8105fd7c>] print_circular_bug+0xae/0xbc
[ 1033.119245] [<ffffffff8106316e>] __lock_acquire+0x111e/0x17f2
[ 1033.119256] [<ffffffff81391aa9>] ? __pcmcia_pm_op+0x29/0x48
[ 1033.119265] [<ffffffff81063cf8>] lock_acquire+0x86/0x9e
[ 1033.119273] [<ffffffff81391aa9>] ? __pcmcia_pm_op+0x29/0x48
[ 1033.119285] [<ffffffff810332cc>] ? get_parent_ip+0x11/0x42
[ 1033.119295] [<ffffffff815c16c6>] mutex_lock_nested+0x5e/0x305
[ 1033.119304] [<ffffffff81391aa9>] ? __pcmcia_pm_op+0x29/0x48
[ 1033.119312] [<ffffffff81061165>] ? mark_held_locks+0x50/0x72
[ 1033.119322] [<ffffffff815c1916>] ? mutex_lock_nested+0x2ae/0x305
[ 1033.119331] [<ffffffff810332cc>] ? get_parent_ip+0x11/0x42
[ 1033.119339] [<ffffffff81391afe>] ? socket_suspend+0x0/0x7c
[ 1033.119348] [<ffffffff81391aa9>] __pcmcia_pm_op+0x29/0x48
[ 1033.119358] [<ffffffff81391aea>] pcmcia_socket_dev_suspend_noirq+0x10/0x12
[ 1033.119366] [<ffffffff8130eaf6>] pm_noirq_op+0x88/0xfd
[ 1033.119375] [<ffffffff8130f12e>] dpm_suspend_noirq+0x55/0x14f
[ 1033.119384] [<ffffffff8106f4ca>] suspend_devices_and_enter+0x99/0x188
[ 1033.119392] [<ffffffff8106f684>] enter_state+0xcb/0xcf
[ 1033.119402] [<ffffffff8106ed6b>] state_store+0xa7/0xc6
[ 1033.119411] [<ffffffff81229edf>] kobj_attr_store+0x17/0x19
[ 1033.119420] [<ffffffff81112a8b>] sysfs_write_file+0x103/0x13f
[ 1033.119429] [<ffffffff810c00fd>] vfs_write+0xb0/0x12f
[ 1033.119438] [<ffffffff810c0359>] sys_write+0x45/0x6c
[ 1033.119448] [<ffffffff8100206b>] system_call_fastpath+0x16/0x1b
[ 1033.119494] initcall pcmcia_socket0_i+ returned 0 after 734 usecs

2010-11-03 21:19:01

by Dominik Brodowski

[permalink] [raw]
Subject: Re: [linux-pm] [GIT PULL] One more power management fix for 2.6.37

> There's apparently an ordering problem with dpm_list_mtx and
> socket->skt_mutex. Lockdep details appended.
>
> Dominik, Rafael? What's the proper locking order here, and
> how do we fix this?

Thanks for noting this; let's see:

- We add a PCMCIA device holding skt_mutex, therefore we have the ordering
(1) skt_mutex -> (2) dpm_list_mtx

- If we're suspending, dpm_list_mtx is held, but we need to acquire
skt_mutex as we modify some data being protected by skt_mutex
(1) dpm_list_mtx -> (2) skt_mutex

Rafael, any idea on how to solve this? How do other subsystems handle such
an issue? Do they call device_add() with no locks held at all?

Best,
Dominik

2010-11-04 05:05:29

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [GIT PULL] One more power management fix for 2.6.37

On Wednesday, November 03, 2010, Dominik Brodowski wrote:
> > There's apparently an ordering problem with dpm_list_mtx and
> > socket->skt_mutex. Lockdep details appended.
> >
> > Dominik, Rafael? What's the proper locking order here, and
> > how do we fix this?
>
> Thanks for noting this; let's see:
>
> - We add a PCMCIA device holding skt_mutex, therefore we have the ordering
> (1) skt_mutex -> (2) dpm_list_mtx
>
> - If we're suspending, dpm_list_mtx is held, but we need to acquire
> skt_mutex as we modify some data being protected by skt_mutex
> (1) dpm_list_mtx -> (2) skt_mutex
>
> Rafael, any idea on how to solve this? How do other subsystems handle such
> an issue? Do they call device_add() with no locks held at all?

They usually do from what I can tell.

Also only a few of them implement the ->suspend_noirq() callback, which is the
one executed under dpm_list_mtx.

What exactly is protected by skt_mutex ?

Rafael

2010-11-04 06:27:15

by Dominik Brodowski

[permalink] [raw]
Subject: Re: [linux-pm] [GIT PULL] One more power management fix for 2.6.37

On Thu, Nov 04, 2010 at 06:04:05AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 03, 2010, Dominik Brodowski wrote:
> > > There's apparently an ordering problem with dpm_list_mtx and
> > > socket->skt_mutex. Lockdep details appended.
> > >
> > > Dominik, Rafael? What's the proper locking order here, and
> > > how do we fix this?
> >
> > Thanks for noting this; let's see:
> >
> > - We add a PCMCIA device holding skt_mutex, therefore we have the ordering
> > (1) skt_mutex -> (2) dpm_list_mtx
> >
> > - If we're suspending, dpm_list_mtx is held, but we need to acquire
> > skt_mutex as we modify some data being protected by skt_mutex
> > (1) dpm_list_mtx -> (2) skt_mutex
> >
> > Rafael, any idea on how to solve this? How do other subsystems handle such
> > an issue? Do they call device_add() with no locks held at all?
>
> They usually do from what I can tell.
>
> Also only a few of them implement the ->suspend_noirq() callback, which is the
> one executed under dpm_list_mtx.
>
> What exactly is protected by skt_mutex ?

e.g.
struct pcmcia_socket {
...
u_int suspended_state;
int resume_status;
...
}

Furthermore, one has to acquire skt_mutex first before obtaining ops_mutex,
which protects many more fields (and asserts exclusion for some code paths),
see Documentation/pcmcia/locking.txt for details.

Best,
Dominik

2010-11-04 13:26:16

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [GIT PULL] One more power management fix for 2.6.37

On Thursday, November 04, 2010, Dominik Brodowski wrote:
> On Thu, Nov 04, 2010 at 06:04:05AM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 03, 2010, Dominik Brodowski wrote:
> > > > There's apparently an ordering problem with dpm_list_mtx and
> > > > socket->skt_mutex. Lockdep details appended.
> > > >
> > > > Dominik, Rafael? What's the proper locking order here, and
> > > > how do we fix this?
> > >
> > > Thanks for noting this; let's see:
> > >
> > > - We add a PCMCIA device holding skt_mutex, therefore we have the ordering
> > > (1) skt_mutex -> (2) dpm_list_mtx
> > >
> > > - If we're suspending, dpm_list_mtx is held, but we need to acquire
> > > skt_mutex as we modify some data being protected by skt_mutex
> > > (1) dpm_list_mtx -> (2) skt_mutex
> > >
> > > Rafael, any idea on how to solve this? How do other subsystems handle such
> > > an issue? Do they call device_add() with no locks held at all?
> >
> > They usually do from what I can tell.
> >
> > Also only a few of them implement the ->suspend_noirq() callback, which is the
> > one executed under dpm_list_mtx.
> >
> > What exactly is protected by skt_mutex ?
>
> e.g.
> struct pcmcia_socket {
> ...
> u_int suspended_state;
> int resume_status;
> ...
> }
>
> Furthermore, one has to acquire skt_mutex first before obtaining ops_mutex,
> which protects many more fields (and asserts exclusion for some code paths),
> see Documentation/pcmcia/locking.txt for details.

OK, so I think we can relax the locking in dpm_[suspend/resume]_noirq() to
avoid executing callbacks under dpm_list_mtx, like in the (untested) patch
below.

Alan, do you see any immediate problem with that?

Rafael

---
drivers/base/power/main.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -480,15 +480,23 @@ void dpm_resume_noirq(pm_message_t state

mutex_lock(&dpm_list_mtx);
transition_started = false;
- list_for_each_entry(dev, &dpm_list, power.entry)
+ list_for_each_entry(dev, &dpm_list, power.entry) {
+ get_device(dev);
if (dev->power.status > DPM_OFF) {
int error;

dev->power.status = DPM_OFF;
+
+ mutex_unlock(&dpm_list_mtx);
+
error = device_resume_noirq(dev, state);
+
+ mutex_lock(&dpm_list_mtx);
if (error)
pm_dev_err(dev, state, " early", error);
}
+ put_device(dev);
+ }
mutex_unlock(&dpm_list_mtx);
dpm_show_time(starttime, state, "early");
resume_device_irqs();
@@ -796,12 +804,19 @@ int dpm_suspend_noirq(pm_message_t state
suspend_device_irqs();
mutex_lock(&dpm_list_mtx);
list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
+ get_device(dev);
+ mutex_unlock(&dpm_list_mtx);
+
error = device_suspend_noirq(dev, state);
+
+ mutex_lock(&dpm_list_mtx);
if (error) {
pm_dev_err(dev, state, " late", error);
+ put_device(dev);
break;
}
dev->power.status = DPM_OFF_IRQ;
+ put_device(dev);
}
mutex_unlock(&dpm_list_mtx);
if (error)

2010-11-04 14:51:00

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [GIT PULL] One more power management fix for 2.6.37

On Thu, 4 Nov 2010, Rafael J. Wysocki wrote:

> OK, so I think we can relax the locking in dpm_[suspend/resume]_noirq() to
> avoid executing callbacks under dpm_list_mtx, like in the (untested) patch
> below.
>
> Alan, do you see any immediate problem with that?
>
> Rafael
>
> ---
> drivers/base/power/main.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/drivers/base/power/main.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/power/main.c
> +++ linux-2.6/drivers/base/power/main.c
> @@ -480,15 +480,23 @@ void dpm_resume_noirq(pm_message_t state
>
> mutex_lock(&dpm_list_mtx);
> transition_started = false;
> - list_for_each_entry(dev, &dpm_list, power.entry)
> + list_for_each_entry(dev, &dpm_list, power.entry) {
> + get_device(dev);
> if (dev->power.status > DPM_OFF) {
> int error;
>
> dev->power.status = DPM_OFF;
> +
> + mutex_unlock(&dpm_list_mtx);
> +
> error = device_resume_noirq(dev, state);
> +
> + mutex_lock(&dpm_list_mtx);
> if (error)
> pm_dev_err(dev, state, " early", error);
> }
> + put_device(dev);
> + }
> mutex_unlock(&dpm_list_mtx);
> dpm_show_time(starttime, state, "early");
> resume_device_irqs();
> @@ -796,12 +804,19 @@ int dpm_suspend_noirq(pm_message_t state
> suspend_device_irqs();
> mutex_lock(&dpm_list_mtx);
> list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
> + get_device(dev);
> + mutex_unlock(&dpm_list_mtx);
> +
> error = device_suspend_noirq(dev, state);
> +
> + mutex_lock(&dpm_list_mtx);
> if (error) {
> pm_dev_err(dev, state, " late", error);
> + put_device(dev);
> break;
> }
> dev->power.status = DPM_OFF_IRQ;
> + put_device(dev);
> }
> mutex_unlock(&dpm_list_mtx);
> if (error)

This won't work if the callback tries to unregister the device, but of
course the old code wouldn't work in that case either. We should
document this requirement. It means that your get_device and
put_device calls aren't needed.

Aside from that I don't see any problems. In principle there shouldn't
be any other processes running at this time that would want to access
either the device or the dpm_list. Maybe this means the socket locking
isn't needed in the pcmcia late-suspend and early-resume routines,
which would be a simpler solution.

Alan Stern

2010-11-04 17:08:00

by Linus Torvalds

[permalink] [raw]
Subject: Re: [linux-pm] [GIT PULL] One more power management fix for 2.6.37

On Thu, Nov 4, 2010 at 9:24 AM, Rafael J. Wysocki <[email protected]> wrote:
>
> OK, so I think we can relax the locking in dpm_[suspend/resume]_noirq() to
> avoid executing callbacks under dpm_list_mtx, like in the (untested) patch
> below.

ABSOLUTELY NOT.

If you drop the lock in the middle of the loop, you should remove the
lock around the loop entirely. There is absolutely no difference
between "drop lock in the middle" and "don't take lock at all".

Either that list traversal needs the lock or it does not. There is no
"it needs the lock, but not while doing random crap X in the middle of
traversal".

If nothing can possibly change the list while calling the device, then
you don't need the lock. And if something _can_ change the list,
dropping the lock means that the list is no longer trustworthy and you
can't just continue in the middle.

Don't write code like this with nonsensical locking. Not even if it is
a case of "it happens to work because we have those other totally
random rules".

Linus

2010-11-04 17:20:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] One more power management fix for 2.6.37

On Tue, Nov 2, 2010 at 10:56 PM, Rafael J. Wysocki <[email protected]> wrote:
>
> Can you boot with initcall_debug and try to suspend, please? ?That should tell
> us what device this actually happens to.

Well, I was running with a lot of debugging for a while, and nothing
bad happened over several suspends. So I gave up on it. And then today
I got this one.. Which was followed immedately by a bluez error, which
may or may not be a coincidence.

That said, I guess I haven't historically had bluetooth enabled on
this laptop, and so it's certainly possible that it's some old problem
that I just hadn't seen before. I don't use the laptop very much, and
before the KS summit trip I re-installed the whole system, so I have
no history with this configuration.

Does this trigger any ideas?

The errors do seem to have a pattern: something got zeroed. The NULL
pointer dereference is the "sock->ops->poll()" call, but "sock->ops"
is NULL. And then there's that "VFS: Close: file count is 0" thing -
obviously something got zeroed too early.

The kref_get() warning is also about a variable being surprisingly zero.

So it smells like suspend/resume ends up zeroing some block of memory.
I just don't see why it always would seem to trigger in kref_get(). If
it was some random memory zeroing, I'd expect the result to be more
random, and not hit just that one specific WARN_ON().

Linus

---
[ 8652.088706] PM: resume of devices complete after 3268.593 msecs
[ 8652.104357] ------------[ cut here ]------------
[ 8652.104368] WARNING: at lib/kref.c:34 kref_get+0x23/0x2c()
[ 8652.104371] Hardware name: HP Compaq 2510p Notebook PC
[ 8652.104374] Modules linked in: iwlagn [last unloaded: scsi_wait_scan]
[ 8652.104382] Pid: 18012, comm: pm-suspend Not tainted
2.6.37-rc1-00027-gff8b16d #14
[ 8652.104385] Call Trace:
[ 8652.104395] [<ffffffff81036082>] warn_slowpath_common+0x80/0x98
[ 8652.104401] [<ffffffff810360af>] warn_slowpath_null+0x15/0x17
[ 8652.104407] [<ffffffff8120002b>] kref_get+0x23/0x2c
[ 8652.104412] [<ffffffff811fee27>] kobject_get+0x1a/0x21
[ 8652.104418] [<ffffffff812d84cb>] get_device+0x14/0x1a
[ 8652.104425] [<ffffffff812dfce5>] dpm_resume_end+0x230/0x37c
[ 8652.104432] [<ffffffff81060a09>] suspend_devices_and_enter+0x158/0x188
[ 8652.104437] [<ffffffff81060b04>] enter_state+0xcb/0xcf
[ 8652.104442] [<ffffffff810602cf>] state_store+0xa7/0xc6
[ 8652.104447] [<ffffffff811fec37>] kobj_attr_store+0x17/0x19
[ 8652.104453] [<ffffffff810f75e8>] sysfs_write_file+0xf2/0x12e
[ 8652.104460] [<ffffffff810ab9a8>] vfs_write+0xb0/0x12f
[ 8652.104465] [<ffffffff810abc04>] sys_write+0x45/0x6c
[ 8652.104472] [<ffffffff81001fab>] system_call_fastpath+0x16/0x1b
[ 8652.104476] ---[ end trace dca322e94d9e9dd5 ]---
bluetoothd[2862]: HCI dev 0 down
bluetoothd[2862]: Adapter /org/bluez/2857/hci0 has been disabled
bluetoothd[2862]: Stopping security manager 0
bluetoothd[2862]: HCI dev 0 unregistered
bluetoothd[2862]: Unregister path: /org/bluez/2857/hci0
bluetoothd[2862]: HCI dev 0 registered
[ 8652.104877] Restarting tasks ... done.
[ 8652.119373] video LNXVIDEO:00: Restoring backlight state
dbus-daemon: [system] Rejected send message, 2 matched rules;
type="error", sender=":1.46" (uid=500 pid=3534 comm="blueto$
bluetoothd[2862]: HCI dev 0 up
bluetoothd[2862]: Starting security manager 0
bluetoothd[2862]: Parsing /etc/bluetooth/serial.conf failed: No such
file or directory
bluetoothd[2862]: Adapter /org/bluez/2857/hci0 has been enabled
[ 8652.191020] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000040
[ 8652.191125] IP: [<ffffffff81457fcb>] sock_poll+0x12/0x17
[ 8652.191200] PGD 0
[ 8652.191226] Oops: 0000 [#1] SMP
[ 8652.191266] last sysfs file: /sys/devices/virtual/dmi/id/chassis_type
[ 8652.191325] CPU 1
[ 8652.191348] Modules linked in: iwlagn [last unloaded: scsi_wait_scan]
[ 8652.191420]
[ 8652.191441] Pid: 2862, comm: bluetoothd Tainted: G W
2.6.37-rc1-00027-gff8b16d #14 30C9/HP Compaq 2510p Noteb$
[ 8652.191550] RIP: 0010:[<ffffffff81457fcb>] [<ffffffff81457fcb>]
sock_poll+0x12/0x17
[ 8652.191641] RSP: 0018:ffff8800787c1b38 EFLAGS: 00010246
[ 8652.191713] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
ffffffff81672390
[ 8652.191780] RDX: 0000000000000000 RSI: ffff88003dbfc500 RDI:
ffff8800379fc9c0
[ 8652.191849] RBP: ffff8800787c1b38 R08: 0000000000000000 R09:
0000000000000000
[ 8652.191922] R10: 0000000000000001 R11: 0000000000000246 R12:
00007ff0d9208910
[ 8652.191980] R13: ffff8800787c1df8 R14: ffff8800787c1e54 R15:
0000000000000001
[ 8652.192045] FS: 00007ff0d7a4a720(0000) GS:ffff88007e500000(0000)
knlGS:0000000000000000
[ 8652.192138] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8652.192198] CR2: 0000000000000040 CR3: 000000007bb8f000 CR4:
00000000000006e0
[ 8652.192272] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 8652.192342] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 8652.192408] Process bluetoothd (pid: 2862, threadinfo
ffff8800787c0000, task ffff88007b89e9b0)
[ 8652.192488] Stack:
[ 8652.192512] ffff8800787c1f38 ffffffff810ba4df ffff8800379fc9c0
0000000000000000
[ 8652.192610] ffff88007b89e9b0 ffff8800787c1e84 0000000000000000
0000000000000000
[ 8652.192696] 0000000000000000 0000000000000000 ffffffff810b95f3
0000000000000019
[ 8652.192779] Call Trace:
[ 8652.192806] [<ffffffff810ba4df>] do_sys_poll+0x23f/0x3d0
[ 8652.192853] [<ffffffff810b95f3>] ? __pollwait+0x0/0xc7
[ 8652.192898] [<ffffffff810b96ba>] ? pollwake+0x0/0x4f
[ 8652.192940] [<ffffffff810b96ba>] ? pollwake+0x0/0x4f
[ 8652.192982] [<ffffffff810b96ba>] ? pollwake+0x0/0x4f
[ 8652.193024] [<ffffffff810b96ba>] ? pollwake+0x0/0x4f
[ 8652.193066] [<ffffffff810b96ba>] ? pollwake+0x0/0x4f
[ 8652.193108] [<ffffffff810b96ba>] ? pollwake+0x0/0x4f
[ 8652.193153] [<ffffffff81462968>] ? verify_iovec+0x4c/0x9c
[ 8652.193206] [<ffffffff8145afc0>] ? sys_sendmsg+0x1e5/0x249
[ 8652.193252] [<ffffffff810bae2b>] ? d_kill+0x55/0x5d
[ 8652.193300] [<ffffffff810bb33a>] ? dput+0x24/0x126
[ 8652.193342] [<ffffffff810acc9d>] ? fput+0x1b1/0x1c0
[ 8652.193389] [<ffffffff810ba70d>] sys_poll+0x50/0xba
[ 8652.193390] [<ffffffff81001fab>] system_call_fastpath+0x16/0x1b
[ 8652.193390] Code: 48 8b 87 10 01 00 00 a9 00 00 01 00 74 07 88 d0
83 c8 02 88 06 31 c0 c9 c3 55 48 89 f2 48 89 e5 48 8$
[ 8652.193390] RIP [<ffffffff81457fcb>] sock_poll+0x12/0x17
[ 8652.193390] RSP <ffff8800787c1b38>
[ 8652.198050] CR2: 0000000000000040
[ 8652.341807] ---[ end trace dca322e94d9e9dd6 ]---
pulseaudio[3284]: bluetooth-util.c: Error from ListDevices reply:
org.freedesktop.DBus.Error.NoReply
[ 8652.344611] VFS: Close: file count is 0

2010-11-04 19:57:57

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [GIT PULL] One more power management fix for 2.6.37

On Thursday, November 04, 2010, Linus Torvalds wrote:
> On Thu, Nov 4, 2010 at 9:24 AM, Rafael J. Wysocki <[email protected]> wrote:
> >
> > OK, so I think we can relax the locking in dpm_[suspend/resume]_noirq() to
> > avoid executing callbacks under dpm_list_mtx, like in the (untested) patch
> > below.
>
> ABSOLUTELY NOT.
>
> If you drop the lock in the middle of the loop, you should remove the
> lock around the loop entirely. There is absolutely no difference
> between "drop lock in the middle" and "don't take lock at all".
>
> Either that list traversal needs the lock or it does not. There is no
> "it needs the lock, but not while doing random crap X in the middle of
> traversal".

Your're right, it only makes sense to either leave it or remove it entirely.

> If nothing can possibly change the list while calling the device, then
> you don't need the lock. And if something _can_ change the list,
> dropping the lock means that the list is no longer trustworthy and you
> can't just continue in the middle.

At this point, if everyone does everything right, there should be nothing
running in parallel with us that will attempt to modify the list. So, I'd say
let's drop the lock completely.

Thanks,
Rafael

2010-11-09 23:40:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [GIT PULL] One more power management fix for 2.6.37

On Thursday, November 04, 2010, Alan Stern wrote:
> On Thu, 4 Nov 2010, Rafael J. Wysocki wrote:
...
> This won't work if the callback tries to unregister the device, but of
> course the old code wouldn't work in that case either. We should
> document this requirement.

As per our discussion in Boston I think we may actually allow
devices to be unregistered during the late and early phases and
avoid the locking problem with PCMCIA at the same time.

So, what about the patch below?

Rafael

---
drivers/base/power/main.c | 36 +++++++++++++++++++++++++++++++-----
1 file changed, 31 insertions(+), 5 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -475,20 +475,33 @@ End:
*/
void dpm_resume_noirq(pm_message_t state)
{
- struct device *dev;
+ struct list_head list;
ktime_t starttime = ktime_get();

+ INIT_LIST_HEAD(&list);
mutex_lock(&dpm_list_mtx);
- transition_started = false;
- list_for_each_entry(dev, &dpm_list, power.entry)
+ while (!list_empty(&dpm_list)) {
+ struct device *dev = to_device(dpm_list.next);
+
+ get_device(dev);
if (dev->power.status > DPM_OFF) {
int error;

dev->power.status = DPM_OFF;
+ mutex_unlock(&dpm_list_mtx);
+
error = device_resume_noirq(dev, state);
+
+ mutex_lock(&dpm_list_mtx);
if (error)
pm_dev_err(dev, state, " early", error);
}
+ if (!list_empty(&dev->power.entry))
+ list_move_tail(&dev->power.entry, &list);
+ put_device(dev);
+ }
+ list_splice(&list, &dpm_list);
+ transition_started = false;
mutex_unlock(&dpm_list_mtx);
dpm_show_time(starttime, state, "early");
resume_device_irqs();
@@ -789,20 +802,33 @@ End:
*/
int dpm_suspend_noirq(pm_message_t state)
{
- struct device *dev;
+ struct list_head list;
ktime_t starttime = ktime_get();
int error = 0;

+ INIT_LIST_HEAD(&list);
suspend_device_irqs();
mutex_lock(&dpm_list_mtx);
- list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
+ while (!list_empty(&dpm_list)) {
+ struct device *dev = to_device(dpm_list.prev);
+
+ get_device(dev);
+ mutex_unlock(&dpm_list_mtx);
+
error = device_suspend_noirq(dev, state);
+
+ mutex_lock(&dpm_list_mtx);
if (error) {
pm_dev_err(dev, state, " late", error);
+ put_device(dev);
break;
}
dev->power.status = DPM_OFF_IRQ;
+ if (!list_empty(&dev->power.entry))
+ list_move(&dev->power.entry, &list);
+ put_device(dev);
}
+ list_splice_tail(&list, &dpm_list);
mutex_unlock(&dpm_list_mtx);
if (error)
dpm_resume_noirq(resume_event(state));

2010-11-10 00:50:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [linux-pm] [GIT PULL] One more power management fix for 2.6.37

On Tue, Nov 9, 2010 at 3:39 PM, Rafael J. Wysocki <[email protected]> wrote:
>
> So, what about the patch below?

I think it looks saner, but I also think that it would be saner yet to
have a separate list entirely, and *not* do this crazy "move things
back and forth between 'dpm_list'" thing.

So I would seriously suggest that the design should aim for each
suspend event to move things between two lists, and as devices go
through the suspend phases, they'd move to/from lists that also
indicate which suspend level they are at.

So why not introduce a new list called "dpm_list_suspended", and in
"dpm_suspend_noirq()" you move devices one at a time from the
"dpm_list" to the "dmp_list_suspended".

And then in "dpm_resume_noirq()" you move them back.

Wouldn't that be nice?

(Optimally, you'd have separate lists for "active", "suspended", and
"irq-suspended")

But regardless, your patch does seem to at least match what we
currently do in the regular suspend/resume code (ie the
non-irq's-disabled case). So I don't mind it. I just think that it
would be cleaner to not take things off one list only to put them back
on the same list again.

In particular, _if_ device add events can happen concurrently with
this, I don't understand how that would maintain the depth-first order
of the list. In contrast, if you do it with separate lists, then you
know that if a device is on the "suspended" list, then it by
definition has to be "after" all devices that are on the non-suspended
list, since you cannot have a non-suspended device that depends on a
suspended one.

So having separate lists with state should also be very sensible from
a device topology standpoint - while doing that "list_splice()" back
on the same list is _not_ at all obviously correct from a topological
standpoint (I'm not sure I'm explaining this well).

Linus

2010-11-10 03:22:43

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [GIT PULL] One more power management fix for 2.6.37

On Tue, 9 Nov 2010, Linus Torvalds wrote:

> On Tue, Nov 9, 2010 at 3:39 PM, Rafael J. Wysocki <[email protected]> wrote:
> >
> > So, what about the patch below?

The "transition_started" thing is a little odd. I get the feeling that
it shouldn't be set back to false during dpm_resume_noirq() at all.
Maybe I'm not quite thinking straight just now...

> I think it looks saner, but I also think that it would be saner yet to
> have a separate list entirely, and *not* do this crazy "move things
> back and forth between 'dpm_list'" thing.
>
> So I would seriously suggest that the design should aim for each
> suspend event to move things between two lists, and as devices go
> through the suspend phases, they'd move to/from lists that also
> indicate which suspend level they are at.
>
> So why not introduce a new list called "dpm_list_suspended", and in
> "dpm_suspend_noirq()" you move devices one at a time from the
> "dpm_list" to the "dmp_list_suspended".
>
> And then in "dpm_resume_noirq()" you move them back.
>
> Wouldn't that be nice?
>
> (Optimally, you'd have separate lists for "active", "suspended", and
> "irq-suspended")
>
> But regardless, your patch does seem to at least match what we
> currently do in the regular suspend/resume code (ie the
> non-irq's-disabled case). So I don't mind it. I just think that it
> would be cleaner to not take things off one list only to put them back
> on the same list again.
>
> In particular, _if_ device add events can happen concurrently with
> this,

They can. We explicitly allow new devices to be registered during the
final "complete" phase, and we grudgingly allow it even during the
"resume" phase, if the parent has already been resumed.

The "prepare" traversal is ordered in the forward direction so that if
a child is registered beneath a device during that device's ->prepare
callback, it will end up in the right place (i.e., after the parent in
the list). The "complete" traversal should work out the same way, only
in reverse. Which means we should _start_ with everything on the other
list, and move each device onto the dpm_list just _before_ invoking its
->complete callback.

The way the code is now, it looks like a child registered during the
parent's callback will end up in the wrong place.

> I don't understand how that would maintain the depth-first order
> of the list. In contrast, if you do it with separate lists, then you
> know that if a device is on the "suspended" list, then it by
> definition has to be "after" all devices that are on the non-suspended
> list, since you cannot have a non-suspended device that depends on a
> suspended one.

The situation isn't quite as bad as it seems, if you assume that a
child will never be registered at a time when its parent is still
suspended. Right now we warn if such a thing happens but we don't
prevent it.

> So having separate lists with state should also be very sensible from
> a device topology standpoint - while doing that "list_splice()" back
> on the same list is _not_ at all obviously correct from a topological
> standpoint (I'm not sure I'm explaining this well).

I don't see anything wrong with changing over to multiple list heads.
It might even allow us to remove the dev->power.status field.

Alan Stern

2010-11-10 20:52:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [GIT PULL] One more power management fix for 2.6.37

On Wednesday, November 10, 2010, Alan Stern wrote:
> On Tue, 9 Nov 2010, Linus Torvalds wrote:
>
> > On Tue, Nov 9, 2010 at 3:39 PM, Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > So, what about the patch below?
>
> The "transition_started" thing is a little odd. I get the feeling that
> it shouldn't be set back to false during dpm_resume_noirq() at all.
> Maybe I'm not quite thinking straight just now...

No, I think you're right. I rethought that particular thing in the meantime
and came to the conclusion that it'd be better to keep it where it was.

> > I think it looks saner, but I also think that it would be saner yet to
> > have a separate list entirely, and *not* do this crazy "move things
> > back and forth between 'dpm_list'" thing.
> >
> > So I would seriously suggest that the design should aim for each
> > suspend event to move things between two lists, and as devices go
> > through the suspend phases, they'd move to/from lists that also
> > indicate which suspend level they are at.
> >
> > So why not introduce a new list called "dpm_list_suspended", and in
> > "dpm_suspend_noirq()" you move devices one at a time from the
> > "dpm_list" to the "dmp_list_suspended".
> >
> > And then in "dpm_resume_noirq()" you move them back.
> >
> > Wouldn't that be nice?
> >
> > (Optimally, you'd have separate lists for "active", "suspended", and
> > "irq-suspended")
> >
> > But regardless, your patch does seem to at least match what we
> > currently do in the regular suspend/resume code (ie the
> > non-irq's-disabled case). So I don't mind it. I just think that it
> > would be cleaner to not take things off one list only to put them back
> > on the same list again.
> >
> > In particular, _if_ device add events can happen concurrently with
> > this,
>
> They can. We explicitly allow new devices to be registered during the
> final "complete" phase, and we grudgingly allow it even during the
> "resume" phase, if the parent has already been resumed.
>
> The "prepare" traversal is ordered in the forward direction so that if
> a child is registered beneath a device during that device's ->prepare
> callback, it will end up in the right place (i.e., after the parent in
> the list). The "complete" traversal should work out the same way, only
> in reverse. Which means we should _start_ with everything on the other
> list, and move each device onto the dpm_list just _before_ invoking its
> ->complete callback.
>
> The way the code is now, it looks like a child registered during the
> parent's callback will end up in the wrong place.
>
> > I don't understand how that would maintain the depth-first order
> > of the list. In contrast, if you do it with separate lists, then you
> > know that if a device is on the "suspended" list, then it by
> > definition has to be "after" all devices that are on the non-suspended
> > list, since you cannot have a non-suspended device that depends on a
> > suspended one.
>
> The situation isn't quite as bad as it seems, if you assume that a
> child will never be registered at a time when its parent is still
> suspended. Right now we warn if such a thing happens but we don't
> prevent it.
>
> > So having separate lists with state should also be very sensible from
> > a device topology standpoint - while doing that "list_splice()" back
> > on the same list is _not_ at all obviously correct from a topological
> > standpoint (I'm not sure I'm explaining this well).
>
> I don't see anything wrong with changing over to multiple list heads.
> It might even allow us to remove the dev->power.status field.

I guess it would allow us to do that, but that's going to be a major change.

What about applying the patch below now and moving to the multiple list heads
approach in the next cycle?

Rafael

---
From: Rafael J. Wysocki <[email protected]>
Subject: PM: Allow devices to be removed during late suspend and early resume

Holding dpm_list_mtx across late suspend and early resume of devices
is problematic for the PCMCIA subsystem and doesn't allow device
objects to be removed by late suspend and early resume driver
callbacks. This appears to be overly restrictive, as drivers are
generally allowed to remove device objects in other phases of suspend
and resume. Therefore rework dpm_{suspend|resume}_noirq() so that
they don't have to hold dpm_list_mtx all the time.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/main.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -475,20 +475,33 @@ End:
*/
void dpm_resume_noirq(pm_message_t state)
{
- struct device *dev;
+ struct list_head list;
ktime_t starttime = ktime_get();

+ INIT_LIST_HEAD(&list);
mutex_lock(&dpm_list_mtx);
transition_started = false;
- list_for_each_entry(dev, &dpm_list, power.entry)
+ while (!list_empty(&dpm_list)) {
+ struct device *dev = to_device(dpm_list.next);
+
+ get_device(dev);
if (dev->power.status > DPM_OFF) {
int error;

dev->power.status = DPM_OFF;
+ mutex_unlock(&dpm_list_mtx);
+
error = device_resume_noirq(dev, state);
+
+ mutex_lock(&dpm_list_mtx);
if (error)
pm_dev_err(dev, state, " early", error);
}
+ if (!list_empty(&dev->power.entry))
+ list_move_tail(&dev->power.entry, &list);
+ put_device(dev);
+ }
+ list_splice(&list, &dpm_list);
mutex_unlock(&dpm_list_mtx);
dpm_show_time(starttime, state, "early");
resume_device_irqs();
@@ -789,20 +802,33 @@ End:
*/
int dpm_suspend_noirq(pm_message_t state)
{
- struct device *dev;
+ struct list_head list;
ktime_t starttime = ktime_get();
int error = 0;

+ INIT_LIST_HEAD(&list);
suspend_device_irqs();
mutex_lock(&dpm_list_mtx);
- list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
+ while (!list_empty(&dpm_list)) {
+ struct device *dev = to_device(dpm_list.prev);
+
+ get_device(dev);
+ mutex_unlock(&dpm_list_mtx);
+
error = device_suspend_noirq(dev, state);
+
+ mutex_lock(&dpm_list_mtx);
if (error) {
pm_dev_err(dev, state, " late", error);
+ put_device(dev);
break;
}
dev->power.status = DPM_OFF_IRQ;
+ if (!list_empty(&dev->power.entry))
+ list_move(&dev->power.entry, &list);
+ put_device(dev);
}
+ list_splice_tail(&list, &dpm_list);
mutex_unlock(&dpm_list_mtx);
if (error)
dpm_resume_noirq(resume_event(state));

2010-11-15 15:20:23

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [GIT PULL] One more power management fix for 2.6.37

On Wed, 10 Nov 2010, Rafael J. Wysocki wrote:

> What about applying the patch below now and moving to the multiple list heads
> approach in the next cycle?
>
> Rafael
>
> ---
> From: Rafael J. Wysocki <[email protected]>
> Subject: PM: Allow devices to be removed during late suspend and early resume
>
> Holding dpm_list_mtx across late suspend and early resume of devices
> is problematic for the PCMCIA subsystem and doesn't allow device
> objects to be removed by late suspend and early resume driver
> callbacks. This appears to be overly restrictive, as drivers are
> generally allowed to remove device objects in other phases of suspend
> and resume. Therefore rework dpm_{suspend|resume}_noirq() so that
> they don't have to hold dpm_list_mtx all the time.

This looks okay to me. Just like the other routines.

Alan Stern