LinuxLists.cc - swsusp 'disk' fails in bk-current

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Andy Isaacson wrote:

> Dmesg is attached; hardware is a Vaio r505te.
>
> Unfortunately, the deadlock (?) is nondeterministic; it *sometimes*
> suspends successfully, maybe one time out of 10. And thinking back, I
> *sometimes* saw failures to suspend with 2.6.11-rc3, maybe one failure
> out of 20 suspends.

Does it hang hard or is sysrq still working?
If sysrq is still working, please try with "i8042.noaux" (this will kill
your touchpad, which is what i intend :-)

Best regards,

Stefan

2005-03-24 18:11:04

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Thu, Mar 24, 2005 at 03:27:15PM +0100, Stefan Seyfried wrote:
> Andy Isaacson wrote:
> > Dmesg is attached; hardware is a Vaio r505te.
> >
> > Unfortunately, the deadlock (?) is nondeterministic; it *sometimes*
> > suspends successfully, maybe one time out of 10. And thinking back, I
> > *sometimes* saw failures to suspend with 2.6.11-rc3, maybe one failure
> > out of 20 suspends.
>
> Does it hang hard or is sysrq still working?

Sysrq still prints stuff, so IRQs aren't locked. But most of the sysrq
commands don't work... S and U don't seem to do anything (not too
surprising I suppose) but B does reboot.

> If sysrq is still working, please try with "i8042.noaux" (this will kill
> your touchpad, which is what i intend :-)

So I added i8042.noaux to my kernel command line, rebooted, insmodded
intel_agp, started X, and verified no touchpad action. Then I
suspended, and it worked fine. After restart, I suspended again - also
fine.

So I think that fixed it. But no touchpad is a bit annoying. :)

-andy

2005-03-24 19:18:47

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Thu, 24 Mar 2005 10:10:59 -0800, Andy Isaacson <[email protected]> wrote:
>
> So I added i8042.noaux to my kernel command line, rebooted, insmodded
> intel_agp, started X, and verified no touchpad action. Then I
> suspended, and it worked fine. After restart, I suspended again - also
> fine.
>
> So I think that fixed it. But no touchpad is a bit annoying. :)
>

Try adding i8042.nomux instead of i8042.noaux; it should keep your
touchpad in working condition. Please let me know if it still works.

--
Dmitry

2005-03-24 20:21:06

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Thu, Mar 24, 2005 at 02:18:40PM -0500, Dmitry Torokhov wrote:
> On Thu, 24 Mar 2005 10:10:59 -0800, Andy Isaacson <[email protected]> wrote:
> > So I added i8042.noaux to my kernel command line, rebooted, insmodded
> > intel_agp, started X, and verified no touchpad action. Then I
> > suspended, and it worked fine. After restart, I suspended again - also
> > fine.
> >
> > So I think that fixed it. But no touchpad is a bit annoying. :)
>
> Try adding i8042.nomux instead of i8042.noaux, it should keep your
> touchpad in working condition. Please let me know if it still works.

With nomux the touchpad works again, but suspend blocks in the same
place as without nomux.

(How can I verify that "nomux" was accepted? It shows up on the "Kernel
command line" but there's no other mention of it in dmesg.)

-andy
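
A minimal sketch of the command-line side of this question, assuming the option is looked for as a whole space-separated token. The function names are made up for illustration. Note that this only confirms the option was passed to the kernel, which is what the "Kernel command line" line in dmesg already shows; it does not prove that the i8042 driver acted on it.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `param` appears as a whole space-separated token in `cmdline`
 * (either bare, e.g. "i8042.nomux", or as "i8042.nomux=..."), else 0. */
int cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    while (*p) {
        /* Skip separators between tokens. */
        while (*p == ' ' || *p == '\n' || *p == '\t')
            p++;
        if (!*p)
            break;
        /* Find the end of the current token. */
        const char *end = p;
        while (*end && *end != ' ' && *end != '\n' && *end != '\t')
            end++;
        size_t tlen = (size_t)(end - p);
        if (tlen >= plen && strncmp(p, param, plen) == 0 &&
            (tlen == plen || p[plen] == '='))
            return 1;
        p = end;
    }
    return 0;
}

/* Read /proc/cmdline and test for `param`; returns -1 if it cannot be read. */
int running_kernel_has_param(const char *param)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return cmdline_has_param(buf, param);
}
```

Calling running_kernel_has_param("i8042.nomux") on the booted system would then report whether the token is present.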

2005-03-24 20:39:39

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Andy Isaacson wrote:
> On Thu, Mar 24, 2005 at 03:27:15PM +0100, Stefan Seyfried wrote:

> Sysrq still prints stuff, so IRQs aren't locked. But most of the sysrq
> commands don't work... S and U don't seem to do anything (not too
> surprising I suppose) but B does reboot.

sysrq-t will probably show a stuck kseriod. Unfortunately it only
happens on one machine for me (toshiba P10-550 IIRC, P4HT but with
non-smp kernel) which has no serial port for console.

>> If sysrq is still working, please try with "i8042.noaux" (this will kill
>> your touchpad, which is what i intend :-)
>
> So I added i8042.noaux to my kernel command line, rebooted, insmodded
> intel_agp, started X, and verified no touchpad action. Then I
> suspended, and it worked fine. After restart, I suspended again - also
> fine.
>
> So I think that fixed it. But no touchpad is a bit annoying. :)

Yes, it was not meant as a fix, just as a verification, since i have
seen something similar.
We have a SUSE bug for this, i believe Vojtech and Pavel will take care
of this one. Thanks for confirming, i almost started to believe i was
seeing ghosts :-)
--
seife
Never trust a computer you can't lift.

2005-03-24 21:10:48

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Thu, 24 Mar 2005 12:20:40 -0800, Andy Isaacson <[email protected]> wrote:
> On Thu, Mar 24, 2005 at 02:18:40PM -0500, Dmitry Torokhov wrote:
> > On Thu, 24 Mar 2005 10:10:59 -0800, Andy Isaacson <[email protected]> wrote:
> > > So I added i8042.noaux to my kernel command line, rebooted, insmodded
> > > intel_agp, started X, and verified no touchpad action. Then I
> > > suspended, and it worked fine. After restart, I suspended again - also
> > > fine.
> > >
> > > So I think that fixed it. But no touchpad is a bit annoying. :)
> >
> > Try adding i8042.nomux instead of i8042.noaux, it should keep your
> > touchpad in working condition. Please let me know if it still works.
>
> With nomux the touchpad works again, but suspend blocks in the same
> place as without nomux.
>
> (How can I verify that "nomux" was accepted? It shows up on the "Kernel
> command line" but there's no other mention of it in dmesg.)
>
> -andy
>

If you do "ls /sys/bus/serio/devices" and see more than 3 ports you
have MUX mode active.

--
Dmitry
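
The check described above can be sketched in C roughly as follows. This is only an illustration: the function names are invented, and the "more than 3 ports" threshold is taken directly from the message rather than from any kernel documentation.

```c
#include <dirent.h>
#include <string.h>

/* Count entries named "serio*" in `dir`; returns -1 if `dir` cannot be opened. */
int count_serio_ports(const char *dir)
{
    DIR *d = opendir(dir);
    if (!d)
        return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strncmp(e->d_name, "serio", 5) == 0)
            count++;
    }
    closedir(d);
    return count;
}

/* Rule of thumb from the message above: more than 3 ports suggests MUX mode. */
int mux_mode_likely(int port_count)
{
    return port_count > 3;
}
```

On a running system one would call count_serio_ports("/sys/bus/serio/devices") and pass the result to mux_mode_likely().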

2005-03-24 21:14:56

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Thu, 24 Mar 2005 12:20:40 -0800, Andy Isaacson <[email protected]> wrote:
> On Thu, Mar 24, 2005 at 02:18:40PM -0500, Dmitry Torokhov wrote:
> > On Thu, 24 Mar 2005 10:10:59 -0800, Andy Isaacson <[email protected]> wrote:
> > > So I added i8042.noaux to my kernel command line, rebooted, insmodded
> > > intel_agp, started X, and verified no touchpad action. Then I
> > > suspended, and it worked fine. After restart, I suspended again - also
> > > fine.
> > >
> > > So I think that fixed it. But no touchpad is a bit annoying. :)
> >
> > Try adding i8042.nomux instead of i8042.noaux, it should keep your
> > touchpad in working condition. Please let me know if it still works.
>
> With nomux the touchpad works again, but suspend blocks in the same
> place as without nomux.
>
> (How can I verify that "nomux" was accepted? It shows up on the "Kernel
> command line" but there's no other mention of it in dmesg.)
>

Ignore my babbling, I just noticed in your dmesg that your KBC does
not support MUX mode to begin with.

--
Dmitry

2005-03-24 23:54:44

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Thu, Mar 24, 2005 at 04:10:39PM -0500, Dmitry Torokhov wrote:
> If you do "ls /sys/bus/serio/devices" and see more than 3 ports you
> have MUX mode active.

Just serio0 and serio1.

On Thu, Mar 24, 2005 at 04:14:52PM -0500, Dmitry Torokhov wrote:
> On Thu, 24 Mar 2005 12:20:40 -0800, Andy Isaacson <[email protected]> wrote:
> > (How can I verify that "nomux" was accepted? It shows up on the "Kernel
> > command line" but there's no other mention of it in dmesg.)
>
> Ignore my babbling, I just noticed in your dmesg that your KBC does
> not support MUX mode to begin with.

OK, anything else I should try?

Why does it only fail when I have *both* intel_agp and i8042 aux?

In the SysRq-T trace I see one interesting process: most things are
in D state in refrigerator(), but sh shows the following traceback:

wait_for_completion
call_usermodehelper
kobject_hotplug
kobject_del
class_device_del
class_device_unregister
mousedev_disconnect
input_unregister_device
alps_disconnect
psmouse_disconnect
serio_driver_remove
device_release_driver
serio_release_driver
serio_resume
resume_device
dpm_resume
device_resume
swsusp_write
pm_suspend_disk
enter_state
state_store
subsys_attr_store
flush_write_buffer
sysfs_write_file
...

That seems odd to me...

Also, khelper has the following trace:
io_schedule
sync_buffer
__wait_on_bit
out_of_line_wait_on_bit
ext3_find_entry
ext3_lookup
real_lookup
do_lookup
__link_path_walk
link_path_walk
path_lookup
open_exec
do_execve
...

-andy

2005-03-25 09:22:43

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Andy Isaacson wrote:

> OK, anything else I should try?

not really, i just wait for Vojtech and Pavel :-)

> Why does it only fail when I have *both* intel_agp and i8042 aux?

later...

> In the SysRq-T trace I see one interesting process: most things are
> in D state in refrigerator(), but sh shows the following traceback:
>
> wait_for_completion
> call_usermodehelper
> kobject_hotplug
> kobject_del
> class_device_del
> class_device_unregister
> mousedev_disconnect
> input_unregister_device
> alps_disconnect
> psmouse_disconnect
> serio_driver_remove
> device_release_driver
> serio_release_driver

i think the following happens (but i am in no case an expert for this):
- alps driver suspends
- alps driver unregisters the device
- udev is called via call_usermodehelper (which fails since userspace
is stopped)
- now somebody wants to wait for udev which does not work right.

Why only with the ALPS driver and intel_agp?
I think this is an accident. For me, it happens only with init=/bin/bash
and _no_ other drivers loaded (only IDE drivers and psmouse built-in).
As soon as i load any other drivers (i have only tried ehci_hcd and
8139too, to be honest) it works fine again. This leads me to believe it
is a race condition since the extra driver that has to be suspended may
give the ALPS driver the extra time needed to finish the race. For you,
it may be the other way round.

This is mostly guesswork, i am no kernel expert at all.
--
seife
Never trust a computer you can't lift.

2005-03-25 10:14:31

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi!

> > OK, anything else I should try?
>
> not really, i just wait for Vojtech and Pavel :-)

Try commenting out "call_usermodehelper". If that helps, Stefan's
theory is confirmed, and this waits for Vojtech to fix it.

> > In the SysRq-T trace I see one interesting process: most things are
> > in D state in refrigerator(), but sh shows the following traceback:
> >
> > wait_for_completion
> > call_usermodehelper
> > kobject_hotplug
> > kobject_del
> > class_device_del
> > class_device_unregister
> > mousedev_disconnect
> > input_unregister_device
> > alps_disconnect
> > psmouse_disconnect
> > serio_driver_remove
> > device_release_driver
> > serio_release_driver

Pavel

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-25 14:19:28

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi,

On Fri, 25 Mar 2005 11:13:44 +0100, Pavel Machek <[email protected]> wrote:
> Hi!
>
> > > OK, anything else I should try?
> >
> > not really, i just wait for Vojtech and Pavel :-)
>
> Try commenting out "call_usermodehelper". If that helps, Stefan's
> theory is confirmed, and this waits for Vojtech to fix it.
>

This is more of a general swsusp problem I believe - the second phase
when it blindly resumes entire system. Resume of a device can fail
(any reason whatsoever) and it will attempt to clean up after itself,
but userspace is dead and hotplug never completes. While I am
interested to know why ALPS does not want to resume on Andy's laptop
the issue will never be completely resolved from within the input
system.

Pavel, is it possible for swsusp to disable hotplug (probably just do
hotplug_path[0] = 0) before resuming in suspend phase?
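
A user-space sketch of what that might look like, assuming the hotplug code treats an empty helper path as "nothing to run". The array below is only a stand-in for the kernel's global, and the save/restore helpers are hypothetical, not existing kernel functions.

```c
/* Stand-in for the kernel's global helper path; the real one lives in the kernel. */
static char hotplug_path[256] = "/sbin/hotplug";
static char saved_first_char;   /* hypothetical: remembers the byte we clear */

/* Stand-in for the early-exit check: an empty path means "no helper to run". */
int hotplug_enabled(void)
{
    return hotplug_path[0] != '\0';
}

/* Called before devices are resumed for writing the image. */
void hotplug_disable(void)
{
    if (hotplug_path[0] != '\0') {
        saved_first_char = hotplug_path[0];
        hotplug_path[0] = '\0';
    }
}

/* Called once userspace is running again. */
void hotplug_restore(void)
{
    if (hotplug_path[0] == '\0' && saved_first_char != '\0') {
        hotplug_path[0] = saved_first_char;
        saved_first_char = '\0';
    }
}
```

Clearing only the first byte keeps the rest of the path intact, so restoring it is a one-byte write.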

A bit of a tangent - you need to resume the system so you can write the
image, right? I wonder if we could add a flag to struct device that
would mark a device as "on_resume_path". The flag would be set when you
select the resume partition and propagated to the root of the system. Then,
when resuming after making the image, you could skip all devices that are
not on the resume path.

--
Dmitry
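
The "on_resume_path" idea above could be sketched like this. It is a simplified model, not kernel code: struct dev_node and both functions are hypothetical, and the real struct device has no such flag.

```c
#include <stddef.h>

/* Hypothetical, simplified device node; the real struct device has no such flag. */
struct dev_node {
    const char *name;
    struct dev_node *parent;   /* NULL for the root */
    int on_resume_path;
    int resumed;
};

/* Mark `leaf` (e.g. the device backing the resume partition) and all ancestors. */
void mark_resume_path(struct dev_node *leaf)
{
    for (struct dev_node *d = leaf; d != NULL; d = d->parent)
        d->on_resume_path = 1;
}

/* Resume only flagged devices; `devs` must list parents before children.
 * Returns how many devices were resumed. */
int resume_for_image_write(struct dev_node **devs, size_t n)
{
    int resumed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!devs[i]->on_resume_path)
            continue;           /* e.g. a PS/2 touchpad: leave it alone */
        devs[i]->resumed = 1;
        resumed++;
    }
    return resumed;
}
```

With a tree of root -> ide0 -> hda plus root -> serio1, marking hda would flag hda, ide0 and root, and serio1 would be skipped.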

2005-03-25 14:24:41

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi!

> > > > OK, anything else I should try?
> > >
> > > not really, i just wait for Vojtech and Pavel :-)
> >
> > Try commenting out "call_usermodehelper". If that helps, Stefan's
> > theory is confirmed, and this waits for Vojtech to fix it.
> >
>
> This is more of a general swsusp problem I believe - the second phase
> when it blindly resumes entire system. Resume of a device can fail
> (any reason whatsoever) and it will attempt to clean up after itself,
> but userspace is dead and hotplug never completes. While I am
> interested to know why ALPS does not want to resume on Andy's laptop
> the issue will never be completely resolved from within the input
> system.

When device fails to resume, what should I do? I think I could

	if (error)
		panic("Device resume failed\n");

, but... that does not look like what you want.

> Pavel, is it possible for swsusp to disable hotplug (probably just do
> hotplug_path[0] = 0) before resuming in suspend phase?

It feels like a hack, but yes, I probably could do that. (Do you have
patch to try?)

> A bit on tangent - you need to resume system so you can write the
> image, right? I wonder if we could add a flag to struct device that
> would mark device as "on_resume_path". The flag would be set when you
> select resume partition and propagated to the root of the system. Then
> when resume after making the image you could skip all devices that are
> not on resume path.

I'm not going to do that, see FAQ in swsusp.txt.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-25 14:52:31

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Fri, 25 Mar 2005 15:24:15 +0100, Pavel Machek <[email protected]> wrote:
> Hi!
>
> > > > > OK, anything else I should try?
> > > >
> > > > not really, i just wait for Vojtech and Pavel :-)
> > >
> > > Try commenting out "call_usermodehelper". If that helps, Stefan's
> > > theory is confirmed, and this waits for Vojtech to fix it.
> > >
> >
> > This is more of a general swsusp problem I believe - the second phase
> > when it blindly resumes entire system. Resume of a device can fail
> > (any reason whatsoever) and it will attempt to clean up after itself,
> > but userspace is dead and hotplug never completes. While I am
> > interested to know why ALPS does not want to resume on Andy's laptop
> > the issue will never be completely resolved from within the input
> > system.
>
> When device fails to resume, what should I do? I think I could
>
> if (error)
> panic("Device resume failed\n");
>
> , but... that does not look like what you want.

Oh, always panic-happy Pavel ;). It really depends on what kind of
device has failed to resume. If the device is really needed for writing
the image then panic is the only recourse, but if it is some other device
you are resuming, just ignore it, who cares...

Btw, I don't think that doing selective resume (as opposed to selective
suspend and Nigel's partial device trees) would be that
complicated. You'd always resume sysdevs and then, when iterating over
"normal" devices, just skip ones not in resume path. It can all be
contained in driver core I believe (sorry but no patch, for now at
least).

>
> > Pavel, is it possible for swsusp to disable hotplug (probably just do
> > hotplug_path[0] = 0) before resuming in suspend phase?
>
> It feels like a hack, but yes, I probably could do that. (Do you have
> patch to try?)
>

Not really, I won't be able to write any code until next week I think.

--
Dmitry
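
The failure policy argued for above (stop only when the failed device is needed for writing the image, otherwise ignore it) might look roughly like this. Everything here is a hypothetical user-space model: struct pm_dev, its needed_for_image field and the two demo callbacks are invented, and returning -1 stands in for the place where the kernel might panic.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical device record for illustration only. */
struct pm_dev {
    const char *name;
    int needed_for_image;        /* 1 if the image cannot be written without it */
    int (*resume)(void);         /* returns 0 on success, nonzero on failure */
};

/* Two trivial callbacks for demonstration. */
int resume_ok(void)   { return 0; }
int resume_fail(void) { return 1; }

/* Resume every device. A failure on a device needed for the image stops the
 * loop and returns -1 (where the kernel might panic); any other failure is
 * logged and skipped. On success returns the number of ignored failures. */
int resume_before_image_write(const struct pm_dev *devs, size_t n)
{
    int ignored = 0;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].resume() == 0)
            continue;
        if (devs[i].needed_for_image) {
            fprintf(stderr, "%s: resume failed, cannot write image\n", devs[i].name);
            return -1;
        }
        fprintf(stderr, "%s: resume failed, ignoring\n", devs[i].name);
        ignored++;
    }
    return ignored;
}
```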

2005-03-25 14:58:44

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Thu, 24 Mar 2005 15:54:39 -0800, Andy Isaacson <[email protected]> wrote:
> On Thu, Mar 24, 2005 at 04:10:39PM -0500, Dmitry Torokhov wrote:
> > If you do "ls /sys/bus/serio/devices" and see more than 3 ports you
> > have MUX mode active.
>
> Just serio0 and serio1.
>
> On Thu, Mar 24, 2005 at 04:14:52PM -0500, Dmitry Torokhov wrote:
> > On Thu, 24 Mar 2005 12:20:40 -0800, Andy Isaacson <[email protected]> wrote:
> > > (How can I verify that "nomux" was accepted? It shows up on the "Kernel
> > > command line" but there's no other mention of it in dmesg.)
> >
> > Ignore my babbling, I just noticed in your dmesg that your KBC does
> > not support MUX mode to begin with.
>
> OK, anything else I should try?
>
> Why does it only fail when I have *both* intel_agp and i8042 aux?
>
> In the SysRq-T trace I see one interesting process: most things are
> in D state in refrigerator(), but sh shows the following traceback:
>
> wait_for_completion
> call_usermodehelper
> kobject_hotplug
> kobject_del
> class_device_del
> class_device_unregister
> mousedev_disconnect
> input_unregister_device
> alps_disconnect
> psmouse_disconnect
> serio_driver_remove
> device_release_driver
> serio_release_driver
> serio_resume

I wonder why ALPS reconnect failed. You don't have a serial console
set up, do you? If not then maybe you could make a huge framebuffer to
capture as much info as you can... I hope you have a digital camera ;)

Then do "echo 1 > /sys/module/i8042/parameters/debug" and try to
suspend. I am interested in the data coming in and out of i8042.

--
Dmitry
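
For anyone scripting this from a test harness, a small C equivalent of the echo command is sketched below. It assumes the parameter is exposed at /sys/module/i8042/parameters/debug (module parameters live under /sys/module, singular) and that the caller is root; the helper names are invented for illustration.

```c
#include <stdio.h>

/* Write `value` to the file at `path`; returns 0 on success, -1 on failure. */
int write_param(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Equivalent of `echo 1 > /sys/module/i8042/parameters/debug` (needs root). */
int enable_i8042_debug(void)
{
    return write_param("/sys/module/i8042/parameters/debug", "1\n");
}
```

The return value of fclose() is checked because a sysfs write error may only be reported when the buffered data is flushed.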

2005-03-25 15:42:55

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi!

> > > This is more of a general swsusp problem I believe - the second phase
> > > when it blindly resumes entire system. Resume of a device can fail
> > > (any reason whatsoever) and it will attempt to clean up after itself,
> > > but userspace is dead and hotplug never completes. While I am
> > > interested to know why ALPS does not want to resume on Andy's laptop
> > > the issue will never be completely resolved from within the input
> > > system.
> >
> > When device fails to resume, what should I do? I think I could
> >
> > if (error)
> > panic("Device resume failed\n");
> >
> > , but... that does not look like what you want.
>
> Oh, always panic-happy Pavel ;). It really depends on what kind of
> device has failed to resume. If the device is really needed for writing
> image then panic is the only recourse, but if it some other device you
> resuming just ignore it, who cares...

You are right, for resume-during-suspend, we may as well risk it. We
have consistent state, and if we happen to write it on disk,
everything is okay.

For resume-during-resume, I don't really know how we can handle
that. Running with some devices non-working seems dangerous to me.

> Btw, I don't think that doing selective resume (as opposed to selective
> suspend and Nigel's partial device trees) would be so much
> complicated. You'd always resume sysdevs and then, when iterating over
> "normal" devices, just skip ones not in resume path. It can all be
> contained in driver core I believe (sorry but no patch, for now at
> least).

:-) I think we can simply make device freeze/unfreeze fast enough.
[We do not need to do full suspend/resume; freeze is enough].

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-25 16:04:33

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Fri, 25 Mar 2005 16:42:37 +0100, Pavel Machek <[email protected]> wrote:
> Hi!
>
> > > > This is more of a general swsusp problem I believe - the second phase
> > > > when it blindly resumes entire system. Resume of a device can fail
> > > > (any reason whatsoever) and it will attempt to clean up after itself,
> > > > but userspace is dead and hotplug never completes. While I am
> > > > interested to know why ALPS does not want to resume on Andy's laptop
> > > > the issue will never be completely resolved from within the input
> > > > system.
> > >
> > > When device fails to resume, what should I do? I think I could
> > >
> > > if (error)
> > > panic("Device resume failed\n");
> > >
> > > , but... that does not look like what you want.
> >
> > Oh, always panic-happy Pavel ;). It really depends on what kind of
> > device has failed to resume. If the device is really needed for writing
> > image then panic is the only recourse, but if it some other device you
> > resuming just ignore it, who cares...
>
> You are right, for resume-during-suspend, we may as well risk it. We
> have consistent state, and if we happen to write it on disk,
> everything is okay.
>
> For resume-during-resume, I don't really know how we can handle
> that. Running with some devices non-working seems dangerous to me.
>

I think it again varies, and the driver would have to decide what to
do if it can not resume hardware. Take for example USB - i believe USB
guys are shooting at being able to disconnect device while the box is
suspended and have it removed from the system when resuming.
Probably every driver that has even the slightest notion of
hot-pluggability should just properly clean up after itself and not
signal error to the core.

> > Btw, I don't think that doing selective resume (as opposed to selective
> > suspend and Nigel's partial device trees) would be so much
> > complicated. You'd always resume sysdevs and then, when iterating over
> > "normal" devices, just skip ones not in resume path. It can all be
> > contained in driver core I believe (sorry but no patch, for now at
> > least).
>
> :-) I think we can simply make device freeze/unfreeze fast enough.
> [We do not need to do full suspend/resume; freeze is enough].

It is not suspend/freeze here that gets us but resume and with resume
the driver (at least for now) does not have any idea if it is
"unfreeze" or "full-resume". I mean I could have serio just ignore
"unfreeze" requests (as I doubt anyone would ever try to suspend over
PS/2 port ;) ) but I think it should be really handled by the core.

--
Dmitry

2005-03-25 18:36:29

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Fri, Mar 25, 2005 at 11:13:44AM +0100, Pavel Machek wrote:
> Hi!
>
> > > OK, anything else I should try?
> >
> > not really, i just wait for Vojtech and Pavel :-)
>
> Try commenting out "call_usermodehelper". If that helps, Stefan's
> theory is confirmed, and this waits for Vojtech to fix it.
>
> > > wait_for_completion
> > > call_usermodehelper
> > > kobject_hotplug
> > > kobject_del

Without the call_usermodehelper in kobject_hotplug, the first suspend
seems to work OK (which I think confirms the theory). But after resume,
the second suspend hangs in the same place. It's calling
call_usermodehelper from input_call_hotplug... time to comment out
another one and recompile.

I also tried -mm1 and it hangs in the same place.

-andy

2005-03-28 23:01:05

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi!

> > > Btw, I don't think that doing selective resume (as opposed to selective
> > > suspend and Nigel's partial device trees) would be so much
> > > complicated. You'd always resume sysdevs and then, when iterating over
> > > "normal" devices, just skip ones not in resume path. It can all be
> > > contained in driver core I believe (sorry but no patch, for now at
> > > least).
> >
> > :-) I think we can simply make device freeze/unfreeze fast enough.
> > [We do not need to do full suspend/resume; freeze is enough].
>
> It is not suspend/freeze here that gets us but resume and with resume
> the driver (at least for now) does not have any idea if it is
> "unfreeze" or "full-resume". I mean I could have serio just ignore
> "unfreeze" requests (as I doubt anyone would ever try to suspend over
> PS/2 port ;) ) but I think it should be really handled by the core.

Please just always do full-resume... for now. Patches that enable you
to detect "unfreeze" are not in, yet. If something fails, just printk
with big enough severity and continue, as you don't have method of
signaling error, anyway.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-29 16:18:19

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Fri, 25 Mar 2005 10:22:28 +0100, Stefan Seyfried <[email protected]> wrote:
> Andy Isaacson wrote:
>
> > In the SysRq-T trace I see one interesting process: most things are
> > in D state in refrigerator(), but sh shows the following traceback:
> >
> > wait_for_completion
> > call_usermodehelper
> > kobject_hotplug
> > kobject_del
> > class_device_del
> > class_device_unregister
> > mousedev_disconnect
> > input_unregister_device
> > alps_disconnect
> > psmouse_disconnect
> > serio_driver_remove
> > device_release_driver
> > serio_release_driver
>
> i think the following happens (but i am in no case an expert for this):
> - alps driver suspends
> - alps driver unregisters the device
> - udev is called via call_usermodehelper (which fails since userspace
> is stopped)
> - now somebody wants to wait for udev which does not work right.

The thing is that kobject_uevent calls call_usermodehelper with
wait=0. That means that it only waits for the execve("/sbin/hotplug")
call to complete; it does not wait for the entire process to complete.

If you look at Andy's second trace you will see that we are waiting
for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
know why IO does not complete? khelper is a kernel thread so it is
marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?

--
Dmitry
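
The distinction drawn above can be modelled in a few lines. This is a toy model of the reasoning only, not kernel code: the enum and function are invented. The caller's wait_for_completion() returns at a different point depending on the wait argument, but in both cases it stays blocked while execve() itself is stuck, which is what the khelper trace (blocked in open_exec waiting on disk I/O) suggests.

```c
/* Simplified model of a usermode helper's progress; not kernel code. */
enum helper_stage {
    HELPER_QUEUED,      /* khelper has not finished execve() yet */
    HELPER_EXEC_DONE,   /* execve() returned, helper process is running */
    HELPER_EXITED       /* helper process has exited */
};

/* Return 1 if the caller is still blocked in wait_for_completion().
 * wait == 0: blocked only until execve() completes.
 * wait != 0: blocked until the helper exits. */
int caller_still_blocked(int wait, enum helper_stage stage)
{
    if (wait)
        return stage != HELPER_EXITED;
    return stage == HELPER_QUEUED;
}
```

So even with wait=0, a helper that never gets past HELPER_QUEUED keeps the suspending process blocked.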

2005-03-29 18:18:59

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi!

> > > In the SysRq-T trace I see one interesting process: most things are
> > > in D state in refrigerator(), but sh shows the following traceback:
> > >
> > > wait_for_completion
> > > call_usermodehelper
> > > kobject_hotplug
> > > kobject_del
> > > class_device_del
> > > class_device_unregister
> > > mousedev_disconnect
> > > input_unregister_device
> > > alps_disconnect
> > > psmouse_disconnect
> > > serio_driver_remove
> > > device_release_driver
> > > serio_release_driver
> >
> > i think the following happens (but i am in no case an expert for this):
> > - alps driver suspends
> > - alps driver unregisters the device
> > - udev is called via call_usermodehelper (which fails since userspace
> > is stopped)
> > - now somebody wants to wait for udev which does not work right.
>
> The thing is that kobject_uevent calls call_usermodehelper with
> wait=0. That means that it only waits for the execve("/sbin/hotplug")
> call to complete; it does not wait for the entire process to complete.
>
> If you look at Andy's second trace you will see that we are waiting
> for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
> know why IO does not complete? khelper is a kernel thread so it is
> marked with
> PF_NOFREEZE. Could it be that we managed to freeze kblockd?

Uf, no idea about kblockd freezing -- we certainly should not.

*But*, if we are doing execve while system is frozen, something is
very wrong. We should not be doing execve in the first place.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-29 19:11:37

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Tue, 29 Mar 2005 20:18:31 +0200, Pavel Machek <[email protected]> wrote:
> Hi!
>
> > If you look at Andy's second trace you will see that we are waiting
> > for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
> > know why IO does not complete? khelper is a kernel thread so it is
> > marked with
> > PF_NOFREEZE. Could it be that we managed to freeze kblockd?
>
> Uf, no idea about kblockd freezing -- we certainly should not.
>
> *But*, if we are doing execve while system is frozen, something is
> very wrong. We should not be doing execve in the first place.

Well, there lies a problem - some devices have to do execve because
they need firmware to operate. Also, again, some buses with
hot-pluggable devices will attempt to clean up unsuccessful resume and
this will cause hotplug events. The point is you either resume system
or you don't. We probably need a separate "unfreeze" callback,
although this is kind of messy.

--
Dmitry

2005-03-29 19:32:07

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi!

> > > If you look at Andy's second trace you will see that we are waiting
> > > for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
> > > know why IO does not complete? khelper is a kernel thread so it is
> > > marked with
> > > PF_NOFREEZE. Could it be that we managed to freeze kblockd?
> >
> > Uf, no idea about kblockd freezing -- we certainly should not.
> >
> > *But*, if we are doing execve while system is frozen, something is
> > very wrong. We should not be doing execve in the first place.
>
> Well, there lies a problem - some devices have to do execve because
> they need firmware to operate. Also, again, some buses with
> hot-pluggable devices will attempt to clean up unsuccessful resume and
> this will cause hotplug events. The point is you either resume system
> or you don't. We probably need a separate "unfreeze" callback,
> although this is kind of messy.

There's a better solution for firmware: You should load your firmware
prior to suspend and store it in RAM. Anything else just plain does
not work. (Because your wireless firmware might be on NFS mounted over
that wireless card).

Hotplug... I guess udev just needs to hold those callbacks until the
system is fully up... it has to do something similar on regular boot,
no?
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
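
The "load firmware before suspend and keep it in RAM" suggestion could be sketched as a small cache like the one below. This is a user-space illustration with invented names (struct fw_cache and its three functions); a real driver would use kernel allocation and the firmware loading interface instead of malloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-RAM firmware cache for one device; zero-initialise before use. */
struct fw_cache {
    unsigned char *data;
    size_t size;
};

/* Before suspend: copy the firmware image into RAM. Returns 0 or -1. */
int fw_cache_store(struct fw_cache *c, const void *image, size_t size)
{
    unsigned char *copy = malloc(size ? size : 1);
    if (!copy)
        return -1;
    memcpy(copy, image, size);
    free(c->data);              /* replace any previously cached image */
    c->data = copy;
    c->size = size;
    return 0;
}

/* On resume: hand back the cached image without touching disk or userspace.
 * Returns NULL if nothing was cached. */
const unsigned char *fw_cache_get(const struct fw_cache *c, size_t *size)
{
    if (!c->data)
        return NULL;
    *size = c->size;
    return c->data;
}

/* Release the cached copy once it is no longer needed. */
void fw_cache_drop(struct fw_cache *c)
{
    free(c->data);
    c->data = NULL;
    c->size = 0;
}
```

Because the resume path only reads from memory, it needs neither execve() nor a working filesystem, which covers the NFS-over-wireless case mentioned above.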

2005-03-29 19:37:41

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Thursday 24 March 2005 15:38, Stefan Seyfried wrote:
> Andy Isaacson wrote:
> > So I added i8042.noaux to my kernel command line, rebooted, insmodded
> > intel_agp, started X, and verified no touchpad action. Then I
> > suspended, and it worked fine. After restart, I suspended again - also
> > fine.
> >
> > So I think that fixed it. But no touchpad is a bit annoying. :)
>
> Yes, it was not thought as a fix but just for verification, since i have
> seen something similar.
> We have a SUSE bug for this, i believe Vojtech and Pavel will take care
> of this one. Thanks for confirming, i almost started to believe i was
> seeing ghosts :-)

Could you please try the patch below - it should fix the issues you are
seeing although there may be other devices (really any hot-pluggable
device) that will show the same behaviour. In the long run swsusp should
not attempt resuming devices when the system can not handle the process
properly.

--
Dmitry

===================================================================

Input: serio - do not attempt to immediately disconnect port if
resume failed, let kseriod take care of it. Otherwise we
may attempt to unregister associated input devices which
will generate hotplug events which are not handled well
during swsusp.

Signed-off-by: Dmitry Torokhov <[email protected]>

serio.c | 1 -
1 files changed, 1 deletion(-)

Index: dtor/drivers/input/serio/serio.c
===================================================================
--- dtor.orig/drivers/input/serio/serio.c
+++ dtor/drivers/input/serio/serio.c
@@ -779,7 +779,6 @@ static int serio_resume(struct device *d
 	struct serio *serio = to_serio_port(dev);

 	if (!serio->drv || !serio->drv->reconnect || serio->drv->reconnect(serio)) {
-		serio_disconnect_port(serio);
 		/*
 		 * Driver re-probing can take a while, so better let kseriod
 		 * deal with it.

2005-03-29 20:05:49

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Tue, 29 Mar 2005 21:23:39 +0200, Pavel Machek wrote:
> Hi!
>
> > > > If you look at Andy's second trace you will see that we are waiting
> > > > for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
> > > > know why IO does not complete? khelper is a kernel thread so it is
> > > > marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?
> > >
> > > Uf, no idea about kblockd freezing -- we certainly should not.
> > >
> > > *But*, if we are doing execve while system is frozen, something is
> > > very wrong. We should not be doing execve in the first place.
> >
> > Well, there lies a problem - some devices have to do execve because
> > they need firmware to operate. Also, again, some buses with
> > hot-pluggable devices will attempt to clean up unsuccessful resume and
> > this will cause hotplug events. The point is you either resume system
> > or you don't. We probably need a separate "unfreeze" callback,
> > although this is kind of messy.
>
> There's a better solution for firmware: You should load your firmware
> prior to suspend and store it in RAM. Anything else just plain does
> not work. (Because your wireless firmware might be on NFS mounted over
> that wireless card).
>
> Hotplug... I guess udev just needs to hold those callbacks until the
> system is fully up... it has to do something similar on regular boot,
> no?

Well, I did not really look into udev but hotplug (which can interact
with udev) does not keep anything. If it fails it's ok - that's why
there are coldplug scripts that "recover" lost events. But here we
block trying to start hotplug - we are not getting an error - and this
is bad. Unfortunately I am not familiar enough with how block devices
work to say why it hangs.

Should we pull Jens into the discussion?

--
Dmitry

2005-03-29 20:53:04

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi!

> > > Well, there lies a problem - some devices have to do execve because
> > > they need firmware to operate. Also, again, some buses with
> > > hot-pluggable devices will attempt to clean up unsuccessful resume and
> > > this will cause hotplug events. The point is you either resume system
> > > or you don't. We probably need a separate "unfreeze" callback,
> > > although this is kind of messy.
> >
> > There's a better solution for firmware: You should load your firmware
> > prior to suspend and store it in RAM. Anything else just plain does
> > not work. (Because your wireless firmware might be on NFS mounted over
> > that wireless card).
> >
> > Hotplug... I guess udev just needs to hold those callbacks until the
> > system is fully up... it has to do something similar on regular boot,
> > no?
>
> Well, I did not really look into udev but hotplug (which can interact
> with udev) does not keep anything. If it fails it's ok - that's why
> there are coldplug scripts that "recover" lost events. But here we
> block trying to start hotplug - we are not getting an error - and this
> is bad. Unfortunately I am not familiar enough with how block devices
> work to say why it hangs.
>
> Should we pull Jens into the discussion?

I don't really want us to try execve during resume... Could we simply
artificially fail that execve with something like if (in_suspend()) return
-EINVAL; [except that in_suspend() just is not there, but there were
some proposals to add it].

Or just avoid calling hotplug at all in resume case? And then do
coldplug-like scan when userspace is ready...

But we perhaps should cc linux-pm list.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
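
A minimal userspace sketch of the in_suspend() check proposed in the message above. Everything here is an assumption made for illustration: the flag, set_in_suspend(), guarded_exec_helper() and the counter are hypothetical names, and as the message itself notes, no in_suspend() existed in the kernel at the time.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical state flag; stands in for whatever swsusp would track. */
static int suspend_in_progress;
/* Number of helpers this sketch pretended to start. */
static int helpers_started;

void set_in_suspend(int on)
{
	suspend_in_progress = on;
}

int in_suspend(void)
{
	return suspend_in_progress;
}

int helpers_started_count(void)
{
	return helpers_started;
}

/*
 * Stand-in for the exec path. While a suspend is in progress it fails
 * at once with -EINVAL instead of trying to load a binary from disk.
 */
int guarded_exec_helper(const char *path)
{
	if (path == NULL || path[0] == '\0')
		return -EINVAL;
	if (in_suspend())
		return -EINVAL;
	helpers_started++;	/* a real implementation would exec here */
	return 0;
}
```

The point of the sketch is only that the caller gets an immediate error rather than blocking on disk I/O that cannot complete while the system is frozen.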

2005-03-29 21:07:21

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Tue, 29 Mar 2005 22:52:25 +0200, Pavel Machek wrote:
> I don't really want us to try execve during resume... Could we simply
> artificially fail that execve with something like if (in_suspend()) return
> -EINVAL; [except that in_suspend() just is not there, but there were
> some proposals to add it].
>
> Or just avoid calling hotplug at all in resume case? And then do
> coldplug-like scan when userspace is ready...
>

I am leaning towards calling disable_usermodehelper (not written yet)
after swsusp completes snapshotting memory. We really don't care about
hotplug events in this case and this will allow keeping "normal"
resume in drivers as is. What do you think?

--
Dmitry

2005-03-29 21:14:17

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi!

> > I don't really want us to try execve during resume... Could we simply
> > artificially fail that execve with something like if (in_suspend()) return
> > -EINVAL; [except that in_suspend() just is not there, but there were
> > some proposals to add it].
> >
> > Or just avoid calling hotplug at all in resume case? And then do
> > coldplug-like scan when userspace is ready...
> >
>
> I am leaning towards calling disable_usermodehelper (not written yet)
> after swsusp completes snapshotting memory. We really don't care about
> hotplug events in this case and this will allow keeping "normal"
> resume in drivers as is. What do you think?

That would certainly do the trick.

[Or perhaps in_suspend() is slightly nicer solution? People wanted it
for other stuff (sanity checking, like BUG_ON(in_suspend())), too....]

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-29 21:25:13

by Patrick Mochel

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Tue, 29 Mar 2005, Pavel Machek wrote:

> I don't really want us to try execve during resume... Could we simply
> artificially fail that execve with something like if (in_suspend()) return
> -EINVAL; [except that in_suspend() just is not there, but there were
> some proposals to add it].
>
> Or just avoid calling hotplug at all in resume case? And then do
> coldplug-like scan when userspace is ready...

I thought that cold-plugging only worked for devices, not all objects.

Can we just queue up hotplug events? That way we wouldn't lose any across
the transition, and could be used to send resume events to userspace for
various devices that need help..

Pat
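
A rough userspace sketch of the "queue up hotplug events" idea from the message above, assuming a small fixed-size queue so memory use stays bounded. All names (hold_events, send_event, release_events, MAX_PENDING) are hypothetical, and deliver() merely counts events where a real system would run the hotplug helper.

```c
#include <stddef.h>

#define MAX_PENDING 4	/* arbitrary bound chosen for the sketch */

static const char *pending[MAX_PENDING];
static size_t n_pending;
static int holding;	/* nonzero while userspace cannot run */
static size_t delivered;
static size_t dropped;

/* Stand-in for actually running /sbin/hotplug for one event. */
static void deliver(const char *name)
{
	(void)name;
	delivered++;
}

void hold_events(void)
{
	holding = 1;
}

/* Deliver at once when possible, otherwise queue; -1 if the queue is full. */
int send_event(const char *name)
{
	if (!holding) {
		deliver(name);
		return 0;
	}
	if (n_pending == MAX_PENDING) {
		dropped++;
		return -1;
	}
	pending[n_pending++] = name;
	return 0;
}

/* Replay queued events in arrival order; returns how many were replayed. */
size_t release_events(void)
{
	size_t i, n = n_pending;

	holding = 0;
	for (i = 0; i < n; i++)
		deliver(pending[i]);
	n_pending = 0;
	return n;
}

size_t events_delivered(void) { return delivered; }
size_t events_dropped(void) { return dropped; }
```

Events that overflow the queue are counted as dropped, which is the case a coldplug-style rescan would have to cover.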

2005-03-29 21:36:59

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Tue, 29 Mar 2005 23:12:39 +0200, Pavel Machek wrote:
> >
> > I am leaning towards calling disable_usermodehelper (not written yet)
> > after swsusp completes snapshotting memory. We really don't care about
> > hotplug events in this case and this will allow keeping "normal"
> > resume in drivers as is. What do you think?
>
> That would certainly do the trick.
>
> [Or perhaps in_suspend() is slightly nicer solution? People wanted it
> for other stuff (sanity checking, like BUG_ON(in_suspend())), too....]
>

We might want to have both... Hmm... in_suspend - is it only for swsusp
(in_swsusp) or for suspend-to-ram as well? For suspend to ram we might
need slightly different rules, I don't know. A separate call will
allow more fine-grained control and will explicitly tell the reader
what is happening.

I do not have a strong preference though.

--
Dmitry

2005-03-29 21:41:22

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Tue, 29 Mar 2005 13:23:35 -0800 (PST), Patrick Mochel wrote:
>
> On Tue, 29 Mar 2005, Pavel Machek wrote:
>
> > I don't really want us to try execve during resume... Could we simply
> > artificially fail that execve with something like if (in_suspend()) return
> > -EINVAL; [except that in_suspend() just is not there, but there were
> > some proposals to add it].
> >
> > Or just avoid calling hotplug at all in resume case? And then do
> > coldplug-like scan when userspace is ready...
>
> I thought that cold-plugging only worked for devices, not all objects.
>

It really depends on the script - nothing stops it from traversing the
entire /sys tree, and if an object is not exported in the tree I'd say
userspace should not care about such an object anyway.

> Can we just queue up hotplug events? That way we wouldn't lose any across
> the transition, and could be used to send resume events to userspace for
> various devices that need help..
>

The point is that at this point any changes to the system state will
be discarded - we have already taken the image and are about to write
it. When we resume for real all those events will be regenerated once
again.

--
Dmitry

2005-03-29 21:45:05

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Út 29-03-05 16:33:04, Dmitry Torokhov wrote:
> On Tue, 29 Mar 2005 23:12:39 +0200, Pavel Machek wrote:
> > >
> > > I am leaning towards calling disable_usermodehelper (not written yet)
> > > after swsusp completes snapshotting memory. We really don't care about
> > > hotplug events in this case and this will allow keeping "normal"
> > > resume in drivers as is. What do you think?
> >
> > That would certainly do the trick.
> >
> > [Or perhaps in_suspend() is slightly nicer solution? People wanted it
> > for other stuff (sanity checking, like BUG_ON(in_suspend())), too....]
> >
>
> We might want to have both... Hmm... in_suspend - is it only for swsusp
> (in_swsusp) or for suspend-to-ram as well? For suspend to ram we might
> need slightly different rules, I don't know. A separate call will
> allow more fine-grained control and will explicitly tell the reader
> what is happening.

We currently freeze processes for suspend-to-ram, too. I guess that
disable_usermodehelper is probably better and that in_suspend() should
only be used for sanity checks... go with disable_usermodehelper and
sorry for the noise.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-29 22:33:22

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi.

On Wed, 2005-03-30 at 07:44, Pavel Machek wrote:
> We currently freeze processes for suspend-to-ram, too. I guess that
> disable_usermodehelper is probably better and that in_suspend() should
> only be used for sanity checks... go with disable_usermodehelper and
> sorry for the noise.

Here's another possibility: Freeze the workqueue that
call_usermodehelper uses (remember that code I didn't push hard enough
to Andrew?), and let invocations of call_usermodehelper block in
TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on kernel
processes in that state. Of course if you don't want to freeze
processes for STR, but do want to freeze call_usermodehelper, I guess
you'd still need the in_suspend() macro.

Regards,

Nigel
--
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net

2005-03-29 22:40:02

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi!

> > We currently freeze processes for suspend-to-ram, too. I guess that
> > disable_usermodehelper is probably better and that in_suspend() should
> > only be used for sanity checks... go with disable_usermodehelper and
> > sorry for the noise.
>
> Here's another possibility: Freeze the workqueue that
> call_usermodehelper uses (remember that code I didn't push hard enough
> to Andrew?), and let invocations of call_usermodehelper block in
> TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on

There may be many devices in the system, and you are going to need
quite a lot of RAM for all that... That's why they do not queue it
during boot, IIRC. Disabling usermode helper seems right.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-29 23:21:51

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi,

On Tuesday, 29 of March 2005 23:12, Pavel Machek wrote:
> Hi!
>
> > > I don't really want us to try execve during resume... Could we simply
> > > artificially fail that execve with something like if (in_suspend()) return
> > > -EINVAL; [except that in_suspend() just is not there, but there were
> > > some proposals to add it].
> > >
> > > Or just avoid calling hotplug at all in resume case? And then do
> > > coldplug-like scan when userspace is ready...
> > >
> >
> > I am leaning towards calling disable_usermodehelper (not written yet)
> > after swsusp completes snapshotting memory. We really don't care about
> > hotplug events in this case and this will allow keeping "normal"
> > resume in drivers as is. What do you think?
>
> That would certainly do the trick.
>
> [Or perhaps in_suspend() is slightly nicer solution? People wanted it
> for other stuff (sanity checking, like BUG_ON(in_suspend())), too....]

IMHO, they are not mutually exclusive. However, by using
disable_usermodehelper we would get rid of the reason (ie hotplug events)
instead of just curing the symptoms (ie execve() during suspend).

Greets,
Rafael

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

2005-03-29 23:21:42

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi,

On Friday, 25 of March 2005 15:52, Dmitry Torokhov wrote:
> On Fri, 25 Mar 2005 15:24:15 +0100, Pavel Machek wrote:
> > Hi!
> >
> > > > > > OK, anything else I should try?
> > > > >
> > > > > not really, i just wait for Vojtech and Pavel :-)
> > > >
> > > > Try commenting out "call_usermodehelper". If that helps, Stefan's
> > > > theory is confirmed, and this waits for Vojtech to fix it.
> > > >
> > >
> > > This is more of a general swsusp problem I believe - the second phase
> > > when it blindly resumes entire system. Resume of a device can fail
> > > (any reason whatsoever) and it will attempt to clean up after itself,
> > > but userspace is dead and hotplug never completes. While I am
> > > interested to know why ALPS does not want to resume on Andy's laptop
> > > the issue will never be completely resolved from within the input
> > > system.
> >
> > When device fails to resume, what should I do? I think I could
> >
> > if (error)
> > panic("Device resume failed\n");
> >
> > , but... that does not look like what you want.
>
> Oh, always panic-happy Pavel ;). It really depends on what kind of
> device has failed to resume. If the device is really needed for writing
> the image then panic is the only recourse, but if it is some other
> device you are resuming, just ignore it, who cares...

Moreover, if we panic() here, we potentially lose data. IMO we should not
do this for a device that is neither needed for saving the image nor
holds the root filesystem.

> Btw, I don't think that doing selective resume (as opposed to selective
> suspend and Nigel's partial device trees) would be all that
> complicated. You'd always resume sysdevs and then, when iterating over
> "normal" devices, just skip ones not in resume path. It can all be
> contained in driver core I believe (sorry but no patch, for now at
> least).

In fact, the only devices that we really need to resume-during-suspend are
those necessary for saving the image.

Greets,
Rafael

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

2005-03-29 23:23:50

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi,

On Friday, 25 of March 2005 17:04, Dmitry Torokhov wrote:
> On Fri, 25 Mar 2005 16:42:37 +0100, Pavel Machek wrote:
> > Hi!
> >
> > > > > This is more of a general swsusp problem I believe - the second phase
> > > > > when it blindly resumes entire system. Resume of a device can fail
> > > > > (any reason whatsoever) and it will attempt to clean up after itself,
> > > > > but userspace is dead and hotplug never completes. While I am
> > > > > interested to know why ALPS does not want to resume on Andy's laptop
> > > > > the issue will never be completely resolved from within the input
> > > > > system.
> > > >
> > > > When device fails to resume, what should I do? I think I could
> > > >
> > > > if (error)
> > > > panic("Device resume failed\n");
> > > >
> > > > , but... that does not look like what you want.
> > >
> > > Oh, always panic-happy Pavel ;). It really depends on what kind of
> > > device has failed to resume. If the device is really needed for writing
> > > the image then panic is the only recourse, but if it is some other
> > > device you are resuming, just ignore it, who cares...
> >
> > You are right, for resume-during-suspend, we may as well risk it. We
> > have consistent state, and if we happen to write it on disk,
> > everything is okay.
> >
> > For resume-during-resume, I don't really know how we can handle
> > that. Running with some devices non-working seems dangerous to me.
> >
>
> I think it again varies, and the driver would have to decide what to
> do if it can not resume hardware.

Well, I don't think that the driver would be able to state that its failure
is "serious enough", for example, to panic(). This is only known to the
higher-level code that calls the driver's _resume() routine. IMO the driver
should not make any assumptions of its importance (eg a SCSI driver
that panic()s, because it's unable to resume a disk which does not
even contain a mounted partition is not a good idea ;-)).

> Take for example USB - i believe USB
> guys are shooting at being able to disconnect device while the box is
> suspended and have it removed from the system when resuming. In
> Probably every driver that has even a slighest notion of
> hot-pluggability should just properly clean up after itself and not
> signal error to the core.

Unless, for instance, one of its devices contains the root filesystem.

> > > Btw, I don't think that doing selective resume (as opposed to selective
> > > suspend and Nigel's partial device trees) would be all that
> > > complicated. You'd always resume sysdevs and then, when iterating over
> > > "normal" devices, just skip ones not in resume path. It can all be
> > > contained in driver core I believe (sorry but no patch, for now at
> > > least).
> >
> > :-) I think we can simply make device freeze/unfreeze fast enough.
> > [We do not need to do full suspend/resume; freeze is enough].

If the driver is compiled as a module, its devices may be uninitialized
when its _resume() routine is called (eg in the resume-during-resume).
Hence, IMHO, we can forget the "unfreeze" thing until we can differentiate
the resume-during-suspend from the resume-during-resume etc. ...

Greets,
Rafael

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"
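
The argument in the message above, that only higher-level code knows whether a failed resume is serious, can be sketched as a small policy function. This is a hedged illustration, not driver-core code: struct dev_role, its two fields and on_resume_failure() are hypothetical names, and RESUME_FATAL stands in for whatever the caller would do (abort the suspend, or panic() as discussed).

```c
#include <stdbool.h>

/* What the caller of a driver's resume routine should do on failure. */
enum resume_action {
	RESUME_IGNORE,	/* device is not needed right now; carry on */
	RESUME_FATAL,	/* cannot continue safely without this device */
};

/* Hypothetical per-device facts known to the core, not to the driver. */
struct dev_role {
	bool needed_for_image;	/* sits on the path used to write the image */
	bool holds_root_fs;	/* backs the mounted root filesystem */
};

/*
 * The driver only reports that its resume failed; the severity is
 * decided here, from what the rest of the system needs the device for.
 */
enum resume_action on_resume_failure(const struct dev_role *role)
{
	if (role->needed_for_image || role->holds_root_fs)
		return RESUME_FATAL;
	return RESUME_IGNORE;
}
```

Under these assumptions a touchpad that fails to reconnect is simply ignored, while a failing swap or root disk is not.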

2005-03-29 23:46:05

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi.

On Wed, 2005-03-30 at 08:35, Pavel Machek wrote:
> Hi!
>
> > > We currently freeze processes for suspend-to-ram, too. I guess that
> > > disable_usermodehelper is probably better and that in_suspend() should
> > > only be used for sanity checks... go with disable_usermodehelper and
> > > sorry for the noise.
> >
> > Here's another possibility: Freeze the workqueue that
> > call_usermodehelper uses (remember that code I didn't push hard enough
> > to Andrew?), and let invocations of call_usermodehelper block in
> > TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on
>
> There may be many devices in the system, and you are going to need
> quite a lot of RAM for all that... That's why they do not queue it
> during boot, IIRC. Disabling usermode helper seems right.

Many devices is true, but very few of them invoke usermode helpers.

[desktop build-2.6.12-rc1]# find -name *.[ch] | xargs grep usermodehelper
./drivers/s390/crypto/z90main.c: call_usermodehelper(argv[0], argv, envp, 0);
./drivers/net/hamradio/baycom_epp.c: return call_usermodehelper(eppconfig_path, argv, envp, 1);
./drivers/acpi/thermal.c: call_usermodehelper(argv[0], argv, envp, 0);
./drivers/acpi/thermal.mod.c: { 0x436006da, "call_usermodehelper" },
./drivers/input/input.c: value = call_usermodehelper(argv [0], argv, envp, 0);
./drivers/pnp/pnpbios/core.c: value = call_usermodehelper (argv [0], argv, envp, 0);
./drivers/macintosh/therm_pm72.c: return call_usermodehelper(critical_overtemp_path, argv, envp, 0);
./arch/i386/mach-voyager/voyager_thread.c: if ((ret = call_usermodehelper(argv[0], argv, envp, 1)) != 0) {
./include/linux/kmod.h:extern int call_usermodehelper(char *path, char *argv[], char *envp[], int wait);
./include/linux/kmod.h:extern void usermodehelper_init(void);
./kernel/power/main.c: return call_usermodehelper(argv[0], argv, envp, 1);
./kernel/power/suspend_userui.c: retval = call_usermodehelper(userui_program, argv, envp, 0);
./kernel/kmod.c: call_usermodehelper wait flag, and remove exec_usermodehelper.
./kernel/kmod.c: ret = call_usermodehelper(modprobe_path, argv, envp, 1);
./kernel/kmod.c:static int ____call_usermodehelper(void *data)
./kernel/kmod.c: pid = kernel_thread(____call_usermodehelper, sub_info, SIGCHLD);
./kernel/kmod.c:static void __call_usermodehelper(void *data)
./kernel/kmod.c: pid = kernel_thread(____call_usermodehelper, sub_info,
./kernel/kmod.c: * call_usermodehelper - start a usermode application
./kernel/kmod.c:int call_usermodehelper(char *path, char **argv, char **envp, int wait)
./kernel/kmod.c: DECLARE_WORK(work, __call_usermodehelper, &sub_info);
./kernel/kmod.c:EXPORT_SYMBOL(call_usermodehelper);
./kernel/kmod.c:void __init usermodehelper_init(void)
./kernel/cpuset.c: * Note final arg to call_usermodehelper() is 0 - that means
./kernel/cpuset.c: return call_usermodehelper(argv[0], argv, envp, 0);
./security/keys/request_key.c: return call_usermodehelper(argv[0], argv, envp, 1);
./lib/kobject_uevent.c: retval = call_usermodehelper (argv[0], argv, envp, 0);
./lib/kobject_uevent.c: pr_debug ("%s - call_usermodehelper returned %d\n",
./init/main.c: usermodehelper_init();

Of course there will be indirect invocations (via kobjects, for
example), but I still think the number is not that great. I'm already
using the method I suggested in unreleased Suspend2 code, and the only
invocation I'm catching is at resume time, for kseriod.

Regards,

Nigel
--
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net

2005-03-30 07:25:24

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Tue, Mar 29, 2005 at 01:42:26PM -0500, Dmitry Torokhov wrote:
> Could you please try the patch below - it should fix the issues you are
[snip]
> --- dtor.orig/drivers/input/serio/serio.c
> +++ dtor/drivers/input/serio/serio.c
>  	if (!serio->drv || !serio->drv->reconnect || serio->drv->reconnect(serio)) {
> -		serio_disconnect_port(serio);
>  		/*
>  		 * Driver re-probing can take a while, so better let kseriod
Yep, that fixes it. I applied your patch to 2.6.12-rc1-mm1 and
suspended and resumed 5 times in a row without any difficulty. Thanks!

-andy

2005-03-30 07:27:45

Subject: Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Fri, Mar 25, 2005 at 09:58:40AM -0500, Dmitry Torokhov wrote:
> I wonder why ALPS reconnect failed. You don't have a serial console
> set up, do you? If not then maybe you could make a huge framebuffer to
> capture as much info as you can... I hope you have a digital camera ;)

No serial ports brought out on this laptop, and I've not tried
framebuffer...

> Then do "echo 1 > /sys/module/i8042/parameters/debug" and try to
> suspend. I am interested in the data coming in and out of i8042.

Transcribed by hand, the last few bytes are
< fa ACK
> d4 e9 GETINFO
< fa 20 00 64
> d4 ff RESET_BAT
< fa aa 00 RET_BAT

(Because I used O= the __FILE__ is very long so each dbg() takes two lines
of my 80x25 console...)

Dunno if that's helpful, sorry...

-andy
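
For readers decoding the bytes transcribed above, here is a small sketch that names the ones appearing in this trace. The labels follow the annotations in the message (ACK, GETINFO, RESET_BAT, RET_BAT) plus the standard PS/2 meanings; ps2_describe() and the direction enum are hypothetical names, and the table covers only these few bytes.

```c
#include <stddef.h>

/* Direction as shown in the log: '>' is host to i8042, '<' is the reply. */
enum ps2_dir { PS2_OUT, PS2_IN };

/* Name a byte from the trace; anything not listed is treated as payload. */
const char *ps2_describe(enum ps2_dir dir, unsigned char byte)
{
	if (dir == PS2_OUT) {
		switch (byte) {
		case 0xd4: return "controller: send next byte to AUX port";
		case 0xe9: return "GETINFO (status request)";
		case 0xff: return "RESET_BAT (reset and self-test)";
		default:   return "unknown";
		}
	}
	switch (byte) {
	case 0xfa: return "ACK";
	case 0xaa: return "RET_BAT (self-test passed)";
	default:   return "data";
	}
}
```

Read this way, the last exchange is a reset sent to the AUX port, answered with an acknowledge, a self-test-passed byte and a device ID of 00, which is the usual reply of a PS/2 mouse to a reset.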

2005-03-30 10:03:39

by Greg KH

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Tue, Mar 29, 2005 at 01:23:35PM -0800, Patrick Mochel wrote:
>
> On Tue, 29 Mar 2005, Pavel Machek wrote:
>
> > I don't really want us to try execve during resume... Could we simply
> > artificially fail that execve with something like if (in_suspend()) return
> > -EINVAL; [except that in_suspend() just is not there, but there were
> > some proposals to add it].
> >
> > Or just avoid calling hotplug at all in resume case? And then do
> > coldplug-like scan when userspace is ready...
>
> I thought that cold-plugging only worked for devices, not all objects.

We can walk the whole sysfs tree and create "cold" hotplug events.
udevstart does that for devices that udev cares about (as an example.)

> Can we just queue up hotplug events? That way we wouldn't lose any across
> the transition, and could be used to send resume events to userspace for
> various devices that need help..

Ick, I really hate this idea, but there is a patch in the SuSE kernel to
do this at boot time. Hopefully the author of that patch resubmits it
and maybe it will eventually make it into mainline...

thanks,

greg k-h

2005-03-31 07:30:26

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Tuesday 29 March 2005 17:35, Pavel Machek wrote:
> Hi!
>
> > > We currently freeze processes for suspend-to-ram, too. I guess that
> > > disable_usermodehelper is probably better and that in_suspend() should
> > > only be used for sanity checks... go with disable_usermodehelper and
> > > sorry for the noise.
> >
> > Here's another possibility: Freeze the workqueue that
> > call_usermodehelper uses (remember that code I didn't push hard enough
> > to Andrew?), and let invocations of call_usermodehelper block in
> > TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on
>
> There may be many devices in the system, and you are going to need
> quite a lot of RAM for all that... That's why they do not queue it
> during boot, IIRC. Disabling usermode helper seems right.

Ok, what do you think about this one?

===================================================================

swsusp: disable usermodehelper after generating memory snapshot and
before resuming devices, so when device fails to resume we
won't try to call hotplug - userspace stopped anyway.

Signed-off-by: Dmitry Torokhov <[email protected]>

include/linux/kmod.h | 3 +++
kernel/kmod.c | 14 +++++++++++++-
kernel/power/disk.c | 2 ++
kernel/power/swsusp.c | 1 -
4 files changed, 18 insertions(+), 2 deletions(-)

Index: dtor/kernel/power/disk.c
===================================================================
--- dtor.orig/kernel/power/disk.c
+++ dtor/kernel/power/disk.c
@@ -205,6 +205,8 @@ int pm_suspend_disk(void)

 	if (in_suspend) {
 		pr_debug("PM: writing image.\n");
+		usermodehelper_disable();
+		device_resume();
 		error = swsusp_write();
 		if (!error)
 			power_down(pm_disk_mode);
Index: dtor/kernel/power/swsusp.c
===================================================================
--- dtor.orig/kernel/power/swsusp.c
+++ dtor/kernel/power/swsusp.c
@@ -853,7 +853,6 @@ static int suspend_prepare_image(void)
 int swsusp_write(void)
 {
 	int error;
-	device_resume();
 	lock_swapdevices();
 	error = write_suspend_image();
 	/* This will unlock ignored swap devices since writing is finished */
Index: dtor/kernel/kmod.c
===================================================================
--- dtor.orig/kernel/kmod.c
+++ dtor/kernel/kmod.c
@@ -124,6 +124,8 @@ struct subprocess_info {
 	int retval;
 };

+static int usermodehelper_disabled;
+
 /*
  * This is the task which runs the usermode application
  */
@@ -240,7 +242,7 @@ int call_usermodehelper(char *path, char
 	if (!khelper_wq)
 		return -EBUSY;

-	if (path[0] == '\0')
+	if (usermodehelper_disabled || path[0] == '\0')
 		return 0;

 	queue_work(khelper_wq, &work);
@@ -249,6 +251,16 @@ int call_usermodehelper(char *path, char
 }
 EXPORT_SYMBOL(call_usermodehelper);

+void usermodehelper_enable(void)
+{
+	usermodehelper_disabled = 0;
+}
+
+void usermodehelper_disable(void)
+{
+	usermodehelper_disabled = 1;
+}
+
 void __init usermodehelper_init(void)
 {
 	khelper_wq = create_singlethread_workqueue("khelper");
Index: dtor/include/linux/kmod.h
===================================================================
--- dtor.orig/include/linux/kmod.h
+++ dtor/include/linux/kmod.h
@@ -34,7 +34,10 @@ static inline int request_module(const c
 #endif

 #define try_then_request_module(x, mod...) ((x) ?: (request_module(mod), (x)))
+
 extern int call_usermodehelper(char *path, char *argv[], char *envp[], int wait);
 extern void usermodehelper_init(void);
+extern void usermodehelper_enable(void);
+extern void usermodehelper_disable(void);

 #endif /* __LINUX_KMOD_H__ */

2005-03-31 08:39:33

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi!

> > > > We currently freeze processes for suspend-to-ram, too. I guess that
> > > > disable_usermodehelper is probably better and that in_suspend() should
> > > > only be used for sanity checks... go with disable_usermodehelper and
> > > > sorry for the noise.
> > >
> > > Here's another possibility: Freeze the workqueue that
> > > call_usermodehelper uses (remember that code I didn't push hard enough
> > > to Andrew?), and let invocations of call_usermodehelper block in
> > > TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on
> >
> > There may be many devices in the system, and you are going to need
> > quite a lot of RAM for all that... That's why they do not queue it
> > during boot, IIRC. Disabling usermode helper seems right.
>
> Ok, what do you think about this one?
>
> ===================================================================
>
> swsusp: disable usermodehelper after generating memory snapshot and
> before resuming devices, so when device fails to resume we
> won't try to call hotplug - userspace stopped anyway.
>
> Signed-off-by: Dmitry Torokhov <[email protected]>
>
>
> include/linux/kmod.h | 3 +++
> kernel/kmod.c | 14 +++++++++++++-
> kernel/power/disk.c | 2 ++
> kernel/power/swsusp.c | 1 -
> 4 files changed, 18 insertions(+), 2 deletions(-)
>
> Index: dtor/kernel/power/disk.c
> ===================================================================
> --- dtor.orig/kernel/power/disk.c
> +++ dtor/kernel/power/disk.c
> @@ -205,6 +205,8 @@ int pm_suspend_disk(void)
>
>  	if (in_suspend) {
>  		pr_debug("PM: writing image.\n");
> +		usermodehelper_disable();
> +		device_resume();
>  		error = swsusp_write();
>  		if (!error)
>  			power_down(pm_disk_mode);
> Index: dtor/kernel/power/swsusp.c
> ===================================================================
> --- dtor.orig/kernel/power/swsusp.c
> +++ dtor/kernel/power/swsusp.c
> @@ -853,7 +853,6 @@ static int suspend_prepare_image(void)
>  int swsusp_write(void)
>  {
>  	int error;
> -	device_resume();
>  	lock_swapdevices();
>  	error = write_suspend_image();
>  	/* This will unlock ignored swap devices since writing is

Looks good, except... why move code around? Could you just call
usermodehelper_disable from swsusp_write?
Pavel

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-31 15:04:48

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Thu, 31 Mar 2005 10:39:10 +0200, Pavel Machek <[email protected]> wrote:
> >  int swsusp_write(void)
> >  {
> >      int error;
> > -    device_resume();
> >      lock_swapdevices();
> >      error = write_suspend_image();
> >      /* This will unlock ignored swap devices since writing is
>
> Looks good, except... why move code around? Could you just call
> usermodehelper_disable from swsusp_write?

That's because I don't think that swsusp_write is a proper place for
it ;) It looks like a lean and mean function that just writes the image;
manipulating usermodehelper state _and_ system (device) state there is
wrong. Let it do one thing; don't overload it with actions that I think
belong to the upper level. Do you agree?

I think I need to stick in a usermodehelper_enable call in case
swsusp_write fails, though.
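
A minimal userspace sketch of the ordering discussed here, not the actual patch: the caller owns the helper and device state, the writer only writes, and the helper is re-enabled if the write fails. The bodies below are hypothetical stand-ins that just flip flags; only the names usermodehelper_disable, usermodehelper_enable, device_resume and swsusp_write come from the thread, and write_image_step and the "fail" parameter are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real kernel API. */
static bool helper_enabled = true;
static bool devices_running = false;

static void usermodehelper_disable(void) { helper_enabled = false; }
static void usermodehelper_enable(void)  { helper_enabled = true; }
static void device_resume(void)          { devices_running = true; }

/* Lean writer: it only writes the image. fail != 0 simulates an I/O error. */
static int swsusp_write(int fail)
{
    /* The caller, not the writer, is responsible for waking devices. */
    assert(devices_running);
    return fail ? -5 : 0;   /* -5 stands in for -EIO */
}

/* Upper-level step, modelled on the "if (in_suspend)" branch of the patch. */
static int write_image_step(int fail)
{
    int error;

    usermodehelper_disable();    /* userspace is stopped: no hotplug calls */
    device_resume();             /* wake devices so the image can be written */
    error = swsusp_write(fail);
    if (error)
        usermodehelper_enable(); /* suspend aborted: userspace will run again */
    return error;                /* on success the real code would power down */
}
```

Calling write_image_step(0) leaves the helper disabled (the machine is about to power down), while write_image_step(1) returns the error with the helper enabled again.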

--
Dmitry

2005-03-31 16:03:05

by Patrick Mochel

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Thu, 31 Mar 2005, Dmitry Torokhov wrote:

> Ok, what do you think about this one?
>
> ===================================================================
>
> swsusp: disable usermodehelper after generating memory snapshot and
> before resuming devices, so when device fails to resume we
> won't try to call hotplug - userspace stopped anyway.

Hm, shouldn't we disable it before we start to freeze processes? We don't
want any more processes trying to start up after we've taken care of
them..

Thanks,

Pat

2005-03-31 16:32:55

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

On Thu, 31 Mar 2005 08:02:44 -0800 (PST), Patrick Mochel
<[email protected]> wrote:
>
> On Thu, 31 Mar 2005, Dmitry Torokhov wrote:
>
> > Ok, what do you think about this one?
> >
> > ===================================================================
> >
> > swsusp: disable usermodehelper after generating memory snapshot and
> > before resuming devices, so when device fails to resume we
> > won't try to call hotplug - userspace stopped anyway.
>
> Hm, shouldn't we disable it before we start to freeze processes? We don't
> want any more processes trying to start up after we've taken care of
> them..
>

Can't a device be removed (for any reason) _while_ we are freezing
processes? I think freezing code will properly deal with it... What
about suspend semantics - if suspend fails do we say the device should
be operational or the system should attempt to re-initialize? I.e. we
are not doing suspend after all - can we still drop messages on the
floor? After all, we still have ability to run coldplug after failed
suspend.

I frankly am not sure at what point to disable usermode helper. Or
maybe we need to have a list of pending events and suspend khelper_wq
while suspending.

--
Dmitry

2005-03-31 22:17:13

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi.

On Fri, 2005-04-01 at 02:32, Dmitry Torokhov wrote:
> On Thu, 31 Mar 2005 08:02:44 -0800 (PST), Patrick Mochel
> <[email protected]> wrote:
> >
> > On Thu, 31 Mar 2005, Dmitry Torokhov wrote:
> >
> > > Ok, what do you think about this one?
> > >
> > > ===================================================================
> > >
> > > swsusp: disable usermodehelper after generating memory snapshot and
> > > before resuming devices, so when device fails to resume we
> > > won't try to call hotplug - userspace stopped anyway.
> >
> > Hm, shouldn't we disable it before we start to freeze processes? We don't
> > want any more processes trying to start up after we've taken care of
> > them..
> >
>
> Can't a device be removed (for any reason) _while_ we are freezing
> processes? I think freezing code will properly deal with it... What
> about suspend semantics - if suspend fails do we say the device should
> be operational or the system should attempt to re-initialize? I.e. we
> are not doing suspend after all - can we still drop messages on the
> floor? After all, we still have ability to run coldplug after failed
> suspend.
>
> I frankly am not sure at what point to disable usermode helper. Or
> maybe we need to have a list of pending events and suspend khelper_wq
> while suspending.

FWIW, my solution is purely freezer based. I freeze khelper and, in the
freezer code, ignore kernel threads in state uninterruptible (which is
where kseriod, e.g., will be while it waits for the usermode helper
process, which also gets frozen).
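
A rough userspace model of that rule, assuming a simplified task table; this is not the Suspend2 code. The struct, the enum and freeze_pass are hypothetical names for illustration. The point it shows is that a kernel thread sleeping uninterruptibly is skipped instead of being counted as a task that refused to freeze.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified task states; the real kernel has more. */
enum task_state { RUNNING, INTERRUPTIBLE, UNINTERRUPTIBLE };

struct task {
    const char *name;
    bool kernel_thread;
    enum task_state state;
    bool frozen;
};

/* One freezer pass. Returns how many tasks still have to be frozen. */
static size_t freeze_pass(struct task *tasks, size_t n)
{
    size_t todo = 0;

    for (size_t i = 0; i < n; i++) {
        struct task *t = &tasks[i];

        if (t->frozen)
            continue;
        /* A kernel thread blocked uninterruptibly (e.g. waiting on the
         * usermode helper) cannot run anyway, so do not choke on it. */
        if (t->kernel_thread && t->state == UNINTERRUPTIBLE)
            continue;
        /* A user task in uninterruptible sleep must be retried later. */
        if (t->state == UNINTERRUPTIBLE) {
            todo++;
            continue;
        }
        t->frozen = true;
    }
    return todo;
}
```

With khelper sleeping interruptibly, kseriod blocked uninterruptibly and the helper process runnable, one pass freezes khelper and the helper, leaves kseriod alone, and reports nothing left to do.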

Regards,

Nigel
--
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net

2005-03-31 22:22:14

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi!

> > > Ok, what do you think about this one?
> > >
> > > ===================================================================
> > >
> > > swsusp: disable usermodehelper after generating memory snapshot and
> > > before resuming devices, so when device fails to resume we
> > > won't try to call hotplug - userspace stopped anyway.
> >
> > Hm, shouldn't we disable it before we start to freeze processes? We don't
> > want any more processes trying to start up after we've taken care of
> > them..
> >
>
> Can't a device be removed (for any reason) _while_ we are freezing
> processes? I think freezing code will properly deal with it... What
> about suspend semantics - if suspend fails do we say the device should
> be operational or the system should attempt to re-initialize? I.e. we
> are not doing suspend after all - can we still drop messages on the
> floor? After all, we still have ability to run coldplug after failed
> suspend.

I believe we should freeze hotplug before processes. Dropping messages
on the floor should not be a problem, we should just call coldplug
after failed suspend.
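
A small sketch of why dropped events need not be remembered, under the assumption that coldplug simply rescans every device and re-announces it rather than replaying a log of lost events. All names here (struct device, hotplug_event, coldplug, dropped) are hypothetical and only model the idea in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct device {
    const char *name;
    bool present;             /* what the kernel sees */
    bool known_to_userspace;  /* what userspace was told */
};

static bool hotplug_enabled = true;
static unsigned dropped;      /* events dropped on the floor */

/* Deliver one event, or drop it while hotplug is disabled. */
static void hotplug_event(struct device *d)
{
    if (!hotplug_enabled) {
        dropped++;
        return;
    }
    d->known_to_userspace = d->present;
}

/* Assumed coldplug behaviour: walk all devices and re-announce each one,
 * so it never needs to know which individual events were lost. */
static void coldplug(struct device *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        hotplug_event(&devs[i]);
}
```

A device that appears while hotplug is disabled is missed at first, and the full rescan after the failed suspend brings userspace back in sync.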
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-31 22:29:22

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi.

On Fri, 2005-04-01 at 08:18, Pavel Machek wrote:
> Hi!
>
> > > > Ok, what do you think about this one?
> > > >
> > > > ===================================================================
> > > >
> > > > swsusp: disable usermodehelper after generating memory snapshot and
> > > > before resuming devices, so when device fails to resume we
> > > > won't try to call hotplug - userspace stopped anyway.
> > >
> > > Hm, shouldn't we disable it before we start to freeze processes? We don't
> > > want any more processes trying to start up after we've taken care of
> > > them..
> > >
> >
> > Can't a device be removed (for any reason) _while_ we are freezing
> > processes? I think freezing code will properly deal with it... What
> > about suspend semantics - if suspend fails do we say the device should
> > be operational or the system should attempt to re-initialize? I.e. we
> > are not doing suspend after all - can we still drop messages on the
> > floor? After all, we still have ability to run coldplug after failed
> > suspend.
>
> I believe we should freeze hotplug before processes. Dropping messages
> on the floor should not be a problem, we should just call coldplug
> after failed suspend.

How will you know which devices to call coldplug for, post resume? (Or
does it figure that out itself somehow?)

Regards,

Nigel
--
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net

2005-04-01 08:50:55

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Hi,

On Friday, 1 of April 2005 00:28, Nigel Cunningham wrote:
> Hi.
>
> On Fri, 2005-04-01 at 08:18, Pavel Machek wrote:
> > Hi!
> >
> > > > > Ok, what do you think about this one?
> > > > >
> > > > > ===================================================================
> > > > >
> > > > > swsusp: disable usermodehelper after generating memory snapshot and
> > > > > before resuming devices, so when device fails to resume we
> > > > > won't try to call hotplug - userspace stopped anyway.
> > > >
> > > > Hm, shouldn't we disable it before we start to freeze processes? We don't
> > > > want any more processes trying to start up after we've taken care of
> > > > them..
> > > >
> > >
> > > Can't a device be removed (for any reason) _while_ we are freezing
> > > > processes? I think freezing code will properly deal with it... What
> > > about suspend semantics - if suspend fails do we say the device should
> > > be operational or the system should attempt to re-initialize? I.e. we
> > > are not doing suspend after all - can we still drop messages on the
> > > floor? After all, we still have ability to run coldplug after failed
> > > suspend.
> >
> > I believe we should freeze hotplug before processes.

I agree. IMO user space should not be considered as available once we have
started freezing processes, so hotplug should be disabled before. By the same
token, it should only be enabled after the processes have been restarted
during resume (or after suspend has failed).

BTW, it seems to me that the forking of new processes could be disabled
before we start to freeze the existing ones.

> > Dropping messages on the floor should not be a problem, we should just
> > call coldplug after failed suspend.
>
> How will you know which devices to call coldplug for, post resume? (Or
> does it figure that out itself somehow?)

I think the drivers that need the hotplug to resume should defer their resume
routines until usermodehelper is enabled (it seems to me that we can use
a completion to handle this).
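
To illustrate the deferral idea, here is a single-threaded userspace sketch. Note that it swaps the completion for a plain pending list, since a real completion would need a second thread to wait on it; the names resume_when_helper_ready, helper_enable and mark_resumed are hypothetical, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEFERRED 8

typedef void (*resume_fn)(void *ctx);

struct deferred {
    resume_fn fn;
    void *ctx;
};

static struct deferred pending[MAX_DEFERRED];
static size_t npending;
static bool helper_enabled;

/* Run the resume routine now if the helper is usable, else queue it.
 * Returns 0 on success and -1 if the queue is full. */
static int resume_when_helper_ready(resume_fn fn, void *ctx)
{
    if (helper_enabled) {
        fn(ctx);
        return 0;
    }
    if (npending == MAX_DEFERRED)
        return -1;
    pending[npending].fn = fn;
    pending[npending].ctx = ctx;
    npending++;
    return 0;
}

/* Enabling the helper releases every deferred resume routine in order. */
static void helper_enable(void)
{
    helper_enabled = true;
    for (size_t i = 0; i < npending; i++)
        pending[i].fn(pending[i].ctx);
    npending = 0;
}

/* Example resume routine: just records that it ran. */
static void mark_resumed(void *ctx)
{
    *(bool *)ctx = true;
}
```

A driver that needs hotplug registers its routine during resume; nothing runs until helper_enable() is called, which in the kernel would correspond to completing the completion.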

Greets,
Rafael

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

2005-04-01 10:33:44

Subject: Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

Rafael J. Wysocki wrote:
> Hi,
>> On Fri, 2005-04-01 at 08:18, Pavel Machek wrote:
>> > I believe we should freeze hotplug before processes.
>
> I agree. IMO user space should not be considered as available once we have
> started freezing processes, so hotplug should be disabled before. By the same
> token, it should only be enabled after the processes have been restarted
> during resume (or after suspend has failed).

It probably has to be enabled before the processes are restarted - they
may rightfully assume that hotplug is working.
--
seife
Never trust a computer you can't lift.

2005-05-27 17:45:44