2013-08-12 19:44:45

by Andrew Savchenko

Subject: 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

Hello,

after a kernel update from 3.5.7 to the latest stable kernel, I found that
user-space resume (from suspend-1.0 aka uswsusp) no longer works.
Kernel-space suspend and resume work fine (e.g.
echo disk > /sys/power/state); the problem is with user-space support.
(I need the user-space version because it supports image encryption.)

After the resume application (essentially linuxrc) loads the image, it
fails to apply it:

========================================================
Processes could not be frozen, cannot continue resuming.
Error 11: Resource temporarily unavailable

You can now boot the system and lose the saved state
or reboot and try again.

[Notice that if you decide to reboot, you MUST NOT mount
any filesystems before a successful resume.
Resuming after some filesystems have been mounted
will badly damage these filesystems.]

Do you want to continue booting (Y/n)?
========================================================

The error code wasn't originally shown; I added it to the suspend tool
to aid debugging. Essentially, the freeze ioctl on /dev/snapshot fails
with this error.

I bisected the bug to this commit:

========================================================
commit ba4df2808a86f8b103c4db0b8807649383e9bd13
Author: Al Viro <[email protected]>
Date: Tue Oct 2 15:29:10 2012 -0400

don't bother with kernel_thread/kernel_execve for launching
linuxrc
exec_usermodehelper_fns() will do just fine...

Signed-off-by: Al Viro <[email protected]>
========================================================

In fact, this commit triggered at least two bugs: the one I'm facing
now, and a second one that was fixed in commit
f0de17c0babe7f29381892def6b37e9181a53410
("make sure that /linuxrc has std{in,out,err}").

As a temporary workaround, I reverted all changes to
init/do_mounts_initrd.c back to the last working commit
cb450766bcafc7bd7d40e9a5a0050745e8c68b3e, taking the kernel API
changes into account (kernel_execve -> sys_execve). See
linuxrc-workaround.patch. I understand this isn't a proper solution; I
just want to show what code works for me.

I also found an interesting LKML discussion about s2disk and a freezer
issue: http://www.spinics.net/lists/linux-nfs/msg38160.html
It may be related to this bug, but the patch proposed there doesn't
help in my case.

The attached kernel config fails with
ba4df2808a86f8b103c4db0b8807649383e9bd13 and works with
f0de17c0babe7f29381892def6b37e9181a53410.

In case this issue is hardware-related: the system is a 32-bit Eee PC
1000H with an Atom N270, 2 GB RAM and a 750 GB SATA drive.

Additional (but probably useless) information on this bug may be found
here: https://forums.gentoo.org/viewtopic-p-7371120.html

Best regards,
Andrew Savchenko


Attachments:
config.xz (9.59 kB)
linuxrc-workaround.patch (2.96 kB)

2013-08-27 03:49:21

by Andrew Savchenko

Subject: Re: [BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

Hello,

On Mon, 12 Aug 2013 23:44:15 +0400 Andrew Savchenko wrote:
> after a kernel update from 3.5.7 to the latest stable I found that
> user-space resume (from suspend-1.0 aka uswsusp) no longer works.
> Kernel-space suspend and resume work fine (e.g. echo disk
> > /sys/power/state), problem is with user-space support. (I need
> user-space version because it supports image encryption.)
>
> After resume (essentially linuxrc) application loads image it fails
> to apply it:
>
> ========================================================
> Processes could not be frozen, cannot continue resuming.
> Error 11: Resource temporarily unavailable
>
> You can now boot the system and lose the saved state
> or reboot and try again.
>
> [Notice that if you decide to reboot, you MUST NOT mount
> any filesystems before a successful resume.
> Resuming after some filesystems have been mounted
> will badly damage these filesystems.]
>
> Do you want to continue booting (Y/n)?
> ========================================================
>
> Error code wasn't originally showed, I added it to suspend tool to
> aid debugging. Essentially freeze ioctl on /dev/snapshot fails with
> this error.
>
> I bisected a commit which introduces this bug:
>
> ========================================================
> commit ba4df2808a86f8b103c4db0b8807649383e9bd13
> Author: Al Viro <[email protected]>
> Date: Tue Oct 2 15:29:10 2012 -0400
>
> don't bother with kernel_thread/kernel_execve for launching
> linuxrc
> exec_usermodehelper_fns() will do just fine...
>
> Signed-off-by: Al Viro <[email protected]>
> ========================================================
>
> In fact this commit induced/triggered at least two bugs: the first one
> I'm facing now and the second one was fixed in commit
> f0de17c0babe7f29381892def6b37e9181a53410:
> make sure that /linuxrc has std{in,out,err}.
>
> As a temporarily workaround for this issue I reverted all changes for
> init/do_mounts_initrd.c up to the latest working commit
> cb450766bcafc7bd7d40e9a5a0050745e8c68b3e considering the kernel API
> changes (kernel_execve -> sys_execve). See linuxrc-workaround.patch.
> I understand this isn't a proper solution, I just want to show what
> code works for me.
>
> I also found an interesting LKML discussion about s2disk and freezer
> issue: http://www.spinics.net/lists/linux-nfs/msg38160.html
> Maybe it is related to this bug, but patch proposed there doesn't in
> my case.
>
> Kernel config which fails with
> ba4df2808a86f8b103c4db0b8807649383e9bd13 and works with
> f0de17c0babe7f29381892def6b37e9181a53410 is also attached.
>
> As this issue maybe hardware related, the system is 32-bit EEE PC
> 1000H with Atom N270, 2GB RAM, 750 GB SATA drive.
>
> Additional (but probably useless) information on this bug may be found
> here: https://forums.gentoo.org/viewtopic-p-7371120.html

This bug is still present in 3.11-rc7 and 3.10.9.
I opened kernel bug 60802 for this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=60802

Any ideas?

Best regards,
Andrew Savchenko



2013-09-05 12:08:15

by Pavel Machek

Subject: Re: [Suspend-devel] [BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

Hi!

Rafael, Al: apparently we have a regression caused by
ba4df2808a86f8b103c4db0b8807649383e9bd13 .

> This bug is still here with 3.11-rc7 and 3.10.9.
> I opened a kernel bug 60802 for this issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=60802
>
> Any ideas?
>
> Best regards,
> Andrew Savchenko



> _______________________________________________
> Suspend-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/suspend-devel


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-09-05 12:12:34

by Rafael J. Wysocki

Subject: Re: [Suspend-devel] [BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

On Thursday, September 05, 2013 02:08:11 PM Pavel Machek wrote:
> Hi!
>
> Rafael, Al: apparently we have a regression caused by
> ba4df2808a86f8b103c4db0b8807649383e9bd13 .

I noticed that, but I'm not sure how to deal with it.

Also, s2disk still works on my test machines, so that seems to be
specific to this particular configuration.

> > This bug is still here with 3.11-rc7 and 3.10.9.
> > I opened a kernel bug 60802 for this issue:
> > https://bugzilla.kernel.org/show_bug.cgi?id=60802
> >
> > Any ideas?
> >
> > Best regards,
> > Andrew Savchenko
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-09-12 08:32:44

by Andrew Savchenko

Subject: Re: [Suspend-devel] [BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

Hello,

On Thu, 05 Sep 2013 14:23:25 +0200 Rafael J. Wysocki wrote:
> On Thursday, September 05, 2013 02:08:11 PM Pavel Machek wrote:
> > Hi!
> >
> > Rafael, Al: apparently we have a regression caused by
> > ba4df2808a86f8b103c4db0b8807649383e9bd13 .
>
> I noticed that, but I'm not sure how to deal with it.
>
> Also, s2disk still works on my test machines, so that seems to be
> specific to this particular configuration.

Is there any way to debug this issue further than just bisecting the
commit? Perhaps the suspend ioctl should be traced to see why it fails,
but I don't know what exactly inside the kernel should be debugged, or
how.

Best regards,
Andrew Savchenko



2013-09-18 13:02:22

by Pavel Machek

Subject: Re: [Suspend-devel] [BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

On Thu 2013-09-12 12:32:17, Andrew Savchenko wrote:
> Hello,
>
> On Thu, 05 Sep 2013 14:23:25 +0200 Rafael J. Wysocki wrote:
> > On Thursday, September 05, 2013 02:08:11 PM Pavel Machek wrote:
> > > Hi!
> > >
> > > Rafael, Al: apparently we have a regression caused by
> > > ba4df2808a86f8b103c4db0b8807649383e9bd13 .
> >
> > I noticed that, but I'm not sure how to deal with it.
> >
> > Also, s2disk still works on my test machines, so that seems to be
> > specific to this particular configuration.
>
> Is there any way to debug this issue further than just bisecting
> commit? Perhaps suspend ioctl should be traced to see why it fails,
> but I don't know what exactly inside the kernel should be debugged
> and how.

If reverting ba4df2808a86f8b103c4db0b8807649383e9bd13 fixes it, then I
guess it should just be reverted...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-09-18 13:52:42

by Al Viro

Subject: Re: [BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

On Tue, Aug 27, 2013 at 07:48:43AM +0400, Andrew Savchenko wrote:
> > Additional (but probably useless) information on this bug may be found
> > here: https://forums.gentoo.org/viewtopic-p-7371120.html

Something's very fishy there:

[quote]
Digging into suspend-utils code shows that the following ioctl fails on
"/dev/snapshot":

Code:
ioctl(dev, _IO(3, 1), 0);
[end quote]

but that's _not_ anything freeze-related - that's HDIO_GETGEO, and with zero
as last argument it will fail, no matter what. With EFAULT, if nothing
else...

Which ioctl() is it really? A bit further down you write "I modified suspend
code to see errno, so freeze on /dev/snapshot fails [with EAGAIN]", so you
have isolated the call in question. Could you quote the actual code?

2013-09-18 15:21:51

by Al Viro

Subject: Re: [BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

On Wed, Sep 18, 2013 at 02:52:39PM +0100, Al Viro wrote:
> On Tue, Aug 27, 2013 at 07:48:43AM +0400, Andrew Savchenko wrote:
> > > Additional (but probably useless) information on this bug may be found
> > > here: https://forums.gentoo.org/viewtopic-p-7371120.html
>
> Something's very fishy there:
>
> [quote]
> Digging into suspend-utils code shows that the following ioctl fails on
> "/dev/snapshot":
>
> Code:
> ioctl(dev, _IO(3, 1), 0);
> [end quote]
>
> but that's _not_ anything freeze-related - that's HDIO_GETGEO, and with zero
> as last argument it will fail, no matter what. With EFAULT, if nothing
> else...
>
> Which ioctl() it really is? A bit further down you write "I modified suspend
> code to see errno, so freeze on /dev/snapshot fails [with EAGAIN]", so you
> have isolated the call in question. Could you quote the actual code?

*scratches head* _IO('3', 1), perhaps? At least that would make sense in
this context... Assuming that's the case, slap
	printk(KERN_INFO "freeze_processes() => %d\n", error);
after the call of freeze_processes() in kernel/power/user.c, along with
	printk(KERN_INFO "__usermodehelper_disable() => %d\n", error);
and
	printk(KERN_INFO "try_to_freeze_tasks() => %d\n", error);
in kernel/power/process.c:freeze_processes(), after the calls of
__usermodehelper_disable() and try_to_freeze_tasks() respectively.

FWIW, I suspect that it's __usermodehelper_disable() - it does
	retval = wait_event_timeout(running_helpers_waitq,
				    atomic_read(&running_helpers) == 0,
				    RUNNING_HELPERS_TIMEOUT);
and returns -EAGAIN on timeout. I'm not familiar with swsusp code, but
it smells like we end up waiting for linuxrc itself to finish.

Pavel, any suggestions? If SNAPSHOT_FREEZE really wants everything run
via usermodehelper gone for some reason, what makes /linuxrc different
from e.g. /sbin/modprobe?

2013-09-18 18:41:06

by Andrew Savchenko

Subject: Re: [BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

Hello,

On Wed, 18 Sep 2013 14:52:39 +0100 Al Viro wrote:
> On Tue, Aug 27, 2013 at 07:48:43AM +0400, Andrew Savchenko wrote:
> > > Additional (but probably useless) information on this bug may be found
> > > here: https://forums.gentoo.org/viewtopic-p-7371120.html
>
> Something's very fishy there:
>
> [quote]
> Digging into suspend-utils code shows that the following ioctl fails on
> "/dev/snapshot":
>
> Code:
> ioctl(dev, _IO(3, 1), 0);
> [end quote]
>
> but that's _not_ anything freeze-related - that's HDIO_GETGEO, and with zero
> as last argument it will fail, no matter what. With EFAULT, if nothing
> else...
>
> Which ioctl() it really is? A bit further down you write "I modified suspend
> code to see errno, so freeze on /dev/snapshot fails [with EAGAIN]", so you
> have isolated the call in question. Could you quote the actual code?

The actual code is from the suspend-utils tree, swsusp.h:

static inline int freeze(int dev)
{
	return ioctl(dev, SNAPSHOT_FREEZE, 0);
}

And from suspend_ioctls.h:

#define SNAPSHOT_IOC_MAGIC '3'
#define SNAPSHOT_FREEZE _IO(SNAPSHOT_IOC_MAGIC, 1)

My mistake: it should be '3' instead of 3.

Best regards,
Andrew Savchenko



2013-09-18 19:16:11

by Al Viro

Subject: Re: [BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

On Wed, Sep 18, 2013 at 10:40:32PM +0400, Andrew Savchenko wrote:

> And from suspend_ioctls.h:
> #define SNAPSHOT_IOC_MAGIC '3'
> #define SNAPSHOT_FREEZE _IO(SNAPSHOT_IOC_MAGIC, 1)
>
> My mistake, should be '3' instead of 3.

OK... The thing to test, then, is what __usermodehelper_disable()
returns to freeze_processes(). If that's where this -EAGAIN comes from,
we at least have a plausible theory re what's going on.

freeze_processes() uses __usermodehelper_disable() to stop UMH from
spawning any new userland processes (modprobe, etc.) and to wait for
helper calls still in progress to complete. Then it calls
try_to_freeze_tasks(), which freezes the remaining userland, carefully
skipping the current thread. However, it misses the possibility that
the current thread might have been spawned by something that had been
launched by UMH, with UMH waiting for it. That is the case for
everything spawned by linuxrc.

I'd try something like the diff below, but I'm *NOT* familiar with
swsusp at all; it's not for mainline until ACKed by swsusp folks.

diff --git a/kernel/kmod.c b/kernel/kmod.c
index fb32636..d968882 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -571,7 +571,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
 	DECLARE_COMPLETION_ONSTACK(done);
 	int retval = 0;
 
-	helper_lock();
+	if (!(current->flags & PF_FREEZER_SKIP))
+		helper_lock();
 	if (!khelper_wq || usermodehelper_disabled) {
 		retval = -EBUSY;
 		goto out;
@@ -611,7 +612,8 @@ wait_done:
 out:
 	call_usermodehelper_freeinfo(sub_info);
 unlock:
-	helper_unlock();
+	if (!(current->flags & PF_FREEZER_SKIP))
+		helper_unlock();
 	return retval;
 }
 EXPORT_SYMBOL(call_usermodehelper_exec);

2013-09-18 22:13:32

by Andrew Savchenko

Subject: Re: [BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

On Wed, 18 Sep 2013 20:16:07 +0100 Al Viro wrote:
> On Wed, Sep 18, 2013 at 10:40:32PM +0400, Andrew Savchenko wrote:
>
> > And from suspend_ioctls.h:
> > #define SNAPSHOT_IOC_MAGIC '3'
> > #define SNAPSHOT_FREEZE _IO(SNAPSHOT_IOC_MAGIC, 1)
> >
> > My mistake, should be '3' instead of 3.
>
> OK... The thing to test, then, is what does __usermodehelper_disable()
> return to freeze_processes(). If that's where this -EAGAIN comes from,
> we at least have a plausible theory re what's going on.
>
> freeze_processes() uses __usermodehelper_disable() to stop any new userland
> processes spawned by UMH (modprobe, etc.) and waits for ones it might be
> waiting for to complete. Then it does try_to_freeze_tasks(), which
> freezes remaining userland, carefully skipping the current thread.
> However, it misses the possibility that current thread might have been
> spawned by something that had been launched by UMH, with UMH waiting
> for it. Which is the case of everything spawned by linuxrc.
>
> I'd try something like diff below, but I'm *NOT* familiar with swsusp at
> all; it's not for mainline until ACKed by swsusp folks.
>
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index fb32636..d968882 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -571,7 +571,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> DECLARE_COMPLETION_ONSTACK(done);
> int retval = 0;
>
> - helper_lock();
> + if (!(current->flags & PF_FREEZER_SKIP))
> + helper_lock();
> if (!khelper_wq || usermodehelper_disabled) {
> retval = -EBUSY;
> goto out;
> @@ -611,7 +612,8 @@ wait_done:
> out:
> call_usermodehelper_freeinfo(sub_info);
> unlock:
> - helper_unlock();
> + if (!(current->flags & PF_FREEZER_SKIP))
> + helper_unlock();
> return retval;
> }
> EXPORT_SYMBOL(call_usermodehelper_exec);

With this patch applied to the 3.11.1 kernel, resume works fine.

Best regards,
Andrew Savchenko



2013-09-24 00:21:15

by Pavel Machek

Subject: Re: [Suspend-devel] [BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

Hi!

> > And from suspend_ioctls.h:
> > #define SNAPSHOT_IOC_MAGIC '3'
> > #define SNAPSHOT_FREEZE _IO(SNAPSHOT_IOC_MAGIC, 1)
> >
> > My mistake, should be '3' instead of 3.
>
> OK... The thing to test, then, is what does __usermodehelper_disable()
> return to freeze_processes(). If that's where this -EAGAIN comes from,
> we at least have a plausible theory re what's going on.
>
> freeze_processes() uses __usermodehelper_disable() to stop any new userland
> processes spawned by UMH (modprobe, etc.) and waits for ones it might be
> waiting for to complete. Then it does try_to_freeze_tasks(), which
> freezes remaining userland, carefully skipping the current thread.
> However, it misses the possibility that current thread might have been
> spawned by something that had been launched by UMH, with UMH waiting
> for it. Which is the case of everything spawned by linuxrc.
>
> I'd try something like diff below, but I'm *NOT* familiar with swsusp at
> all; it's not for mainline until ACKed by swsusp folks.
>
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index fb32636..d968882 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -571,7 +571,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> DECLARE_COMPLETION_ONSTACK(done);
> int retval = 0;
>
> - helper_lock();
> + if (!(current->flags & PF_FREEZER_SKIP))
> + helper_lock();
> if (!khelper_wq || usermodehelper_disabled) {
> retval = -EBUSY;
> goto out;
> @@ -611,7 +612,8 @@ wait_done:
> out:
> call_usermodehelper_freeinfo(sub_info);
> unlock:
> - helper_unlock();
> + if (!(current->flags & PF_FREEZER_SKIP))
> + helper_unlock();
> return retval;
> }
> EXPORT_SYMBOL(call_usermodehelper_exec);

The PF_FREEZER_SKIP flag is manipulated in about 1000 places, so I'm not
sure this will nest correctly. They seem to be of the form

	current->flags |= PF_FREEZER_SKIP;
	schedule();
	current->flags &= ~PF_FREEZER_SKIP;

so this should be safe, but...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-10-17 21:23:21

by Rafael J. Wysocki

Subject: Re: [Suspend-devel] [BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

Sorry for the huge delay.

On Tuesday, September 24, 2013 02:21:11 AM Pavel Machek wrote:
> Hi!
>
> PF_FREEZER_SKIP flag is manipulated at about 1000 places, so I'm not
> sure this will nest correctly.

This is not exactly correct, unless 1000 is about 50. And none of them
leads to call_usermodehelper_exec(), as far as I can tell.

> They seem to be in form of
>
> |= FREEZER_SKIP
> schedule()
> &= ~FREEZER_SKIP
>
> so this should be safe, but...

I think the patch is correct, so

Acked-by: Rafael J. Wysocki <[email protected]>

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.