2014-10-24 07:43:31

by Imre Deak

Subject: [PATCH] PM / Sleep: fix recovery during s2ram/hibernation

Currently, if one of the dev_pm_ops::freeze callbacks fails during the
QUIESCE phase, we don't roll things back correctly by calling the thaw
and complete callbacks. This could leave some devices in a suspended
state if an error occurs while resuming from hibernation.

Also, if an asynchronous suspend_late or freeze_late callback fails
during the SUSPEND, FREEZE or QUIESCE phases, we don't propagate the
corresponding error correctly, in effect ignoring the error and
continuing the suspend-to-ram/hibernation. During suspend-to-ram this
could leave some devices without a valid saved context, leading to a
failure to reinitialize them during resume. During hibernation this
could leave some devices active, interfering with the creation /
restoration of the hibernation image. It could also leave the
corresponding devices without a valid saved context, leading to a
failure to reinitialize them during resume.

Signed-off-by: Imre Deak <[email protected]>
---
drivers/base/power/main.c | 2 ++
kernel/power/hibernate.c | 4 +++-
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 4497319..9717d5f 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1266,6 +1266,8 @@ int dpm_suspend_late(pm_message_t state)
 	}
 	mutex_unlock(&dpm_list_mtx);
 	async_synchronize_full();
+	if (!error)
+		error = async_error;
 	if (error) {
 		suspend_stats.failed_suspend_late++;
 		dpm_save_failed_step(SUSPEND_SUSPEND_LATE);
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index a9dfa79..05768d5 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -502,8 +502,10 @@ int hibernation_restore(int platform_mode)
 	error = dpm_suspend_start(PMSG_QUIESCE);
 	if (!error) {
 		error = resume_target_kernel(platform_mode);
-		dpm_resume_end(PMSG_RECOVER);
+		BUG_ON(!error);
 	}
+	if (error)
+		dpm_resume_end(PMSG_RECOVER);
 	pm_restore_gfp_mask();
 	resume_console();
 	pm_restore_console();
--
1.9.1


2014-10-24 07:59:14

by Imre Deak

Subject: [PATCH v2] PM / Sleep: fix recovery during s2ram/hibernation

Currently, if one of the dev_pm_ops::freeze callbacks fails during the
QUIESCE phase, we don't roll things back correctly by calling the thaw
and complete callbacks. This could leave some devices in a suspended
state if an error occurs while resuming from hibernation.

Also, if an asynchronous suspend_late or freeze_late callback fails
during the SUSPEND, FREEZE or QUIESCE phases, we don't propagate the
corresponding error correctly, in effect ignoring the error and
continuing the suspend-to-ram/hibernation. During suspend-to-ram this
could leave some devices without a valid saved context, leading to a
failure to reinitialize them during resume. During hibernation this
could leave some devices active, interfering with the creation /
restoration of the hibernation image. It could also leave the
corresponding devices without a valid saved context, leading to a
failure to reinitialize them during resume.

v2:
- call dpm_resume_end() unconditionally, since error is guaranteed
  to be non-zero at that point

Signed-off-by: Imre Deak <[email protected]>
---
drivers/base/power/main.c | 2 ++
kernel/power/hibernate.c | 3 ++-
2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 4497319..9717d5f 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1266,6 +1266,8 @@ int dpm_suspend_late(pm_message_t state)
 	}
 	mutex_unlock(&dpm_list_mtx);
 	async_synchronize_full();
+	if (!error)
+		error = async_error;
 	if (error) {
 		suspend_stats.failed_suspend_late++;
 		dpm_save_failed_step(SUSPEND_SUSPEND_LATE);
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index a9dfa79..594ee7f 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -502,8 +502,9 @@ int hibernation_restore(int platform_mode)
 	error = dpm_suspend_start(PMSG_QUIESCE);
 	if (!error) {
 		error = resume_target_kernel(platform_mode);
-		dpm_resume_end(PMSG_RECOVER);
+		BUG_ON(!error);
 	}
+	dpm_resume_end(PMSG_RECOVER);
 	pm_restore_gfp_mask();
 	resume_console();
 	pm_restore_console();
--
1.8.4

2014-10-24 13:43:57

by Rafael J. Wysocki

Subject: Re: [PATCH v2] PM / Sleep: fix recovery during s2ram/hibernation

On Friday, October 24, 2014 10:59:09 AM Imre Deak wrote:
> Atm, if one of the dev_pm_ops::freeze callbacks fails during the QUIESCE
> phase we don't rollback things correctly calling the thaw and complete
> callbacks. This could leave some devices in a suspended state in case of
> an error during resuming from hibernation.
>
> Also if an asynchronous suspend_late or freeze_late callback fails
> during the SUSPEND, FREEZE or QUIESCE phases we don't propagate the
> corresponding error correctly, in effect ignoring the error and
> continuing the suspend-to-ram/hibernation. During suspend-to-ram this
> could leave some devices without a valid saved context, leading to a
> failure to reinitialize them during resume. During hibernation this
> could leave some devices active, interfering with the creation /
> restoration of the hibernation image. Also this could leave the
> corresponding devices without a valid saved context and failure to
> reinitialize them during resume.
>
> v2:
> - call dpm_resume_end() unconditionally, it's guaranteed that error
> is non-zero
>
> Signed-off-by: Imre Deak <[email protected]>

These are two unrelated fixes, so please send them as two patches.

> ---
> drivers/base/power/main.c | 2 ++
> kernel/power/hibernate.c | 3 ++-
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 4497319..9717d5f 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1266,6 +1266,8 @@ int dpm_suspend_late(pm_message_t state)
>  	}
>  	mutex_unlock(&dpm_list_mtx);
>  	async_synchronize_full();
> +	if (!error)
> +		error = async_error;
>  	if (error) {
>  		suspend_stats.failed_suspend_late++;
>  		dpm_save_failed_step(SUSPEND_SUSPEND_LATE);

So the above will be the first fix and the below will be the second one.

> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index a9dfa79..594ee7f 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -502,8 +502,9 @@ int hibernation_restore(int platform_mode)
>  	error = dpm_suspend_start(PMSG_QUIESCE);
>  	if (!error) {
>  		error = resume_target_kernel(platform_mode);
> -		dpm_resume_end(PMSG_RECOVER);
> +		BUG_ON(!error);

Why BUG_ON()? Is crashing the kernel necessary here?

>  	}
> +	dpm_resume_end(PMSG_RECOVER);
>  	pm_restore_gfp_mask();
>  	resume_console();
>  	pm_restore_console();
>

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-10-24 14:17:17

by Imre Deak

Subject: Re: [PATCH v2] PM / Sleep: fix recovery during s2ram/hibernation

On Fri, 2014-10-24 at 16:04 +0200, Rafael J. Wysocki wrote:
> On Friday, October 24, 2014 10:59:09 AM Imre Deak wrote:
> > Atm, if one of the dev_pm_ops::freeze callbacks fails during the QUIESCE
> > phase we don't rollback things correctly calling the thaw and complete
> > callbacks. This could leave some devices in a suspended state in case of
> > an error during resuming from hibernation.
> >
> > Also if an asynchronous suspend_late or freeze_late callback fails
> > during the SUSPEND, FREEZE or QUIESCE phases we don't propagate the
> > corresponding error correctly, in effect ignoring the error and
> > continuing the suspend-to-ram/hibernation. During suspend-to-ram this
> > could leave some devices without a valid saved context, leading to a
> > failure to reinitialize them during resume. During hibernation this
> > could leave some devices active, interfering with the creation /
> > restoration of the hibernation image. Also this could leave the
> > corresponding devices without a valid saved context and failure to
> > reinitialize them during resume.
> >
> > v2:
> > - call dpm_resume_end() unconditionally, it's guaranteed that error
> > is non-zero
> >
> > Signed-off-by: Imre Deak <[email protected]>
>
> These are two unrelated fixes, so please send them as two patches.
>
> > ---
> > drivers/base/power/main.c | 2 ++
> > kernel/power/hibernate.c | 3 ++-
> > 2 files changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index 4497319..9717d5f 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -1266,6 +1266,8 @@ int dpm_suspend_late(pm_message_t state)
> >  	}
> >  	mutex_unlock(&dpm_list_mtx);
> >  	async_synchronize_full();
> > +	if (!error)
> > +		error = async_error;
> >  	if (error) {
> >  		suspend_stats.failed_suspend_late++;
> >  		dpm_save_failed_step(SUSPEND_SUSPEND_LATE);
>
> So the above will be the first fix and the below will be the second one.

Ok.

>
> > diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> > index a9dfa79..594ee7f 100644
> > --- a/kernel/power/hibernate.c
> > +++ b/kernel/power/hibernate.c
> > @@ -502,8 +502,9 @@ int hibernation_restore(int platform_mode)
> >  	error = dpm_suspend_start(PMSG_QUIESCE);
> >  	if (!error) {
> >  		error = resume_target_kernel(platform_mode);
> > -		dpm_resume_end(PMSG_RECOVER);
> > +		BUG_ON(!error);
>
> Why BUG_ON()? Is crashing the kernel necessary here?

I figured that, since this is an undefined state after the restoration
of a memory image, it's better to crash than to continue and risk
corrupting some user data.

>
> >  	}
> > +	dpm_resume_end(PMSG_RECOVER);
> >  	pm_restore_gfp_mask();
> >  	resume_console();
> >  	pm_restore_console();
> >
>

2014-10-24 14:34:18

by Rafael J. Wysocki

Subject: Re: [PATCH v2] PM / Sleep: fix recovery during s2ram/hibernation

On Friday, October 24, 2014 05:17:12 PM Imre Deak wrote:
> On Fri, 2014-10-24 at 16:04 +0200, Rafael J. Wysocki wrote:
> > On Friday, October 24, 2014 10:59:09 AM Imre Deak wrote:
> > > Atm, if one of the dev_pm_ops::freeze callbacks fails during the QUIESCE
> > > phase we don't rollback things correctly calling the thaw and complete
> > > callbacks. This could leave some devices in a suspended state in case of
> > > an error during resuming from hibernation.
> > >
> > > Also if an asynchronous suspend_late or freeze_late callback fails
> > > during the SUSPEND, FREEZE or QUIESCE phases we don't propagate the
> > > corresponding error correctly, in effect ignoring the error and
> > > continuing the suspend-to-ram/hibernation. During suspend-to-ram this
> > > could leave some devices without a valid saved context, leading to a
> > > failure to reinitialize them during resume. During hibernation this
> > > could leave some devices active, interfering with the creation /
> > > restoration of the hibernation image. Also this could leave the
> > > corresponding devices without a valid saved context and failure to
> > > reinitialize them during resume.
> > >
> > > v2:
> > > - call dpm_resume_end() unconditionally, it's guaranteed that error
> > > is non-zero
> > >
> > > Signed-off-by: Imre Deak <[email protected]>
> >
> > These are two unrelated fixes, so please send them as two patches.
> >
> > > ---
> > > drivers/base/power/main.c | 2 ++
> > > kernel/power/hibernate.c | 3 ++-
> > > 2 files changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > index 4497319..9717d5f 100644
> > > --- a/drivers/base/power/main.c
> > > +++ b/drivers/base/power/main.c
> > > @@ -1266,6 +1266,8 @@ int dpm_suspend_late(pm_message_t state)
> > >  	}
> > >  	mutex_unlock(&dpm_list_mtx);
> > >  	async_synchronize_full();
> > > +	if (!error)
> > > +		error = async_error;
> > >  	if (error) {
> > >  		suspend_stats.failed_suspend_late++;
> > >  		dpm_save_failed_step(SUSPEND_SUSPEND_LATE);
> >
> > So the above will be the first fix and the below will be the second one.
>
> Ok.
>
> >
> > > diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> > > index a9dfa79..594ee7f 100644
> > > --- a/kernel/power/hibernate.c
> > > +++ b/kernel/power/hibernate.c
> > > @@ -502,8 +502,9 @@ int hibernation_restore(int platform_mode)
> > >  	error = dpm_suspend_start(PMSG_QUIESCE);
> > >  	if (!error) {
> > >  		error = resume_target_kernel(platform_mode);
> > > -		dpm_resume_end(PMSG_RECOVER);
> > > +		BUG_ON(!error);
> >
> > Why BUG_ON()? Is crashing the kernel necessary here?
>
> I figured that this being an undefined state after restoration of a
> memory image, it's better to crash than to continue and risk corrupting
> some user data.

So can you please add a comment explaining that new BUG_ON() along with it?

>
> >
> > >  	}
> > > +	dpm_resume_end(PMSG_RECOVER);
> > >  	pm_restore_gfp_mask();
> > >  	resume_console();
> > >  	pm_restore_console();
> > >
> >
>
>

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-10-24 17:29:15

by Imre Deak

Subject: [PATCH v3 1/2] PM / Sleep: fix async suspend_late/freeze_late error handling

If an asynchronous suspend_late or freeze_late callback fails
during the SUSPEND, FREEZE or QUIESCE phases, we don't propagate the
corresponding error correctly, in effect ignoring the error and
continuing the suspend-to-ram/hibernation. During suspend-to-ram this
could leave some devices without a valid saved context, leading to a
failure to reinitialize them during resume. During hibernation this
could leave some devices active, interfering with the creation /
restoration of the hibernation image. It could also leave the
corresponding devices without a valid saved context, leading to a
failure to reinitialize them during resume.
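
As a rough userspace illustration of the failure mode described above
(not the kernel code itself), the sketch below models a device list in
which some late-suspend callbacks run asynchronously and report failures
only through a shared async_error variable. The names fake_dev,
fake_async_suspend_late(), fake_dpm_suspend_late() and the apply_fix
flag are hypothetical, and the asynchronous work is simply run inline
to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared async_error variable. */
static int async_error;

/* Hypothetical device model, not a kernel structure. */
struct fake_dev {
	int is_async;     /* non-zero: callback runs on the async path */
	int late_result;  /* 0 on success, negative errno-style value on failure */
};

/* Async path: a failure is only recorded in async_error. */
static void fake_async_suspend_late(const struct fake_dev *dev)
{
	if (dev->late_result && !async_error)
		async_error = dev->late_result;
}

/*
 * Simplified model of the late-suspend loop. With apply_fix == 0 the
 * function returns only the synchronous error; with apply_fix != 0 it
 * also folds in async_error after all async work has finished.
 */
static int fake_dpm_suspend_late(const struct fake_dev *devs, size_t n,
				 int apply_fix)
{
	int error = 0;
	size_t i;

	assert(devs != NULL || n == 0);

	async_error = 0;
	for (i = 0; i < n; i++) {
		if (devs[i].is_async) {
			/* Assumption: run inline instead of scheduling it. */
			fake_async_suspend_late(&devs[i]);
			continue;
		}
		error = devs[i].late_result;
		if (error)
			break;
	}

	/* The real code waits for the async callbacks at this point. */
	if (apply_fix && !error)
		error = async_error;

	return error;
}
```

With a list where only an asynchronous device fails, the model returns 0
when apply_fix is 0 (the error is lost) and the device's error code when
apply_fix is non-zero.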

v3 (added to patchset):
- split out this fix into a separate patch (Rafael)

Signed-off-by: Imre Deak <[email protected]>
---
drivers/base/power/main.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 4497319..9717d5f 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1266,6 +1266,8 @@ int dpm_suspend_late(pm_message_t state)
 	}
 	mutex_unlock(&dpm_list_mtx);
 	async_synchronize_full();
+	if (!error)
+		error = async_error;
 	if (error) {
 		suspend_stats.failed_suspend_late++;
 		dpm_save_failed_step(SUSPEND_SUSPEND_LATE);
--
1.8.4

2014-10-24 17:31:20

by Imre Deak

Subject: [PATCH v3 2/2] PM / Sleep: fix recovery during resuming from hibernation

If a device's dev_pm_ops::freeze callback fails during the QUIESCE
phase, we don't roll things back correctly by calling the thaw and
complete callbacks. This could leave some devices in a suspended state
if an error occurs while resuming from hibernation.
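
To make the control-flow change easier to see, here is a small
userspace sketch (an illustration only, not the kernel code). The
functions fake_restore_old(), fake_restore_new() and
fake_dpm_resume_end(), and the resume_end_calls counter, are
hypothetical; assert() stands in for BUG_ON(), and the two int
parameters stand in for the results of the quiesce step and of the jump
to the target kernel.

```c
#include <assert.h>

/* Counts how many times the rollback step was invoked. */
static int resume_end_calls;

/* Hypothetical stand-in for the rollback (thaw + complete) step. */
static void fake_dpm_resume_end(void)
{
	resume_end_calls++;
}

/* Old flow: rollback only happens if the quiesce step succeeded. */
static int fake_restore_old(int quiesce_error, int resume_error)
{
	int error = quiesce_error;

	if (!error) {
		error = resume_error;
		fake_dpm_resume_end();
	}
	return error;
}

/*
 * New flow: reaching the end of the function always implies an error,
 * so the rollback runs unconditionally.
 */
static int fake_restore_new(int quiesce_error, int resume_error)
{
	int error = quiesce_error;

	if (!error) {
		error = resume_error;
		/* Returning here without an error would be undefined. */
		assert(error != 0);
	}
	fake_dpm_resume_end();
	return error;
}
```

When the quiesce step fails, fake_restore_old() returns the error
without rolling anything back, while fake_restore_new() calls the
rollback step before returning the same error.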

v2:
- call dpm_resume_end() unconditionally, since error is guaranteed
  to be non-zero at that point
v3:
- split out this fix into a separate patch (Rafael)
- add code comment on why BUG_ON() is used (Rafael)

Signed-off-by: Imre Deak <[email protected]>
---
kernel/power/hibernate.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index a9dfa79..1f35a34 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -502,8 +502,14 @@ int hibernation_restore(int platform_mode)
 	error = dpm_suspend_start(PMSG_QUIESCE);
 	if (!error) {
 		error = resume_target_kernel(platform_mode);
-		dpm_resume_end(PMSG_RECOVER);
+		/*
+		 * The above should either succeed and jump to the new kernel,
+		 * or return with an error. Otherwise things are just
+		 * undefined, so let's be paranoid.
+		 */
+		BUG_ON(!error);
 	}
+	dpm_resume_end(PMSG_RECOVER);
 	pm_restore_gfp_mask();
 	resume_console();
 	pm_restore_console();
--
1.8.4

2014-10-27 22:24:57

by Rafael J. Wysocki

Subject: Re: [PATCH v3 1/2] PM / Sleep: fix async suspend_late/freeze_late error handling

On Friday, October 24, 2014 08:29:09 PM Imre Deak wrote:
> If an asynchronous suspend_late or freeze_late callback fails
> during the SUSPEND, FREEZE or QUIESCE phases we don't propagate the
> corresponding error correctly, in effect ignoring the error and
> continuing the suspend-to-ram/hibernation. During suspend-to-ram this
> could leave some devices without a valid saved context, leading to a
> failure to reinitialize them during resume. During hibernation this
> could leave some devices active, interfering with the creation /
> restoration of the hibernation image. Also this could leave the
> corresponding devices without a valid saved context and failure to
> reinitialize them during resume.
>
> v3 (added to patchset):
> - split out this fix into a separate patch (Rafael)
>
> Signed-off-by: Imre Deak <[email protected]>

Both [1-2/2] applied, thanks!

> ---
> drivers/base/power/main.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 4497319..9717d5f 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1266,6 +1266,8 @@ int dpm_suspend_late(pm_message_t state)
>  	}
>  	mutex_unlock(&dpm_list_mtx);
>  	async_synchronize_full();
> +	if (!error)
> +		error = async_error;
>  	if (error) {
>  		suspend_stats.failed_suspend_late++;
>  		dpm_save_failed_step(SUSPEND_SUSPEND_LATE);
>

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.