2011-08-04 22:23:58

by Rafael J. Wysocki

Subject: Re: [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)

On Thursday, August 04, 2011, Rafael J. Wysocki wrote:
> On Wednesday, August 03, 2011, Pavel Machek wrote:
> > Hi!
> >
> > > Freeze all filesystems during the freezing of tasks by calling
> > > freeze_bdev() for each of them and thaw them during the thawing
> > > of tasks with the help of thaw_bdev().
> > >
> > > This is needed by hibernation, because some filesystems (e.g. XFS)
> > > deadlock with the preallocation of memory used by it if the memory
> > > pressure caused by it is too heavy.
> > >
> > > The additional benefit of this change is that, if something goes
> > > wrong after filesystems have been frozen, they will stay in a
> > > consistent state and journal replays won't be necessary (e.g. after
> > > a failing suspend or resume). In particular, this should help to
> > > solve a long-standing issue that in some cases during resume from
> > > hibernation the boot loader causes the journal to be replayed for the
> > > filesystem containing the kernel image and initrd causing it to
> > > become inconsistent with the information stored in the hibernation
> > > image.
> >
> > > +/**
> > > + * freeze_filesystems - Force all filesystems into a consistent state.
> > > + */
> > > +void freeze_filesystems(void)
> > > +{
> > > + struct super_block *sb;
> > > +
> > > + lockdep_off();
> >
> > Ouch. So... why do we need to silence this?
>
> So that it doesn't complain? :-)
>
> I'll need some time to get the exact details here.

So, this is because ext3_freeze() doesn't call
journal_unlock_updates() on success, which quite frankly looks like
a bug in ext3 to me. At least that's different from what ext4 does
in exactly the same situation (which looks correct).

If ext3_freeze() called journal_unlock_updates() on success too and
the call to journal_unlock_updates() were removed from ext3_unfreeze(),
we wouldn't need that lockdep_off()/lockdep_on() around the loop.

I need someone with ext3/ext4 knowledge to comment here, though.

Moreover, I'm not sure whether other filesystems do similar things.

Anyway, this is just a false positive, even with the ext3 code as is.
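
To make the imbalance concrete, here is a minimal user-space sketch, not the
actual ext3 or ext4 code. All toy_* names are hypothetical, and the "barrier"
is only a counter standing in for the lock taken by journal_lock_updates().
It assumes, as described above, that the ext3-like freeze drops the barrier
only on failure, while the ext4-like freeze drops it on every path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the journal barrier: counts outstanding holds. */
struct toy_journal {
	int barrier_held;
};

static void toy_lock_updates(struct toy_journal *j)
{
	j->barrier_held++;
}

static void toy_unlock_updates(struct toy_journal *j)
{
	assert(j->barrier_held > 0);	/* unlocking an unheld barrier is a bug */
	j->barrier_held--;
}

/* ext3-like: the success path returns with the barrier still held. */
static int toy_freeze_unbalanced(struct toy_journal *j, bool flush_ok)
{
	toy_lock_updates(j);
	if (!flush_ok) {
		toy_unlock_updates(j);
		return -1;
	}
	return 0;
}

/* Counterpart that has to drop what the freeze side left behind. */
static void toy_unfreeze_unbalanced(struct toy_journal *j)
{
	toy_unlock_updates(j);
}

/* ext4-like: the barrier is dropped on every path before returning. */
static int toy_freeze_balanced(struct toy_journal *j, bool flush_ok)
{
	toy_lock_updates(j);
	/* ... the journal would be flushed here ... */
	toy_unlock_updates(j);
	return flush_ok ? 0 : -1;
}
```

With the unbalanced variant the counter stays at 1 between freeze and
unfreeze, which is the state lockdep objects to. With the balanced variant
it is back to 0 as soon as the freeze call returns.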

> > > + /*
> > > + * Freeze in reverse order so filesystems dependent upon others are
> > > + * frozen in the right order (eg. loopback on ext3).
> > > + */
> > > + list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > > + if (!sb->s_root || !sb->s_bdev ||
> > > + (sb->s_frozen == SB_FREEZE_TRANS) ||
> > > + (sb->s_flags & MS_RDONLY) ||
> > > + (sb->s_flags & MS_FROZEN))
> > > + continue;
> >
> > Should we stop NFS from modifying remote server, too?
>
> What do you mean exactly?
>
> > Plus... ext3 writes to read-only filesystems on mount; not sure if it
> > does it later. But RDONLY means 'user can't write to it' not 'bdev will
> > not be modified'. Should we freeze all?
> >
> > How can 'already frozen' happen?
> >
> > > + list_for_each_entry(sb, &super_blocks, s_list)
> > > + if (sb->s_flags & MS_FROZEN) {
> > > + sb->s_flags &= ~MS_FROZEN;
> > > + thaw_bdev(sb->s_bdev, sb);
> > > + }
> >
> > ...because we'll unfreeze it even if we did not freeze it...
>
> So we need not check MS_FROZEN in freeze_filesystems(). OK

Thanks,
Rafael


2011-08-06 21:16:35

by Rafael J. Wysocki

Subject: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

From: Rafael J. Wysocki <[email protected]>

Freeze all filesystems during the freezing of tasks by calling
freeze_bdev() for each of them and thaw them during the thawing
of tasks with the help of thaw_bdev().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume). In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replayed for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---

OK, so nobody except for Pavel appears to have any comments, so I assume
that everyone except for Pavel is fine with the approach, interestingly enough.

I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
and added comments explaining why lockdep_off/on() are used.

Thanks,
Rafael

---
fs/block_dev.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 6 +++++
kernel/power/process.c | 7 +++++-
3 files changed, 68 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
#define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */
#define MS_I_VERSION (1<<23) /* Update inode I_version field */
#define MS_STRICTATIME (1<<24) /* Always perform atime updates */
+#define MS_FROZEN (1<<25) /* bdev has been frozen */
#define MS_NOSEC (1<<28)
#define MS_BORN (1<<29)
#define MS_ACTIVE (1<<30)
@@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
extern void emergency_thaw_all(void);
extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
extern int fsync_bdev(struct block_device *);
+extern void freeze_filesystems(void);
+extern void thaw_filesystems(void);
#else
static inline void bd_forget(struct inode *inode) {}
static inline int sync_blockdev(struct block_device *bdev) { return 0; }
@@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
{
return 0;
}
+
+static inline void freeze_filesystems(void) {}
+static inline void thaw_filesystems(void) {}
#endif
extern int sync_filesystem(struct super_block *);
extern const struct file_operations def_blk_fops;
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -314,6 +314,62 @@ out:
}
EXPORT_SYMBOL(thaw_bdev);

+/**
+ * freeze_filesystems - Force all filesystems into a consistent state.
+ */
+void freeze_filesystems(void)
+{
+ struct super_block *sb;
+
+ /*
+ * This is necessary, because some filesystems (e.g. ext3) lock
+ * mutexes in their .freeze_fs() callbacks and leave them locked for
+ * their .unfreeze_fs() callbacks to unlock. This is done under
+ * bdev->bd_fsfreeze_mutex, which is then released, but it makes
+ * lockdep think something may be wrong when freeze_bdev() attempts
+ * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
+ */
+ lockdep_off();
+
+ /*
+ * Freeze in reverse order so filesystems depending on others are
+ * frozen in the right order (eg. loopback on ext3).
+ */
+ list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+ if (!sb->s_root || !sb->s_bdev ||
+ (sb->s_frozen == SB_FREEZE_TRANS) ||
+ (sb->s_flags & MS_RDONLY))
+ continue;
+
+ freeze_bdev(sb->s_bdev);
+ sb->s_flags |= MS_FROZEN;
+ }
+
+ lockdep_on();
+}
+
+/**
+ * thaw_filesystems - Make all filesystems active again.
+ */
+void thaw_filesystems(void)
+{
+ struct super_block *sb;
+
+ /*
+ * This is necessary for the same reason as in freeze_filesystems()
+ * above.
+ */
+ lockdep_off();
+
+ list_for_each_entry(sb, &super_blocks, s_list)
+ if (sb->s_flags & MS_FROZEN) {
+ sb->s_flags &= ~MS_FROZEN;
+ thaw_bdev(sb->s_bdev, sb);
+ }
+
+ lockdep_on();
+}
+
static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
{
return block_write_full_page(page, blkdev_get_block, wbc);
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -12,10 +12,10 @@
#include <linux/oom.h>
#include <linux/suspend.h>
#include <linux/module.h>
-#include <linux/syscalls.h>
#include <linux/freezer.h>
#include <linux/delay.h>
#include <linux/workqueue.h>
+#include <linux/fs.h>

/*
* Timeout for stopping processes
@@ -147,6 +147,10 @@ int freeze_processes(void)
goto Exit;
printk("done.\n");

+ pr_info("Freezing filesystems ... ");
+ freeze_filesystems();
+ pr_info("done.\n");
+
printk("Freezing remaining freezable tasks ... ");
error = try_to_freeze_tasks(false);
if (error)
@@ -188,6 +192,7 @@ void thaw_processes(void)
printk("Restarting tasks ... ");
thaw_workqueues();
thaw_tasks(true);
+ thaw_filesystems();
thaw_tasks(false);
schedule();
printk("done.\n");

2011-08-07 00:14:51

by Dave Chinner

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Freeze all filesystems during the freezing of tasks by calling
> freeze_bdev() for each of them and thaw them during the thawing
> of tasks with the help of thaw_bdev().
>
> This is needed by hibernation, because some filesystems (e.g. XFS)
> deadlock with the preallocation of memory used by it if the memory
> pressure caused by it is too heavy.
>
> The additional benefit of this change is that, if something goes
> wrong after filesystems have been frozen, they will stay in a
> consistent state and journal replays won't be necessary (e.g. after
> a failing suspend or resume). In particular, this should help to
> solve a long-standing issue that in some cases during resume from
> hibernation the boot loader causes the journal to be replayed for the
> filesystem containing the kernel image and initrd causing it to
> become inconsistent with the information stored in the hibernation
> image.
>
> This change is based on earlier work by Nigel Cunningham.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
>
> OK, so nobody except for Pavel appears to have any comments, so I assume
> that everyone except for Pavel is fine with the approach, interestingly enough.
>
> I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> and added comments explaining why lockdep_off/on() are used.
>
> Thanks,
> Rafael
>
> ---
> fs/block_dev.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 6 +++++
> kernel/power/process.c | 7 +++++-
> 3 files changed, 68 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h
> +++ linux-2.6/include/linux/fs.h
> @@ -211,6 +211,7 @@ struct inodes_stat_t {
> #define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */
> #define MS_I_VERSION (1<<23) /* Update inode I_version field */
> #define MS_STRICTATIME (1<<24) /* Always perform atime updates */
> +#define MS_FROZEN (1<<25) /* bdev has been frozen */
> #define MS_NOSEC (1<<28)
> #define MS_BORN (1<<29)
> #define MS_ACTIVE (1<<30)
> @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
> extern void emergency_thaw_all(void);
> extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
> extern int fsync_bdev(struct block_device *);
> +extern void freeze_filesystems(void);
> +extern void thaw_filesystems(void);
> #else
> static inline void bd_forget(struct inode *inode) {}
> static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
> {
> return 0;
> }
> +
> +static inline void freeze_filesystems(void) {}
> +static inline void thaw_filesystems(void) {}
> #endif
> extern int sync_filesystem(struct super_block *);
> extern const struct file_operations def_blk_fops;
> Index: linux-2.6/fs/block_dev.c
> ===================================================================
> --- linux-2.6.orig/fs/block_dev.c
> +++ linux-2.6/fs/block_dev.c
> @@ -314,6 +314,62 @@ out:
> }
> EXPORT_SYMBOL(thaw_bdev);
>
> +/**
> + * freeze_filesystems - Force all filesystems into a consistent state.
> + */
> +void freeze_filesystems(void)
> +{
> + struct super_block *sb;
> +
> + /*
> + * This is necessary, because some filesystems (e.g. ext3) lock
> + * mutexes in their .freeze_fs() callbacks and leave them locked for
> + * their .unfreeze_fs() callbacks to unlock. This is done under
> + * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> + * lockdep think something may be wrong when freeze_bdev() attempts
> + * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> + */
> + lockdep_off();

I thought those problems were fixed. If they aren't, then they most
certainly need to be because holding mutexes over system calls is a
bug.

Well, well:

[252182.603134] ================================================
[252182.604832] [ BUG: lock held when returning to user space! ]
[252182.606086] ------------------------------------------------
[252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
[252182.608905] 1 lock held by xfs_io/4917:
[252182.609739] #0: (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100

<sigh>

Looks like the problem was fixed for ext4, but not ext3. Please
report this to the ext3/4 list and get it fixed, don't work around
it here.

> + /*
> + * Freeze in reverse order so filesystems depending on others are
> + * frozen in the right order (eg. loopback on ext3).
> + */
> + list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> + if (!sb->s_root || !sb->s_bdev ||
> + (sb->s_frozen == SB_FREEZE_TRANS) ||
> + (sb->s_flags & MS_RDONLY))
> + continue;
> +
> + freeze_bdev(sb->s_bdev);
> + sb->s_flags |= MS_FROZEN;
> + }

AFAIK, that won't work for btrfs - you have to call freeze_super()
directly for btrfs because it has a special relationship with
sb->s_bdev. And besides, all freeze_bdev does is get an active
reference on the superblock and call freeze_super().

Also, that's traversing the list of superblocks without locking and
dereferencing the superblock without properly checking that the
superblock is not being torn down. You should probably use
iterate_supers (or at least copy the code), with a function that
drops the s_umount read lock before calling freeze_super() and then
picks it back up afterwards.
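
For reference, the shape of such a walk can be modelled in user space roughly
as follows. This is only a sketch under stated assumptions, not the kernel
code: all toy_* names are hypothetical, the list lock is a flag checked with
assert() rather than a real spinlock, and the s_umount handling mentioned
above is left out entirely. It shows the two points that matter: each entry
is pinned before the list lock is dropped, and the previous entry is released
only after the lock has been taken again, so the current entry's list linkage
stays valid while the callback runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature superblock: a pin count plus a liveness flag. */
struct toy_sb {
	struct toy_sb *prev;	/* towards the head of the list */
	int pins;		/* models s_count */
	int live;		/* 0 models a superblock being torn down */
	int frozen;
};

static int toy_list_locked;	/* models sb_lock; a flag, not a real lock */

static void toy_lock(void)   { assert(!toy_list_locked); toy_list_locked = 1; }
static void toy_unlock(void) { assert(toy_list_locked);  toy_list_locked = 0; }

/* Dropping a pin is only allowed with the list lock held. */
static void toy_put(struct toy_sb *sb)
{
	assert(toy_list_locked);
	sb->pins--;
}

/* Callback that may sleep, so it must run without the list lock. */
static int toy_freeze(struct toy_sb *sb)
{
	assert(!toy_list_locked);
	sb->frozen = 1;
	return 0;
}

/* Walk from the tail towards the head; return how many entries were visited. */
static int toy_walk_reverse(struct toy_sb *tail, int (*fn)(struct toy_sb *))
{
	struct toy_sb *sb, *p = NULL;
	int visited = 0;

	toy_lock();
	for (sb = tail; sb; sb = sb->prev) {
		if (!sb->live)
			continue;	/* skip entries that are going away */
		sb->pins++;		/* pin before dropping the lock */
		toy_unlock();

		if (fn(sb) == 0)
			visited++;

		toy_lock();
		if (p)
			toy_put(p);	/* release the previously pinned entry */
		p = sb;
	}
	if (p)
		toy_put(p);
	toy_unlock();

	return visited;
}
```

After a complete walk every pin count is back at its starting value and the
entry marked as not live has never been passed to the callback.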

> +
> + lockdep_on();
> +}
> +
> +/**
> + * thaw_filesystems - Make all filesystems active again.
> + */
> +void thaw_filesystems(void)
> +{
> + struct super_block *sb;
> +
> + /*
> + * This is necessary for the same reason as in freeze_filesystems()
> + * above.
> + */
> + lockdep_off();
> +
> + list_for_each_entry(sb, &super_blocks, s_list)
> + if (sb->s_flags & MS_FROZEN) {
> + sb->s_flags &= ~MS_FROZEN;
> + thaw_bdev(sb->s_bdev, sb);
> + }

And once again, iterate_supers() is what you want here. And you
should only call thaw_bdev() as it needs to do checks other than
checking MS_FROZEN e.g. the above will unfreeze filesystems that
were already frozen at the time a suspend occurs, and that could
lead to corruption depending on why the filesystem was frozen...

Also, you still need to check for a valid sb->s_bdev here, otherwise
<splat>.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2011-08-08 21:10:03

by Rafael J. Wysocki

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> >
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> >
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume). In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replayed for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> >
> > This change is based on earlier work by Nigel Cunningham.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> >
> > OK, so nobody except for Pavel appears to have any comments, so I assume
> > that everyone except for Pavel is fine with the approach, interestingly enough.
> >
> > I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> > and added comments explaining why lockdep_off/on() are used.
> >
> > Thanks,
> > Rafael
> >
> > ---
> > fs/block_dev.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/fs.h | 6 +++++
> > kernel/power/process.c | 7 +++++-
> > 3 files changed, 68 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6/include/linux/fs.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/fs.h
> > +++ linux-2.6/include/linux/fs.h
> > @@ -211,6 +211,7 @@ struct inodes_stat_t {
> > #define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */
> > #define MS_I_VERSION (1<<23) /* Update inode I_version field */
> > #define MS_STRICTATIME (1<<24) /* Always perform atime updates */
> > +#define MS_FROZEN (1<<25) /* bdev has been frozen */
> > #define MS_NOSEC (1<<28)
> > #define MS_BORN (1<<29)
> > #define MS_ACTIVE (1<<30)
> > @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
> > extern void emergency_thaw_all(void);
> > extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
> > extern int fsync_bdev(struct block_device *);
> > +extern void freeze_filesystems(void);
> > +extern void thaw_filesystems(void);
> > #else
> > static inline void bd_forget(struct inode *inode) {}
> > static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> > @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
> > {
> > return 0;
> > }
> > +
> > +static inline void freeze_filesystems(void) {}
> > +static inline void thaw_filesystems(void) {}
> > #endif
> > extern int sync_filesystem(struct super_block *);
> > extern const struct file_operations def_blk_fops;
> > Index: linux-2.6/fs/block_dev.c
> > ===================================================================
> > --- linux-2.6.orig/fs/block_dev.c
> > +++ linux-2.6/fs/block_dev.c
> > @@ -314,6 +314,62 @@ out:
> > }
> > EXPORT_SYMBOL(thaw_bdev);
> >
> > +/**
> > + * freeze_filesystems - Force all filesystems into a consistent state.
> > + */
> > +void freeze_filesystems(void)
> > +{
> > + struct super_block *sb;
> > +
> > + /*
> > + * This is necessary, because some filesystems (e.g. ext3) lock
> > + * mutexes in their .freeze_fs() callbacks and leave them locked for
> > + * their .unfreeze_fs() callbacks to unlock. This is done under
> > + * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> > + * lockdep think something may be wrong when freeze_bdev() attempts
> > + * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> > + */
> > + lockdep_off();
>
> I thought those problems were fixed. If they aren't, then they most
> certainly need to be because holding mutexes over system calls is a
> bug.
>
> Well, well:
>
> [252182.603134] ================================================
> [252182.604832] [ BUG: lock held when returning to user space! ]
> [252182.606086] ------------------------------------------------
> [252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
> [252182.608905] 1 lock held by xfs_io/4917:
> [252182.609739] #0: (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100
>
> <sigh>
>
> Looks like the problem was fixed for ext4, but not ext3. Please
> report this to the ext3/4 list and get it fixed, don't work around
> it here.

OK, but I guess I'll have to post a patch to fix this myself so that
anyone notices. :-)

> > + /*
> > + * Freeze in reverse order so filesystems depending on others are
> > + * frozen in the right order (eg. loopback on ext3).
> > + */
> > + list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > + if (!sb->s_root || !sb->s_bdev ||
> > + (sb->s_frozen == SB_FREEZE_TRANS) ||
> > + (sb->s_flags & MS_RDONLY))
> > + continue;
> > +
> > + freeze_bdev(sb->s_bdev);
> > + sb->s_flags |= MS_FROZEN;
> > + }
>
> AFAIK, that won't work for btrfs - you have to call freeze_super()
> directly for btrfs because it has a special relationship with
> sb->s_bdev. And besides, all freeze_bdev does is get an active
> reference on the superblock and call freeze_super().

OK, so do you mean I should call freeze_super() rather than freeze_bdev()?

> Also, that's traversing the list of superblocks without locking and
> dereferencing the superblock without properly checking that the
> superblock is not being torn down. You should probably use
> iterate_supers (or at least copy the code), with a function that
> drops the s_umount read lock before calling freeze_super() and then
> picks it back up afterwards.

Hmm, I'll try that, but I doubt I'll get it right first time. :-)

> > +
> > + lockdep_on();
> > +}
> > +
> > +/**
> > + * thaw_filesystems - Make all filesystems active again.
> > + */
> > +void thaw_filesystems(void)
> > +{
> > + struct super_block *sb;
> > +
> > + /*
> > + * This is necessary for the same reason as in freeze_filesystems()
> > + * above.
> > + */
> > + lockdep_off();
> > +
> > + list_for_each_entry(sb, &super_blocks, s_list)
> > + if (sb->s_flags & MS_FROZEN) {
> > + sb->s_flags &= ~MS_FROZEN;
> > + thaw_bdev(sb->s_bdev, sb);
> > + }
>
> And once again, iterate_supers() is what you want here.

OK

> And you should only call thaw_bdev() as it needs to do checks other
> than checking MS_FROZEN

Hmm, I'm not really sure what you mean?

> e.g. the above will unfreeze filesystems that
> were already frozen at the time a suspend occurs, and that could
> lead to corruption depending on why the filesystem was frozen...
>
> Also, you still need to check for a valid sb->s_bdev here, otherwise
> <splat>.

I see.

Thanks,
Rafael

2011-08-14 00:16:26

by Rafael J. Wysocki

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
...
> > + /*
> > + * Freeze in reverse order so filesystems depending on others are
> > + * frozen in the right order (eg. loopback on ext3).
> > + */
> > + list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > + if (!sb->s_root || !sb->s_bdev ||
> > + (sb->s_frozen == SB_FREEZE_TRANS) ||
> > + (sb->s_flags & MS_RDONLY))
> > + continue;
> > +
> > + freeze_bdev(sb->s_bdev);
> > + sb->s_flags |= MS_FROZEN;
> > + }
>
> AFAIK, that won't work for btrfs - you have to call freeze_super()
> directly for btrfs because it has a special relationship with
> sb->s_bdev. And besides, all freeze_bdev does is get an active
> reference on the superblock and call freeze_super().
>
> Also, that's traversing the list of superblocks without locking and
> dereferencing the superblock without properly checking that the
> superblock is not being torn down. You should probably use
> iterate_supers (or at least copy the code), with a function that
> drops the s_umount read lock before calling freeze_super() and then
> picks it back up afterwards.

So, what about the patch below? It appears to work on my test boxes.

Thanks,
Rafael

---
From: Rafael J. Wysocki <[email protected]>
Subject: PM / Freezer: Freeze filesystems while freezing processes (v3)

Freeze all filesystems during the freezing of tasks by calling
freeze_super() for all superblocks and thaw them during the thawing
of tasks with the help of thaw_super().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume). In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replayed for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
fs/super.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 3 ++
kernel/power/process.c | 9 +++++-
3 files changed, 81 insertions(+), 1 deletion(-)

Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
#define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */
#define MS_I_VERSION (1<<23) /* Update inode I_version field */
#define MS_STRICTATIME (1<<24) /* Always perform atime updates */
+#define MS_FROZEN (1<<25) /* Frozen filesystem */
#define MS_NOSEC (1<<28)
#define MS_BORN (1<<29)
#define MS_ACTIVE (1<<30)
@@ -2497,6 +2498,8 @@ extern void drop_super(struct super_bloc
extern void iterate_supers(void (*)(struct super_block *, void *), void *);
extern void iterate_supers_type(struct file_system_type *,
void (*)(struct super_block *, void *), void *);
+extern int freeze_supers(void);
+extern void thaw_supers(void);

extern int dcache_dir_open(struct inode *, struct file *);
extern int dcache_dir_close(struct inode *, struct file *);
Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -12,10 +12,10 @@
#include <linux/oom.h>
#include <linux/suspend.h>
#include <linux/module.h>
-#include <linux/syscalls.h>
#include <linux/freezer.h>
#include <linux/delay.h>
#include <linux/workqueue.h>
+#include <linux/fs.h>

/*
* Timeout for stopping processes
@@ -147,6 +147,12 @@ int freeze_processes(void)
goto Exit;
printk("done.\n");

+ printk("Freezing filesystems ... ");
+ error = freeze_supers();
+ if (error)
+ goto Exit;
+ printk("done.\n");
+
printk("Freezing remaining freezable tasks ... ");
error = try_to_freeze_tasks(false);
if (error)
@@ -188,6 +194,7 @@ void thaw_processes(void)
printk("Restarting tasks ... ");
thaw_workqueues();
thaw_tasks(true);
+ thaw_supers();
thaw_tasks(false);
schedule();
printk("done.\n");
Index: linux/fs/super.c
===================================================================
--- linux.orig/fs/super.c
+++ linux/fs/super.c
@@ -590,6 +590,76 @@ void iterate_supers_type(struct file_sys
EXPORT_SYMBOL(iterate_supers_type);

/**
+ * freeze_supers - call freeze_super() for all superblocks
+ */
+int freeze_supers(void)
+{
+ struct super_block *sb, *p = NULL;
+ int error = 0;
+
+ spin_lock(&sb_lock);
+ /*
+ * Freeze in reverse order so filesystems depending on others are
+ * frozen in the right order (eg. loopback on ext3).
+ */
+ list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+ if (list_empty(&sb->s_instances))
+ continue;
+ sb->s_count++;
+ spin_unlock(&sb_lock);
+
+ if (sb->s_root && sb->s_frozen != SB_FREEZE_TRANS
+ && !(sb->s_flags & MS_RDONLY)) {
+ error = freeze_super(sb);
+ if (!error)
+ sb->s_flags |= MS_FROZEN;
+ }
+
+ spin_lock(&sb_lock);
+ if (error)
+ break;
+ if (p)
+ __put_super(p);
+ p = sb;
+ }
+ if (p)
+ __put_super(p);
+ spin_unlock(&sb_lock);
+
+ return error;
+}
+
+/**
+ * thaw_supers - call thaw_super() for all superblocks
+ */
+void thaw_supers(void)
+{
+ struct super_block *sb, *p = NULL;
+
+ spin_lock(&sb_lock);
+ list_for_each_entry(sb, &super_blocks, s_list) {
+ if (list_empty(&sb->s_instances))
+ continue;
+ sb->s_count++;
+ spin_unlock(&sb_lock);
+
+ if (sb->s_flags & MS_FROZEN) {
+ thaw_super(sb);
+ sb->s_flags &= ~MS_FROZEN;
+ }
+
+ spin_lock(&sb_lock);
+ if (p)
+ __put_super(p);
+ p = sb;
+ }
+ if (p)
+ __put_super(p);
+ spin_unlock(&sb_lock);
+}
+
+
+/**
* get_super - get the superblock of a device
* @bdev: device to get the superblock for
*

2011-09-24 22:54:04

by Rafael J. Wysocki

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> >
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> >
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume). In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replayed for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> >
> > This change is based on earlier work by Nigel Cunningham.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---

Below is an alternative fix, the changelog pretty much explains the idea.

I've tested it on Toshiba Portege R500, but I don't have an XFS partition
to verify that it really helps, so I'd appreciate it if someone able to
reproduce the original issue could test it and report back.

Thanks,
Rafael

---
From: Rafael J. Wysocki <[email protected]>
Subject: PM / Hibernate: Freeze kernel threads after preallocating memory

There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers. Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out. Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation. If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).

Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
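
The ordering described in the changelog above can be sketched in user space
as follows. This is an illustration only, not the hibernate code itself: the
toy_* names and the trace buffer are hypothetical, and the steps are merely
recorded rather than performed. The only assumptions are the ones stated in
the changelog, namely that user space is frozen first, memory is preallocated
next, and freezable kernel threads are frozen last. Returning early on a
preallocation error is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_STEPS 8

/* Hypothetical labels for the phases named in the changelog. */
enum toy_step {
	TOY_FREEZE_USER = 1,	/* freeze user space processes */
	TOY_PREALLOC,		/* hibernate memory preallocation */
	TOY_FREEZE_KTHREADS,	/* freeze freezable kernel threads */
};

struct toy_trace {
	enum toy_step steps[TOY_MAX_STEPS];
	size_t n;
};

static void toy_record(struct toy_trace *t, enum toy_step s)
{
	assert(t->n < TOY_MAX_STEPS);
	t->steps[t->n++] = s;
}

/*
 * Record the phases in the proposed order. Kernel threads are still running
 * during preallocation, so a shrinker waiting for one of them can make
 * progress. A nonzero prealloc_error models a failed preallocation.
 */
static int toy_hibernate_prepare(struct toy_trace *t, int prealloc_error)
{
	toy_record(t, TOY_FREEZE_USER);

	toy_record(t, TOY_PREALLOC);
	if (prealloc_error)
		return prealloc_error;	/* kernel threads were never frozen */

	toy_record(t, TOY_FREEZE_KTHREADS);
	return 0;
}
```

In the old ordering the third step came before the second one, which is
where the shrinkers could end up waiting for already frozen threads.
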
include/linux/freezer.h | 4 +++-
kernel/power/hibernate.c | 12 ++++++++----
kernel/power/power.h | 3 ++-
kernel/power/process.c | 30 ++++++++++++++++++++----------
4 files changed, 33 insertions(+), 16 deletions(-)

Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -135,7 +135,7 @@ static int try_to_freeze_tasks(bool sig_
}

/**
- * freeze_processes - tell processes to enter the refrigerator
+ * freeze_processes - Signal user space processes to enter the refrigerator.
*/
int freeze_processes(void)
{
@@ -143,20 +143,30 @@ int freeze_processes(void)

printk("Freezing user space processes ... ");
error = try_to_freeze_tasks(true);
- if (error)
- goto Exit;
- printk("done.\n");
+ if (!error) {
+ printk("done.");
+ oom_killer_disable();
+ }
+ printk("\n");
+ BUG_ON(in_atomic());
+
+ return error;
+}
+
+/**
+ * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
+ */
+int freeze_kernel_threads(void)
+{
+ int error;

printk("Freezing remaining freezable tasks ... ");
error = try_to_freeze_tasks(false);
- if (error)
- goto Exit;
- printk("done.");
+ if (!error)
+ printk("done.");

- oom_killer_disable();
- Exit:
- BUG_ON(in_atomic());
printk("\n");
+ BUG_ON(in_atomic());

return error;
}
Index: linux/include/linux/freezer.h
===================================================================
--- linux.orig/include/linux/freezer.h
+++ linux/include/linux/freezer.h
@@ -49,6 +49,7 @@ extern int thaw_process(struct task_stru

extern void refrigerator(void);
extern int freeze_processes(void);
+extern int freeze_kernel_threads(void);
extern void thaw_processes(void);

static inline int try_to_freeze(void)
@@ -171,7 +172,8 @@ static inline void clear_freeze_flag(str
static inline int thaw_process(struct task_struct *p) { return 1; }

static inline void refrigerator(void) {}
-static inline int freeze_processes(void) { BUG(); return 0; }
+static inline int freeze_processes(void) { return -ENOSYS; }
+static inline int freeze_kernel_threads(void) { return -ENOSYS; }
static inline void thaw_processes(void) {}

static inline int try_to_freeze(void) { return 0; }
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -228,7 +228,8 @@ extern int pm_test_level;
#ifdef CONFIG_SUSPEND_FREEZER
static inline int suspend_freeze_processes(void)
{
- return freeze_processes();
+ int error = freeze_processes();
+ return error ? : freeze_kernel_threads();
}

static inline void suspend_thaw_processes(void)
Index: linux/kernel/power/hibernate.c
===================================================================
--- linux.orig/kernel/power/hibernate.c
+++ linux/kernel/power/hibernate.c
@@ -334,13 +334,17 @@ int hibernation_snapshot(int platform_mo
if (error)
goto Close;

- error = dpm_prepare(PMSG_FREEZE);
- if (error)
- goto Complete_devices;

2011-09-25 05:32:39

by Nigel Cunningham

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

Hi.

On 25/09/11 08:56, Rafael J. Wysocki wrote:
> On Sunday, August 07, 2011, Dave Chinner wrote:
>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <[email protected]>
>>>
>>> Freeze all filesystems during the freezing of tasks by calling
>>> freeze_bdev() for each of them and thaw them during the thawing
>>> of tasks with the help of thaw_bdev().
>>>
>>> This is needed by hibernation, because some filesystems (e.g. XFS)
>>> deadlock with the preallocation of memory used by it if the memory
>>> pressure caused by it is too heavy.
>>>
>>> The additional benefit of this change is that, if something goes
>>> wrong after filesystems have been frozen, they will stay in a
>>> consistent state and journal replays won't be necessary (e.g. after
>>> a failing suspend or resume). In particular, this should help to
>>> solve a long-standing issue that in some cases during resume from
>>> hibernation the boot loader causes the journal to be replayed for the
>>> filesystem containing the kernel image and initrd causing it to
>>> become inconsistent with the information stored in the hibernation
>>> image.
>>>
>>> This change is based on earlier work by Nigel Cunningham.
>>>
>>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>>> ---
>
> Below is an alternative fix, the changelog pretty much explains the idea.
>
> I've tested it on Toshiba Portege R500, but I don't have an XFS partition
> to verify that it really helps, so I'd appreciate it if someone able to
> reproduce the original issue could test it and report back.
>
> Thanks,
> Rafael
>
> ---
> From: Rafael J. Wysocki <[email protected]>
> Subject: PM / Hibernate: Freeze kernel threads after preallocating memory
>
> There is a problem with the current ordering of hibernate code which
> leads to deadlocks in some filesystems' memory shrinkers. Namely,
> some filesystems use freezable kernel threads that are inactive when
> the hibernate memory preallocation is carried out. Those same
> filesystems use memory shrinkers that may be triggered by the
> hibernate memory preallocation. If those memory shrinkers wait for
> the frozen kernel threads, the hibernate process deadlocks (this
> happens with XFS, for one example).
>
> Apparently, it is not technically viable to redesign the filesystems
> in question to avoid the situation described above, so the only
> possible solution of this issue is to defer the freezing of kernel
> threads until the hibernate memory preallocation is done, which is
> implemented by this change.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>

TuxOnIce has the following logic at the moment: Freeze all threads.
Calculate whether we have enough memory for the image, thaw kernel
threads only, free memory and try again if it looks like we don't have
enough. I've never heard of a deadlock, though I suppose one would be
possible if you had the added complication of userspace
drivers/filesystems - it would be good to be able to distinguish and
thaw them.

It does this prior to the atomic copy, using a user-supplied estimate of
the amount of memory drivers will need - the actual amount used is shown
in debugging info at the end of the cycle. Apart from that, if you have
everything else frozen, everything else is pretty deterministic
(assuming you don't have any memory leaks in your image-writing code).
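
As a rough illustration of the retry logic described above, here is a
minimal userspace sketch in C. It is not TuxOnIce code: struct sim_system,
sim_free_memory() and sim_prepare_image() are hypothetical names, and plain
page counters stand in for real memory accounting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of system state; none of these names exist in TuxOnIce. */
struct sim_system {
	size_t free_pages;        /* pages currently free */
	size_t reclaimable_pages; /* pages that could still be freed */
	bool user_frozen;
	bool kthreads_frozen;
};

/* Free up to `want` pages; returns how many pages were actually freed. */
size_t sim_free_memory(struct sim_system *s, size_t want)
{
	size_t got = want < s->reclaimable_pages ? want : s->reclaimable_pages;

	s->reclaimable_pages -= got;
	s->free_pages += got;
	return got;
}

/*
 * Freeze everything, check memory, and if there is not enough, thaw
 * kernel threads only, free memory and try again.
 * Returns 0 with all threads frozen once `needed` pages are free, or -1
 * (kernel threads left thawed) if nothing more can be reclaimed.
 */
int sim_prepare_image(struct sim_system *s, size_t needed)
{
	for (;;) {
		/* Freeze all threads. */
		s->user_frozen = true;
		s->kthreads_frozen = true;

		/* Do we have enough memory for the image? */
		if (s->free_pages >= needed)
			return 0;

		/* Thaw kernel threads only, so reclaim can make progress. */
		s->kthreads_frozen = false;
		if (sim_free_memory(s, needed - s->free_pages) == 0)
			return -1;
	}
}
```

The point of the model is only the ordering: user space stays frozen for the
whole loop, while kernel threads are running whenever memory is being freed.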

Regards,

Nigel
--
Evolution (n): A hypothetical process whereby improbable
events occur with alarming frequency, order arises from chaos, and
no one is given credit.

2011-09-25 13:30:09

by Rafael J. Wysocki

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

On Sunday, September 25, 2011, Christoph wrote:
> test results of the patch below:
>
> 1. real machine
>
> suspends fine but on wakeup, after loading image: hard reset.
> nvidia gpu => disabled compiz => wakeup worked two times.

Hmm, so there's a separate bug related to NVidia I guess.

> 2. virtualbox / stress test / xfs and ext4
>
> on 3rd resume, it booted up "normal" like this:
>
> [ 3.351813] Freeing unused kernel memory: 568k freed
> [ 3.460973] Freeing unused kernel memory: 284k freed
>
> [ 17.328356] PM: Preparing processes for restore.
>
> [ 17.328357] Freezing user space processes ...
> [ 37.345414] Freezing of tasks failed after 20.01 seconds (1 tasks
> refusing to freeze, wq_busy=0):
> [ 37.475244] ffff88001f06fd68 0000000000000086 0000000000000000
> 0000000000000000
> [ 37.526163] ffff88001f06e010 ffff88001f4c4410 0000000000012ec0
> ffff88001f06ffd8
> [ 37.580110] ffff88001f06ffd8 0000000000012ec0 ffffffff8160d020
> ffff88001f4c4410
> [ 37.626167] Call Trace:
> [ 37.626769] [<ffffffff81049944>] schedule+0x55/0x57
> [ 37.674925] [<ffffffff81360dbe>] __mutex_lock_common+0x117/0x178
> [ 37.792559] [<ffffffff81113ef2>] ? user_path_at+0x61/0x90
> [ 37.888501] [<ffffffff81360e35>] __mutex_lock_slowpath+0x16/0x18
> [ 37.986966] [<ffffffff81360efb>] mutex_lock+0x1e/0x32
> [ 38.086931] [<ffffffffa00a4d43>] show_manufacturer+0x23/0x51 [usbcore]
> [ 38.212500] [<ffffffff8125cd44>] dev_attr_show+0x22/0x49
> [ 38.282319] [<ffffffff810c6f8c>] ? __get_free_pages+0x9/0x38
> [ 38.397449] [<ffffffff8115febb>] sysfs_read_file+0xa9/0x12b
> [ 38.491607] [<ffffffff81107ff6>] vfs_read+0xa6/0x102
> [ 38.541994] [<ffffffff81105ebf>] ? do_sys_open+0xee/0x100
> [ 38.564907] [<ffffffff8110810b>] sys_read+0x45/0x6c
> [ 38.578397] [<ffffffff81368412>] system_call_fastpath+0x16/0x1b
> [ 38.590083]
> [ 38.598046] Restarting tasks ... done.
> [ 38.660448] XFS (sda3): Mounting Filesystem
>
> restarted the test runs, increased delay between awake and sleep from 20
> to 25 sec:
>
> 36 successful hibernate+resume cycles so far.

OK, cool. Thanks for testing!

Rafael


> On 25.09.2011 00:56, Rafael J. Wysocki wrote:
> > On Sunday, August 07, 2011, Dave Chinner wrote:
> >> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <[email protected]>
> >>>
> >>> Freeze all filesystems during the freezing of tasks by calling
> >>> freeze_bdev() for each of them and thaw them during the thawing
> >>> of tasks with the help of thaw_bdev().
> >>>
> >>> This is needed by hibernation, because some filesystems (e.g. XFS)
> >>> deadlock with the preallocation of memory used by it if the memory
> >>> pressure caused by it is too heavy.
> >>>
> >>> The additional benefit of this change is that, if something goes
> >>> wrong after filesystems have been frozen, they will stay in a
> >>> consistent state and journal replays won't be necessary (e.g. after
> >>> a failing suspend or resume). In particular, this should help to
> >>> solve a long-standing issue that in some cases during resume from
> >>> hibernation the boot loader causes the journal to be replayed for the
> >>> filesystem containing the kernel image and initrd causing it to
> >>> become inconsistent with the information stored in the hibernation
> >>> image.
> >>>
> >>> This change is based on earlier work by Nigel Cunningham.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <[email protected]>
> >>> ---
> >
> > Below is an alternative fix, the changelog pretty much explains the idea.
> >
> > I've tested it on Toshiba Portege R500, but I don't have an XFS partition
> > to verify that it really helps, so I'd appreciate it if someone able to
> > reproduce the original issue could test it and report back.
> >
> > Thanks,
> > Rafael
> >
> > ---
> > From: Rafael J. Wysocki <[email protected]>
> > Subject: PM / Hibernate: Freeze kernel threads after preallocating memory
> >
> > There is a problem with the current ordering of hibernate code which
> > leads to deadlocks in some filesystems' memory shrinkers. Namely,
> > some filesystems use freezable kernel threads that are inactive when
> > the hibernate memory preallocation is carried out. Those same
> > filesystems use memory shrinkers that may be triggered by the
> > hibernate memory preallocation. If those memory shrinkers wait for
> > the frozen kernel threads, the hibernate process deadlocks (this
> > happens with XFS, for one example).
> >
> > Apparently, it is not technically viable to redesign the filesystems
> > in question to avoid the situation described above, so the only
> > possible solution of this issue is to defer the freezing of kernel
> > threads until the hibernate memory preallocation is done, which is
> > implemented by this change.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > include/linux/freezer.h | 4 +++-
> > kernel/power/hibernate.c | 12 ++++++++----
> > kernel/power/power.h | 3 ++-
> > kernel/power/process.c | 30 ++++++++++++++++++++----------
> > 4 files changed, 33 insertions(+), 16 deletions(-)
> >
> > Index: linux/kernel/power/process.c
> > ===================================================================
> > --- linux.orig/kernel/power/process.c
> > +++ linux/kernel/power/process.c
> > @@ -135,7 +135,7 @@ static int try_to_freeze_tasks(bool sig_
> > }
> >
> > /**
> > - * freeze_processes - tell processes to enter the refrigerator
> > + * freeze_processes - Signal user space processes to enter the refrigerator.
> > */
> > int freeze_processes(void)
> > {
> > @@ -143,20 +143,30 @@ int freeze_processes(void)
> >
> > printk("Freezing user space processes ... ");
> > error = try_to_freeze_tasks(true);
> > - if (error)
> > - goto Exit;
> > - printk("done.\n");
> > + if (!error) {
> > + printk("done.");
> > + oom_killer_disable();
> > + }
> > + printk("\n");
> > + BUG_ON(in_atomic());
> > +
> > + return error;
> > +}
> > +
> > +/**
> > + * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
> > + */
> > +int freeze_kernel_threads(void)
> > +{
> > + int error;
> >
> > printk("Freezing remaining freezable tasks ... ");
> > error = try_to_freeze_tasks(false);
> > - if (error)
> > - goto Exit;
> > - printk("done.");
> > + if (!error)
> > + printk("done.");
> >
> > - oom_killer_disable();
> > - Exit:
> > - BUG_ON(in_atomic());
> > printk("\n");
> > + BUG_ON(in_atomic());
> >
> > return error;
> > }
> > Index: linux/include/linux/freezer.h
> > ===================================================================
> > --- linux.orig/include/linux/freezer.h
> > +++ linux/include/linux/freezer.h
> > @@ -49,6 +49,7 @@ extern int thaw_process(struct task_stru
> >
> > extern void refrigerator(void);
> > extern int freeze_processes(void);
> > +extern int freeze_kernel_threads(void);
> > extern void thaw_processes(void);
> >
> > static inline int try_to_freeze(void)
> > @@ -171,7 +172,8 @@ static inline void clear_freeze_flag(str
> > static inline int thaw_process(struct task_struct *p) { return 1; }
> >
> > static inline void refrigerator(void) {}
> > -static inline int freeze_processes(void) { BUG(); return 0; }
> > +static inline int freeze_processes(void) { return -ENOSYS; }
> > +static inline int freeze_kernel_threads(void) { return -ENOSYS; }
> > static inline void thaw_processes(void) {}
> >
> > static inline int try_to_freeze(void) { return 0; }
> > Index: linux/kernel/power/power.h
> > ===================================================================
> > --- linux.orig/kernel/power/power.h
> > +++ linux/kernel/power/power.h
> > @@ -228,7 +228,8 @@ extern int pm_test_level;
> > #ifdef CONFIG_SUSPEND_FREEZER
> > static inline int suspend_freeze_processes(void)
> > {
> > - return freeze_processes();
> > + int error = freeze_processes();
> > + return error ? : freeze_kernel_threads();
> > }
> >
> > static inline void suspend_thaw_processes(void)
> > Index: linux/kernel/power/hibernate.c
> > ===================================================================
> > --- linux.orig/kernel/power/hibernate.c
> > +++ linux/kernel/power/hibernate.c
> > @@ -334,13 +334,17 @@ int hibernation_snapshot(int platform_mo
> > if (error)
> > goto Close;
> >
> > - error = dpm_prepare(PMSG_FREEZE);
> > - if (error)
> > - goto Complete_devices;
> > -
> > /* Preallocate image memory before shutting down devices. */
> > error = hibernate_preallocate_memory();
> > if (error)
> > + goto Close;
> > +
> > + error = freeze_kernel_threads();
> > + if (error)
> > + goto Close;
> > +
> > + error = dpm_prepare(PMSG_FREEZE);
> > + if (error)
> > goto Complete_devices;
> >
> > suspend_console();
>


2011-09-25 13:35:05

by Rafael J. Wysocki

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

On Sunday, September 25, 2011, Nigel Cunningham wrote:
> Hi.
>
> On 25/09/11 08:56, Rafael J. Wysocki wrote:
> > On Sunday, August 07, 2011, Dave Chinner wrote:
> >> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <[email protected]>
> >>>
> >>> Freeze all filesystems during the freezing of tasks by calling
> >>> freeze_bdev() for each of them and thaw them during the thawing
> >>> of tasks with the help of thaw_bdev().
> >>>
> >>> This is needed by hibernation, because some filesystems (e.g. XFS)
> >>> deadlock with the preallocation of memory used by it if the memory
> >>> pressure caused by it is too heavy.
> >>>
> >>> The additional benefit of this change is that, if something goes
> >>> wrong after filesystems have been frozen, they will stay in a
> >>> consistent state and journal replays won't be necessary (e.g. after
> >>> a failing suspend or resume). In particular, this should help to
> >>> solve a long-standing issue that in some cases during resume from
> >>> hibernation the boot loader causes the journal to be replayed for the
> >>> filesystem containing the kernel image and initrd causing it to
> >>> become inconsistent with the information stored in the hibernation
> >>> image.
> >>>
> >>> This change is based on earlier work by Nigel Cunningham.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <[email protected]>
> >>> ---
> >
> > Below is an alternative fix, the changelog pretty much explains the idea.
> >
> > I've tested it on Toshiba Portege R500, but I don't have an XFS partition
> > to verify that it really helps, so I'd appreciate it if someone able to
> > reproduce the original issue could test it and report back.
> >
> > Thanks,
> > Rafael
> >
> > ---
> > From: Rafael J. Wysocki <[email protected]>
> > Subject: PM / Hibernate: Freeze kernel threads after preallocating memory
> >
> > There is a problem with the current ordering of hibernate code which
> > leads to deadlocks in some filesystems' memory shrinkers. Namely,
> > some filesystems use freezable kernel threads that are inactive when
> > the hibernate memory preallocation is carried out. Those same
> > filesystems use memory shrinkers that may be triggered by the
> > hibernate memory preallocation. If those memory shrinkers wait for
> > the frozen kernel threads, the hibernate process deadlocks (this
> > happens with XFS, for one example).
> >
> > Apparently, it is not technically viable to redesign the filesystems
> > in question to avoid the situation described above, so the only
> > possible solution of this issue is to defer the freezing of kernel
> > threads until the hibernate memory preallocation is done, which is
> > implemented by this change.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
>
> TuxOnIce has the following logic at the moment: Freeze all threads.
> Calculate whether we have enough memory for the image, thaw kernel
> threads only, free memory and try again if it looks like we don't have
> enough.

Well, it seems that the freezing of kernel threads in the first step
is not necessary. You can do (1) freeze user space, (2) check if there's
enough free memory, (3) free memory if necessary, (4) freeze kernel
threads instead. Which is what my patch does, actually. :-)
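
To make the difference between the two orderings concrete, here is a
minimal userspace sketch in C. It is a model, not kernel code: struct
pm_model and the model_*() functions are hypothetical names, and the
"deadlock" is represented by an error return rather than an actual hang.

```c
#include <stdbool.h>

/* Hypothetical userspace model; these are not kernel APIs. */
struct pm_model {
	bool user_frozen;
	bool kthreads_frozen;
	long free_pages;
	long reclaimable_pages;
};

/*
 * Model of memory preallocation whose shrinker needs help from a
 * freezable kernel thread. Returns 0 on success, -1 to stand in for
 * the deadlock when that thread is frozen, and -2 if there is simply
 * not enough memory to reclaim.
 */
int model_reclaim(struct pm_model *m, long needed)
{
	long shortfall;

	if (m->free_pages >= needed)
		return 0;		/* (2) enough free memory already */
	if (m->kthreads_frozen)
		return -1;		/* shrinker would wait forever */

	shortfall = needed - m->free_pages;
	if (shortfall > m->reclaimable_pages)
		return -2;

	/* (3) free memory while kernel threads can still run */
	m->reclaimable_pages -= shortfall;
	m->free_pages += shortfall;
	return 0;
}

/* Old ordering: freeze user space and kernel threads, then preallocate. */
int model_hibernate_old(struct pm_model *m, long needed)
{
	m->user_frozen = true;
	m->kthreads_frozen = true;
	return model_reclaim(m, needed);
}

/* Ordering (1) to (4) from the message above. */
int model_hibernate_new(struct pm_model *m, long needed)
{
	int error;

	m->user_frozen = true;			/* (1) */
	error = model_reclaim(m, needed);	/* (2) and (3) */
	if (error)
		return error;
	m->kthreads_frozen = true;		/* (4) */
	return 0;
}
```

With 10 free and 100 reclaimable pages and a request for 50, the old
ordering returns the modelled deadlock error, while the new ordering
succeeds and only then freezes kernel threads.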

> I've never heard of a deadlock, though I suppose one would be
> possible if you had the added complication of userspace
> drivers/filesystems - it would be good to be able to distinguish and
> thaw them.

Yes, there is a known problem with FUSE in that area.

> It does this prior to the atomic copy, using a user-supplied estimate of
> the amount of memory drivers will need - the actual amount used is shown
> in debugging info at the end of the cycle. Apart from that, if you have
> everything else frozen, everything else is pretty deterministic
> (assuming you don't have any memory leaks in your image-writing code).

Thanks,
Rafael

2011-09-25 13:40:09

by Rafael J. Wysocki

Subject: [Update][PATCH] PM / Hibernate: Freeze kernel threads after preallocating memory

From: Rafael J. Wysocki <[email protected]>

There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers. Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out. Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation. If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).

Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.

Unfortunately, this requires the memory preallocation to be done
before the "prepare" stage of device freeze, so after this change the
only way drivers can allocate additional memory for their freeze
routines in a clean way is to use PM notifiers.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
Documentation/power/devices.txt | 4 ----
include/linux/freezer.h | 4 +++-
kernel/power/hibernate.c | 12 ++++++++----
kernel/power/power.h | 3 ++-
kernel/power/process.c | 30 ++++++++++++++++++++----------
5 files changed, 33 insertions(+), 20 deletions(-)

Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -135,7 +135,7 @@ static int try_to_freeze_tasks(bool sig_
}

/**
- * freeze_processes - tell processes to enter the refrigerator
+ * freeze_processes - Signal user space processes to enter the refrigerator.
*/
int freeze_processes(void)
{
@@ -143,20 +143,30 @@ int freeze_processes(void)

printk("Freezing user space processes ... ");
error = try_to_freeze_tasks(true);
- if (error)
- goto Exit;
- printk("done.\n");
+ if (!error) {
+ printk("done.");
+ oom_killer_disable();
+ }
+ printk("\n");
+ BUG_ON(in_atomic());
+
+ return error;
+}
+
+/**
+ * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
+ */
+int freeze_kernel_threads(void)
+{
+ int error;

printk("Freezing remaining freezable tasks ... ");
error = try_to_freeze_tasks(false);
- if (error)
- goto Exit;
- printk("done.");
+ if (!error)
+ printk("done.");

- oom_killer_disable();
- Exit:
- BUG_ON(in_atomic());
printk("\n");
+ BUG_ON(in_atomic());

return error;
}
Index: linux/include/linux/freezer.h
===================================================================
--- linux.orig/include/linux/freezer.h
+++ linux/include/linux/freezer.h
@@ -49,6 +49,7 @@ extern int thaw_process(struct task_stru

extern void refrigerator(void);
extern int freeze_processes(void);
+extern int freeze_kernel_threads(void);
extern void thaw_processes(void);

static inline int try_to_freeze(void)
@@ -171,7 +172,8 @@ static inline void clear_freeze_flag(str
static inline int thaw_process(struct task_struct *p) { return 1; }

static inline void refrigerator(void) {}
-static inline int freeze_processes(void) { BUG(); return 0; }
+static inline int freeze_processes(void) { return -ENOSYS; }
+static inline int freeze_kernel_threads(void) { return -ENOSYS; }
static inline void thaw_processes(void) {}

static inline int try_to_freeze(void) { return 0; }
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -228,7 +228,8 @@ extern int pm_test_level;
#ifdef CONFIG_SUSPEND_FREEZER
static inline int suspend_freeze_processes(void)
{
- return freeze_processes();
+ int error = freeze_processes();
+ return error ? : freeze_kernel_threads();
}

static inline void suspend_thaw_processes(void)
Index: linux/kernel/power/hibernate.c
===================================================================
--- linux.orig/kernel/power/hibernate.c
+++ linux/kernel/power/hibernate.c
@@ -334,13 +334,17 @@ int hibernation_snapshot(int platform_mo
if (error)
goto Close;

- error = dpm_prepare(PMSG_FREEZE);
- if (error)
- goto Complete_devices;
-
/* Preallocate image memory before shutting down devices. */
error = hibernate_preallocate_memory();
if (error)
+ goto Close;
+
+ error = freeze_kernel_threads();
+ if (error)
+ goto Close;
+
+ error = dpm_prepare(PMSG_FREEZE);
+ if (error)
goto Complete_devices;

suspend_console();
Index: linux/Documentation/power/devices.txt
===================================================================
--- linux.orig/Documentation/power/devices.txt
+++ linux/Documentation/power/devices.txt
@@ -279,10 +279,6 @@ When the system goes into the standby or
time.) Unlike the other suspend-related phases, during the prepare
phase the device tree is traversed top-down.

- In addition to that, if device drivers need to allocate additional
- memory to be able to hadle device suspend correctly, that should be
- done in the prepare phase.
-
After the prepare callback method returns, no new children may be
registered below the device. The method may also prepare the device or
driver in some way for the upcoming system power transition (for
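
For illustration only, the PM notifier approach mentioned in the changelog
can be modelled in userspace C as below. This is a sketch under stated
assumptions, not driver code: the enum values merely stand in for the
kernel's PM_HIBERNATION_PREPARE and PM_POST_HIBERNATION events, and
struct model_driver and model_pm_notify() are hypothetical names. A real
driver would register a struct notifier_block with register_pm_notifier()
instead.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for PM_HIBERNATION_PREPARE and PM_POST_HIBERNATION. */
enum model_pm_event {
	MODEL_HIBERNATION_PREPARE,
	MODEL_POST_HIBERNATION
};

/* Hypothetical driver state: a buffer needed by its freeze routine. */
struct model_driver {
	void *freeze_buf;
	size_t freeze_buf_size;
};

/*
 * Modelled notifier callback. Returns 0 on success and -1 if the
 * allocation fails.
 */
int model_pm_notify(struct model_driver *drv, enum model_pm_event event)
{
	switch (event) {
	case MODEL_HIBERNATION_PREPARE:
		/*
		 * Called before tasks are frozen and before the image
		 * memory is preallocated, so allocating here is safe.
		 */
		drv->freeze_buf = malloc(drv->freeze_buf_size);
		return drv->freeze_buf ? 0 : -1;
	case MODEL_POST_HIBERNATION:
		/* Hibernation finished or failed: release the buffer. */
		free(drv->freeze_buf);
		drv->freeze_buf = NULL;
		return 0;
	}
	return 0;
}
```

The design point is that the allocation moves out of the "prepare" device
callback, which now runs after memory preallocation, into a notifier that
runs before it.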

2011-09-25 21:57:35

by Christoph

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

On 25.09.2011 15:32, Rafael J. Wysocki wrote:
> On Sunday, September 25, 2011, Christoph wrote:
>> test results of the patch below:
>>
>> 1. real machine
>>
>> suspends fine but on wakeup, after loading image: hard reset. nvidia
>> gpu => disabled compiz => wakeup worked two times.
>
> Hmm, so there's a separate bug related to NVidia I guess.

Maybe.

Just ran another test: the machine (macbook) woke up, loaded the image and
thawed. It got stuck at vt#1, displaying the console with a login prompt.
The cursor was blinking, but no (usb) keyboard or network was enabled. Bricked?!!


On the other hand I've got another box with nvidia gpu:

debian5 32bit
2.6.38.2+ #3 SMP Fri Apr 1
nvidia 260.19.36

It has been up since I compiled the kernel: I use it twice a week and keep it
frozen the rest of the time. It was rock solid until today: hard reset on resume.
WTF? (I remember this version combo was stable on the macbook too, but that
kernel lacks a solid wireless driver.)

This is one of those stupid situations you can't debug. What else can go
wrong? This is off topic, but here is a cute kernel crash I got when I gave
the nouveau driver a chance:

http://events.ccc.de/camp/2011/wiki/File:Dome22.jpg

chris

http://events.ccc.de/camp/2011/wiki/DomeTent

>
>> 2. virtualbox / stress test / xfs and ext4
>>
>> on 3rd resume, it booted up "normal" like this:
>>
>> [ 3.351813] Freeing unused kernel memory: 568k freed
>> [ 3.460973] Freeing unused kernel memory: 284k freed
>>
>> [ 17.328356] PM: Preparing processes for restore.
>>

...

2011-09-25 22:08:39

by Rafael J. Wysocki

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

On Sunday, September 25, 2011, Christoph wrote:
> On 25.09.2011 15:32, Rafael J. Wysocki wrote:
> > On Sunday, September 25, 2011, Christoph wrote:
> >> test results of the patch below:
> >>
> >> 1. real machine
> >>
> >> suspends fine but on wakeup, after loading image: hard reset. nvidia
> >> gpu => disabled compiz => wakeup worked two times.
> >
> > Hmm, so there's a separate bug related to NVidia I guess.
>
> Maybe.
>
> Just made another test: the machine (macbook) woke up, loaded image, thaw.
> It got stuck at vt#1, displaying console with login. Cursor blinking, but
> no (usb) keyboard or network enabled. Bricked?!!
>
>
> On the other hand I've got another box with nvidia gpu:
>
> debian5 32bit
> 2.6.38.2+ #3 SMP Fri Apr 1
> nvidia 260.19.36
>
> It's up since I compiled the kernel: I use it twice a week and I kept it
> freezed all the time. It was solid rock until today: hard reset on resume.
> WTF? (I remember this version combo was stable on the macbook but the
> kernel lacks a solid wireless driver).

If that's an x86_64 system, there is a known bug causing problems like
this to happen. There's a patch fixing it, but it's not conclusive yet:
http://marc.info/?l=linux-kernel&m=131653513414314&w=2

Thanks,
Rafael

2011-09-26 05:27:30

by Christoph

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)


On 26.09.2011 00:10, Rafael J. Wysocki wrote:
> On Sunday, September 25, 2011, Christoph wrote:
>> On 25.09.2011 15:32, Rafael J. Wysocki wrote:
>>> On Sunday, September 25, 2011, Christoph wrote:
>>>> test results of the patch below:
>>>>
>>>> 1. real machine
>>>>
>>>> suspends fine but on wakeup, after loading image: hard reset. nvidia
>>>> gpu => disabled compiz => wakeup worked two times.
>>>
>>> Hmm, so there's a separate bug related to NVidia I guess.
>>
>> Maybe.
>>
>> Just made another test: the machine (macbook) woke up, loaded image, thaw.
>> It got stuck at vt#1, displaying console with login. Cursor blinking, but
>> no (usb) keyboard or network enabled. Bricked?!!
>>
>>
>> On the other hand I've got another box with nvidia gpu:
>>
>> debian5 32bit
>> 2.6.38.2+ #3 SMP Fri Apr 1
>> nvidia 260.19.36
>>
>> It's up since I compiled the kernel: I use it twice a week and I kept it
>> freezed all the time. It was solid rock until today: hard reset on resume.
>> WTF? (I remember this version combo was stable on the macbook but the
>> kernel lacks a solid wireless driver).
>
> If that's an x86_64 system, there is a known bug causing problems like
> this to happen. There's a patch fixing it, but not conclusive:
> http://marc.info/?l=linux-kernel&m=131653513414314&w=2

Very good. It seems to fix resume, at least the 2 times I've tested so far.

Thanks! :)

chris


>
> Thanks,
> Rafael
>
> _______________________________________________
> xfs mailing list
> [email protected]
> http://oss.sgi.com/mailman/listinfo/xfs

2011-10-22 15:14:32

by Christoph

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

> PM / Freezer: Freeze filesystems while freezing processes (v2)
>
> On Sunday, August 07, 2011, Dave Chinner wrote:
>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <[email protected]>
>>>
>>> Freeze all filesystems during the freezing of tasks by calling
>>> freeze_bdev() for each of them and thaw them during the thawing of
>>> tasks with the help of thaw_bdev().
>>>
>>> This is needed by hibernation, because some filesystems (e.g. XFS)
>>> deadlock with the preallocation of memory used by it if the memory
>>> pressure caused by it is too heavy.
>>>
...
>
> Below is an alternative fix, the changelog pretty much explains the
> idea.
>
> I've tested it on Toshiba Portege R500, but I don't have an XFS
> partition to verify that it really helps, so I'd appreciate it if
> someone able to reproduce the original issue could test it and report
> back.

Hi Rafael!

Well, the kernel bugtracker is still down, so I'd just like to post my
experience with kernel (x64) v3.1-rc8/9 + patches. My machine is a
MacBookPro, doomed with 4GB RAM, running debian.

Bug #1

on the way to hibernate, machine hangs on

"PM: Preallocating image memory..."

this patch has worked for me for weeks now:
"[PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)"
https://lkml.org/lkml/2011/9/24/77

I was able to reproduce this bug with virtualbox and tested the patch for
~40 cycles.

Bug#2

on resume from hibernate, hard reset (x64 only):
http://marc.info/?l=linux-kernel&m=131653513414314&w=2

With this patch I haven't hit this issue again in the last few weeks.

I wasn't able to reproduce this bug with virtualbox.





I've got only one pm-hibernate issue left. The last line printed is:

Disabling non-boot CPUs ...

This time I've enabled debug hung task :)

schedule_timeout
...
workqueue_cpu_callback
notifier_call_chain
...
__cpu_notify
_cpu_down
printk
disable_nonboot_cpus
hibernation_snapshot
hibernate
...

Any other ideas, besides the possibility that it's caused by evil earth
radiation?


Gruss,
chris




On 26.09.2011 00:10, Rafael J. Wysocki wrote:
> On Sunday, September 25, 2011, Christoph wrote:
>> On 25.09.2011 15:32, Rafael J. Wysocki wrote:
>>> On Sunday, September 25, 2011, Christoph wrote:
>>>> test results of the patch below:
>>>>
>>>> 1. real machine
>>>>
>>>> suspends fine but on wakeup, after loading image: hard reset. nvidia
>>>> gpu => disabled compiz => wakeup worked two times.
>>>
>>> Hmm, so there's a separate bug related to NVidia I guess.
>>
>> Maybe.
>>
>> Just made another test: the machine (macbook) woke up, loaded image, thaw.
>> It got stuck at vt#1, displaying console with login. Cursor blinking, but
>> no (usb) keyboard or network enabled. Bricked?!!
>>
>>
>> On the other hand I've got another box with nvidia gpu:
>>
>> debian5 32bit
>> 2.6.38.2+ #3 SMP Fri Apr 1
>> nvidia 260.19.36
>>
>> It's been up since I compiled the kernel: I use it twice a week and I kept it
>> frozen all the time. It was rock solid until today: hard reset on resume.
>> WTF? (I remember this version combo was stable on the macbook but the
>> kernel lacks a solid wireless driver).
>
> If that's an x86_64 system, there is a known bug causing problems like
> this to happen. There's a patch fixing it, but it's not conclusive:
> http://marc.info/?l=linux-kernel&m=131653513414314&w=2
>
> Thanks,
> Rafael

2011-10-22 21:33:04

by Rafael J. Wysocki

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

On Saturday, October 22, 2011, Christoph wrote:
> > PM / Freezer: Freeze filesystems while freezing processes (v2)
> >
> > On Sunday, August 07, 2011, Dave Chinner wrote:
> >> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <[email protected]>
> >>>
> >>> Freeze all filesystems during the freezing of tasks by calling
> >>> freeze_bdev() for each of them and thaw them during the thawing of
> >>> tasks with the help of thaw_bdev().
> >>>
> >>> This is needed by hibernation, because some filesystems (e.g. XFS)
> >>> deadlock with the preallocation of memory used by it if the memory
> >>> pressure caused by it is too heavy.
> >>>
> ...
> >
> > Below is an alternative fix, the changelog pretty much explains the
> > idea.
> >
> > I've tested it on Toshiba Portege R500, but I don't have an XFS
> > partition to verify that it really helps, so I'd appreciate it if
> > someone able to reproduce the original issue could test it and report
> > back.
>
> Hi Rafael!
>
> Well, the kernel bugtracker is still down, so I'd just like to post my
> experience with kernel (x64) v3.1-rc8/9 + patches. My machine is a
> MacBook Pro, doomed with 4GB RAM, running Debian.
>
> Bug #1
>
> On the way to hibernate, the machine hangs at
>
> "PM: Preallocating image memory..."
>
> This patch has worked for me for weeks now:
> "[PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)"
> https://lkml.org/lkml/2011/9/24/77

This patch is going to be merged into 3.2.

> I was able to reproduce this bug with VirtualBox and tested the patch for
> ~40 cycles.
>
> Bug #2
>
> On resume from hibernate, hard reset (x64 only):
> http://marc.info/?l=linux-kernel&m=131653513414314&w=2
>
> With this patch I haven't hit this issue again in the last few weeks.

Hmm. This issue still appears to be under investigation to me, but perhaps
that has taken too much time already.

Takashi, perhaps you can repost the patch as a proper submission? It would
be good to have this regression fixed even if we don't know the real source of
it.

> I wasn't able to reproduce this bug with VirtualBox.
>
> I've hit only one pm-hibernate issue. Last line:
>
> Disabling non-boot CPUs ...
>
> This time I had hung-task debugging enabled :)
>
> schedule_timeout
> ...
> workqueue_cpu_callback
> notifier_call_chain
> ...
> __cpu_notify
> _cpu_down
> printk
> disable_nonboot_cpus
> hibernation_snapshot
> hibernate
> ...
>
> Any other ideas, besides the possibility that it's caused by evil earth
> radiation?

From your description, I'm not exactly sure what happened. Care to explain?

Rafael

2011-11-16 13:49:23

by Ferenc Wagner

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

"Rafael J. Wysocki" <[email protected]> writes:

> On Saturday, October 22, 2011, Christoph wrote:
>
>>> PM / Freezer: Freeze filesystems while freezing processes (v2)
>>>
>>> On Sunday, August 07, 2011, Dave Chinner wrote:
>>>
>>>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
>>>>
>>>>> From: Rafael J. Wysocki <[email protected]>
>>>>>
>>>>> Freeze all filesystems during the freezing of tasks by calling
>>>>> freeze_bdev() for each of them and thaw them during the thawing of
>>>>> tasks with the help of thaw_bdev().
>>>>>
>>>>> This is needed by hibernation, because some filesystems (e.g. XFS)
>>>>> deadlock with the preallocation of memory used by it if the memory
>>>>> pressure caused by it is too heavy.
>>>
>>> Below is an alternative fix, the changelog pretty much explains the
>>> idea.
>>>
>>> I've tested it on Toshiba Portege R500, but I don't have an XFS
>>> partition to verify that it really helps, so I'd appreciate it if
>>> someone able to reproduce the original issue could test it and report
>>> back.
>>
>> Well, the kernel bugtracker is still down, so I'd just like to post my
>> experience with kernel (x64) v3.1-rc8/9 + patches. My machine is a
>> MacBook Pro, doomed with 4GB RAM, running Debian.
>>
>> Bug #1
>>
>> On the way to hibernate, the machine hangs at
>>
>> "PM: Preallocating image memory..."
>>
>> This patch has worked for me for weeks now:
>> "[PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)"
>> https://lkml.org/lkml/2011/9/24/77
>
> This patch is going to be merged into 3.2.

Hi,

I was the original reporter of the Bugzilla issue, just didn't know
about this thread until recently. Anyway, I'm running 3.2-rc1 now,
which contains the alternative fix, and I can confirm that it indeed
works: hibernation does not deadlock on my XFS-rooted system anymore
during memory preallocation. Thanks to everybody for their work on
the issue!

One more observation: preallocation now ends with a couple of seconds
of heavy disk activity, preceded by several seconds of total inactivity.
Is the inactivity explained by some CPU-intensive task at that stage?
--
Thanks,
Feri.

2011-11-16 21:47:59

by Rafael J. Wysocki

Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

On Wednesday, November 16, 2011, Ferenc Wagner wrote:
> "Rafael J. Wysocki" <[email protected]> writes:
>
> > On Saturday, October 22, 2011, Christoph wrote:
> >
> >>> PM / Freezer: Freeze filesystems while freezing processes (v2)
> >>>
> >>> On Sunday, August 07, 2011, Dave Chinner wrote:
> >>>
> >>>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> >>>>
> >>>>> From: Rafael J. Wysocki <[email protected]>
> >>>>>
> >>>>> Freeze all filesystems during the freezing of tasks by calling
> >>>>> freeze_bdev() for each of them and thaw them during the thawing of
> >>>>> tasks with the help of thaw_bdev().
> >>>>>
> >>>>> This is needed by hibernation, because some filesystems (e.g. XFS)
> >>>>> deadlock with the preallocation of memory used by it if the memory
> >>>>> pressure caused by it is too heavy.
> >>>
> >>> Below is an alternative fix, the changelog pretty much explains the
> >>> idea.
> >>>
> >>> I've tested it on Toshiba Portege R500, but I don't have an XFS
> >>> partition to verify that it really helps, so I'd appreciate it if
> >>> someone able to reproduce the original issue could test it and report
> >>> back.
> >>
> >> Well, the kernel bugtracker is still down, so I'd just like to post my
> >> experience with kernel (x64) v3.1-rc8/9 + patches. My machine is a
> >> MacBook Pro, doomed with 4GB RAM, running Debian.
> >>
> >> Bug #1
> >>
> >> On the way to hibernate, the machine hangs at
> >>
> >> "PM: Preallocating image memory..."
> >>
> >> This patch has worked for me for weeks now:
> >> "[PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)"
> >> https://lkml.org/lkml/2011/9/24/77
> >
> > This patch is going to be merged into 3.2.
>
> Hi,
>
> I was the original reporter of the Bugzilla issue, just didn't know
> about this thread until recently. Anyway, I'm running 3.2-rc1 now,
> which contains the alternative fix, and I can confirm that it indeed
> works: hibernation does not deadlock on my XFS-rooted system anymore
> during memory preallocation. Thanks to everybody for their work on
> the issue!
>
> One more observation: preallocation now ends with a couple of seconds
> of heavy disk activity, preceded by several seconds of total inactivity.
> Is the inactivity explained by some CPU-intensive task at that stage?

Quite frankly, I have no idea.

Thanks,
Rafael