LinuxLists.cc - Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Sun, 26 Oct 2008, Rafael J. Wysocki wrote:
> On Saturday, 25 of October 2008, Nigel Cunningham wrote:
> > While working on freezing fuse filesystems, I found that if a filesystem
> > is frozen when we try to freeze processes, freezing can fail because
> > threads are waiting in vfs_check_frozen for the filesystem to be thawed.
> > We should thus not count such threads.
> >
> > The check will be safe if a filesystem is thawed while we're freezing
> > processes because filesystem thaws are only invoked from userspace. Any
> > waiting processes will be woken and frozen prior to us completing the
> > freezing of userspace (the caller invoking the filesystem thaw will be
> > freezing) or - in the worst case - together with kernel threads.

The description is missing some details: why is the filesystem frozen
before suspend? AFAICS this can happen when DM calls bdev_freeze() on
the device before the task freezing begins. Is this the case?

Also, while the patch might solve some of the symptoms of the fuse
vs. process freezer interaction, it will not fully fix that problem.
As such it's just a hack to hide the problem, making it less likely to
appear.

So I'm not at all conviced about this patch.

Thanks,
Miklos

2008-10-27 11:20:57

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi Miklos.

On Mon, 2008-10-27 at 12:12 +0100, Miklos Szeredi wrote:
> On Sun, 26 Oct 2008, Rafael J. Wysocki wrote:
> > On Saturday, 25 of October 2008, Nigel Cunningham wrote:
> > > While working on freezing fuse filesystems, I found that if a filesystem
> > > is frozen when we try to freeze processes, freezing can fail because
> > > threads are waiting in vfs_check_frozen for the filesystem to be thawed.
> > > We should thus not count such threads.
> > >
> > > The check will be safe if a filesystem is thawed while we're freezing
> > > processes because filesystem thaws are only invoked from userspace. Any
> > > waiting processes will be woken and frozen prior to us completing the
> > > freezing of userspace (the caller invoking the filesystem thaw will be
> > > freezing) or - in the worst case - together with kernel threads.
>
> The description is missing some details: why is the filesystem frozen
> before suspend? AFAICS this can happen when DM calls bdev_freeze() on
> the device before the task freezing begins. Is this the case?

It doesn't matter why a process is sitting in that wait_event call. What
does matter is that one can be there. In the case where I saw it, I was
working on fuse freezing. I don't remember the details, as it's a year
since I made this patch, but I don't think I wasn't using fuse or DM.

> Also, while the patch might solve some of the symptoms of the fuse
> vs. process freezer interaction, it will not fully fix that problem.
> As such it's just a hack to hide the problem, making it less likely to
> appear.

No, it's part of the solution. I haven't posted the full fuse freezing
patch because I thought this could be profitably merged without the rest
of the patch.

Regards,

Nigel

2008-10-27 11:33:18

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Monday, 27 of October 2008, Nigel Cunningham wrote:
> Hi Miklos.
>
> On Mon, 2008-10-27 at 12:12 +0100, Miklos Szeredi wrote:
> > On Sun, 26 Oct 2008, Rafael J. Wysocki wrote:
> > > On Saturday, 25 of October 2008, Nigel Cunningham wrote:
> > > > While working on freezing fuse filesystems, I found that if a filesystem
> > > > is frozen when we try to freeze processes, freezing can fail because
> > > > threads are waiting in vfs_check_frozen for the filesystem to be thawed.
> > > > We should thus not count such threads.
> > > >
> > > > The check will be safe if a filesystem is thawed while we're freezing
> > > > processes because filesystem thaws are only invoked from userspace. Any
> > > > waiting processes will be woken and frozen prior to us completing the
> > > > freezing of userspace (the caller invoking the filesystem thaw will be
> > > > freezing) or - in the worst case - together with kernel threads.
> >
> > The description is missing some details: why is the filesystem frozen
> > before suspend? AFAICS this can happen when DM calls bdev_freeze() on
> > the device before the task freezing begins. Is this the case?
>
> It doesn't matter why a process is sitting in that wait_event call. What
> does matter is that one can be there. In the case where I saw it, I was
> working on fuse freezing. I don't remember the details, as it's a year
> since I made this patch, but I don't think I wasn't using fuse or DM.
>
> > Also, while the patch might solve some of the symptoms of the fuse
> > vs. process freezer interaction, it will not fully fix that problem.
> > As such it's just a hack to hide the problem, making it less likely to
> > appear.
>
> No, it's part of the solution. I haven't posted the full fuse freezing
> patch because I thought this could be profitably merged without the rest
> of the patch.

Well, I guess it's better if you post the entire thing so that we can see
what the role of the $subject patch is in it, even if this patch finally gets
merged separately.

Thanks,
Rafael

2008-10-27 11:38:33

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Mon, 27 Oct 2008, Nigel Cunningham wrote:
> It doesn't matter why a process is sitting in that wait_event call. What
> does matter is that one can be there.

Right, but it could also be sleeping in any other wait_event(),
mutex_lock(), etc, resulting in the same issue.

> No, it's part of the solution. I haven't posted the full fuse freezing
> patch because I thought this could be profitably merged without the rest
> of the patch.

Please post the full patch, it'd be a lot better to see the big
picture instead of just a small part of it.

Thanks,
Miklos

2008-10-27 11:40:28

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi.

On Mon, 2008-10-27 at 12:37 +0100, Rafael J. Wysocki wrote:
> On Monday, 27 of October 2008, Nigel Cunningham wrote:
> > Hi Miklos.
> >
> > On Mon, 2008-10-27 at 12:12 +0100, Miklos Szeredi wrote:
> > > On Sun, 26 Oct 2008, Rafael J. Wysocki wrote:
> > > > On Saturday, 25 of October 2008, Nigel Cunningham wrote:
> > > > > While working on freezing fuse filesystems, I found that if a filesystem
> > > > > is frozen when we try to freeze processes, freezing can fail because
> > > > > threads are waiting in vfs_check_frozen for the filesystem to be thawed.
> > > > > We should thus not count such threads.
> > > > >
> > > > > The check will be safe if a filesystem is thawed while we're freezing
> > > > > processes because filesystem thaws are only invoked from userspace. Any
> > > > > waiting processes will be woken and frozen prior to us completing the
> > > > > freezing of userspace (the caller invoking the filesystem thaw will be
> > > > > freezing) or - in the worst case - together with kernel threads.
> > >
> > > The description is missing some details: why is the filesystem frozen
> > > before suspend? AFAICS this can happen when DM calls bdev_freeze() on
> > > the device before the task freezing begins. Is this the case?
> >
> > It doesn't matter why a process is sitting in that wait_event call. What
> > does matter is that one can be there. In the case where I saw it, I was
> > working on fuse freezing. I don't remember the details, as it's a year
> > since I made this patch, but I don't think I wasn't using fuse or DM.
> >
> > > Also, while the patch might solve some of the symptoms of the fuse
> > > vs. process freezer interaction, it will not fully fix that problem.
> > > As such it's just a hack to hide the problem, making it less likely to
> > > appear.
> >
> > No, it's part of the solution. I haven't posted the full fuse freezing
> > patch because I thought this could be profitably merged without the rest
> > of the patch.
>
> Well, I guess it's better if you post the entire thing so that we can see
> what the role of the $subject patch is in it, even if this patch finally gets
> merged separately.

Ah.. that makes me see how vfs_check_frozen was getting triggered...
(fs/namei.c, below).

Regards,

Nigel

fs/buffer.c | 87 ++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/control.c | 1
fs/fuse/dev.c | 7 +++
fs/fuse/dir.c | 35 +++++++++++++++--
fs/fuse/file.c | 14 +++++++
fs/fuse/fuse.h | 13 ++++++
fs/fuse/inode.c | 4 +-
fs/namei.c | 2 +
include/linux/buffer_head.h | 5 ++
include/linux/freezer.h | 15 +++++++
include/linux/fs.h | 10 ++++-
kernel/power/process.c | 48 +++++++++++++++++++++---
12 files changed, 227 insertions(+), 14 deletions(-)
diff -ruNp 700-BRANCH-fuse-freezing.patch-old/fs/buffer.c 700-BRANCH-fuse-freezing.patch-new/fs/buffer.c
--- 700-BRANCH-fuse-freezing.patch-old/fs/buffer.c 2008-10-19 04:57:22.000000000 +1100
+++ 700-BRANCH-fuse-freezing.patch-new/fs/buffer.c 2008-10-21 13:16:08.000000000 +1100
@@ -247,6 +247,93 @@ void thaw_bdev(struct block_device *bdev
}
EXPORT_SYMBOL(thaw_bdev);

+#if 0
+#define FS_PRINTK(fmt, args...) printk(fmt, ## args)
+#else
+#define FS_PRINTK(fmt, args...)
+#endif
+
+/* #define DEBUG_FS_FREEZING */
+
+/**
+ * freeze_filesystems - lock all filesystems and force them into a consistent
+ * state
+ * @which: What combination of fuse & non-fuse to freeze.
+ */
+void freeze_filesystems(int which)
+{
+ struct super_block *sb;
+
+ lockdep_off();
+
+ /*
+ * Freeze in reverse order so filesystems dependant upon others are
+ * frozen in the right order (eg. loopback on ext3).
+ */
+ list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+ FS_PRINTK(KERN_INFO "Considering %s.%s: (root %p, bdev %x)",
+ sb->s_type->name ? sb->s_type->name : "?",
+ sb->s_subtype ? sb->s_subtype : "", sb->s_root,
+ sb->s_bdev ? sb->s_bdev->bd_dev : 0);
+
+ if (sb->s_type->fs_flags & FS_IS_FUSE &&
+ sb->s_frozen == SB_UNFROZEN &&
+ which & FS_FREEZER_FUSE) {
+ sb->s_frozen = SB_FREEZE_TRANS;
+ sb->s_flags |= MS_FROZEN;
+ FS_PRINTK("Fuse filesystem done.\n");
+ continue;
+ }
+
+ if (!sb->s_root || !sb->s_bdev ||
+ (sb->s_frozen == SB_FREEZE_TRANS) ||
+ (sb->s_flags & MS_RDONLY) ||
+ (sb->s_flags & MS_FROZEN) ||
+ !(which & FS_FREEZER_NORMAL)) {
+ FS_PRINTK(KERN_INFO "Nope.\n");
+ continue;
+ }
+
+ FS_PRINTK(KERN_INFO "Freezing %x... ", sb->s_bdev->bd_dev);
+ freeze_bdev(sb->s_bdev);
+ sb->s_flags |= MS_FROZEN;
+ FS_PRINTK(KERN_INFO "Done.\n");
+ }
+
+ lockdep_on();
+}
+
+/**
+ * thaw_filesystems - unlock all filesystems
+ * @which: What combination of fuse & non-fuse to thaw.
+ */
+void thaw_filesystems(int which)
+{
+ struct super_block *sb;
+
+ lockdep_off();
+
+ list_for_each_entry(sb, &super_blocks, s_list) {
+ if (!(sb->s_flags & MS_FROZEN))
+ continue;
+
+ if (sb->s_type->fs_flags & FS_IS_FUSE) {
+ if (!(which & FS_FREEZER_FUSE))
+ continue;
+
+ sb->s_frozen = SB_UNFROZEN;
+ } else {
+ if (!(which & FS_FREEZER_NORMAL))
+ continue;
+
+ thaw_bdev(sb->s_bdev, sb);
+ }
+ sb->s_flags &= ~MS_FROZEN;
+ }
+
+ lockdep_on();
+}
+
/*
* Various filesystems appear to want __find_get_block to be non-blocking.
* But it's the page lock which protects the buffers. To get around this,
diff -ruNp 700-BRANCH-fuse-freezing.patch-old/fs/fuse/control.c 700-BRANCH-fuse-freezing.patch-new/fs/fuse/control.c
--- 700-BRANCH-fuse-freezing.patch-old/fs/fuse/control.c 2008-10-19 04:57:22.000000000 +1100
+++ 700-BRANCH-fuse-freezing.patch-new/fs/fuse/control.c 2008-10-21 13:12:40.000000000 +1100
@@ -207,6 +207,7 @@ static void fuse_ctl_kill_sb(struct supe
static struct file_system_type fuse_ctl_fs_type = {
.owner = THIS_MODULE,
.name = "fusectl",
+ .fs_flags = FS_IS_FUSE,
.get_sb = fuse_ctl_get_sb,
.kill_sb = fuse_ctl_kill_sb,
};
diff -ruNp 700-BRANCH-fuse-freezing.patch-old/fs/fuse/dev.c 700-BRANCH-fuse-freezing.patch-new/fs/fuse/dev.c
--- 700-BRANCH-fuse-freezing.patch-old/fs/fuse/dev.c 2008-10-19 04:57:22.000000000 +1100
+++ 700-BRANCH-fuse-freezing.patch-new/fs/fuse/dev.c 2008-10-21 13:12:40.000000000 +1100
@@ -7,6 +7,7 @@
*/

#include "fuse_i.h"
+#include "fuse.h"

#include <linux/init.h>
#include <linux/module.h>
@@ -16,6 +17,7 @@
#include <linux/pagemap.h>
#include <linux/file.h>
#include <linux/slab.h>
+#include <linux/freezer.h>

MODULE_ALIAS_MISCDEV(FUSE_MINOR);

@@ -743,6 +745,8 @@ static ssize_t fuse_dev_read(struct kioc
if (!fc)
return -EPERM;

+ FUSE_MIGHT_FREEZE(file->f_mapping->host->i_sb, "fuse_dev_read");
+
restart:
spin_lock(&fc->lock);
err = -EAGAIN;
@@ -869,6 +873,9 @@ static ssize_t fuse_dev_write(struct kio
if (!fc)
return -EPERM;

+ FUSE_MIGHT_FREEZE(iocb->ki_filp->f_mapping->host->i_sb,
+ "fuse_dev_write");
+
fuse_copy_init(&cs, fc, 0, NULL, iov, nr_segs);
if (nbytes < sizeof(struct fuse_out_header))
return -EINVAL;
diff -ruNp 700-BRANCH-fuse-freezing.patch-old/fs/fuse/dir.c 700-BRANCH-fuse-freezing.patch-new/fs/fuse/dir.c
--- 700-BRANCH-fuse-freezing.patch-old/fs/fuse/dir.c 2008-10-19 04:57:22.000000000 +1100
+++ 700-BRANCH-fuse-freezing.patch-new/fs/fuse/dir.c 2008-10-21 13:12:40.000000000 +1100
@@ -7,12 +7,14 @@
*/

#include "fuse_i.h"
+#include "fuse.h"

#include <linux/pagemap.h>
#include <linux/file.h>
#include <linux/gfp.h>
#include <linux/sched.h>
#include <linux/namei.h>
+#include <linux/freezer.h>

#if BITS_PER_LONG >= 64
static inline void fuse_dentry_settime(struct dentry *entry, u64 time)
@@ -174,6 +176,9 @@ static int fuse_dentry_revalidate(struct
return 0;

fc = get_fuse_conn(inode);
+
+ FUSE_MIGHT_FREEZE(inode->i_sb, "fuse_dentry_revalidate");
+
req = fuse_get_req(fc);
if (IS_ERR(req))
return 0;
@@ -273,6 +278,8 @@ int fuse_lookup_name(struct super_block
if (IS_ERR(req))
goto out;

+ FUSE_MIGHT_FREEZE(sb, "fuse_lookup");
+
forget_req = fuse_get_req(fc);
err = PTR_ERR(forget_req);
if (IS_ERR(forget_req)) {
@@ -402,6 +409,8 @@ static int fuse_create_open(struct inode
if (IS_ERR(forget_req))
return PTR_ERR(forget_req);

+ FUSE_MIGHT_FREEZE(dir->i_sb, "fuse_create_open");
+
req = fuse_get_req(fc);
err = PTR_ERR(req);
if (IS_ERR(req))
@@ -488,6 +497,8 @@ static int create_new_entry(struct fuse_
int err;
struct fuse_req *forget_req;

+ FUSE_MIGHT_FREEZE(dir->i_sb, "create_new_entry");
+
forget_req = fuse_get_req(fc);
if (IS_ERR(forget_req)) {
fuse_put_request(fc, req);
@@ -585,7 +596,11 @@ static int fuse_mkdir(struct inode *dir,
{
struct fuse_mkdir_in inarg;
struct fuse_conn *fc = get_fuse_conn(dir);
- struct fuse_req *req = fuse_get_req(fc);
+ struct fuse_req *req;
+
+ FUSE_MIGHT_FREEZE(dir->i_sb, "fuse_mkdir");
+
+ req = fuse_get_req(fc);
if (IS_ERR(req))
return PTR_ERR(req);

@@ -605,7 +620,11 @@ static int fuse_symlink(struct inode *di
{
struct fuse_conn *fc = get_fuse_conn(dir);
unsigned len = strlen(link) + 1;
- struct fuse_req *req = fuse_get_req(fc);
+ struct fuse_req *req;
+
+ FUSE_MIGHT_FREEZE(dir->i_sb, "fuse_symlink");
+
+ req = fuse_get_req(fc);
if (IS_ERR(req))
return PTR_ERR(req);

@@ -622,7 +641,11 @@ static int fuse_unlink(struct inode *dir
{
int err;
struct fuse_conn *fc = get_fuse_conn(dir);
- struct fuse_req *req = fuse_get_req(fc);
+ struct fuse_req *req;
+
+ FUSE_MIGHT_FREEZE(dir->i_sb, "fuse_unlink");
+
+ req = fuse_get_req(fc);
if (IS_ERR(req))
return PTR_ERR(req);

@@ -653,7 +676,11 @@ static int fuse_rmdir(struct inode *dir,
{
int err;
struct fuse_conn *fc = get_fuse_conn(dir);
- struct fuse_req *req = fuse_get_req(fc);
+ struct fuse_req *req;
+
+ FUSE_MIGHT_FREEZE(dir->i_sb, "fuse_rmdir");
+
+ req = fuse_get_req(fc);
if (IS_ERR(req))
return PTR_ERR(req);

diff -ruNp 700-BRANCH-fuse-freezing.patch-old/fs/fuse/file.c 700-BRANCH-fuse-freezing.patch-new/fs/fuse/file.c
--- 700-BRANCH-fuse-freezing.patch-old/fs/fuse/file.c 2008-10-19 04:57:22.000000000 +1100
+++ 700-BRANCH-fuse-freezing.patch-new/fs/fuse/file.c 2008-10-21 13:12:40.000000000 +1100
@@ -7,11 +7,13 @@
*/

#include "fuse_i.h"
+#include "fuse.h"

#include <linux/pagemap.h>
#include <linux/slab.h>
#include <linux/kernel.h>
#include <linux/sched.h>
+#include <linux/freezer.h>

static const struct file_operations fuse_direct_io_file_operations;

@@ -23,6 +25,8 @@ static int fuse_send_open(struct inode *
struct fuse_req *req;
int err;

+ FUSE_MIGHT_FREEZE(inode->i_sb, "fuse_send_open");
+
req = fuse_get_req(fc);
if (IS_ERR(req))
return PTR_ERR(req);
@@ -674,6 +678,8 @@ static int fuse_buffered_write(struct fi
if (is_bad_inode(inode))
return -EIO;

+ FUSE_MIGHT_FREEZE(inode->i_sb, "fuse_commit_write");
+
/*
* Make sure writepages on the same page are not mixed up with
* plain writes.
@@ -962,6 +968,8 @@ static ssize_t fuse_direct_io(struct fil
if (is_bad_inode(inode))
return -EIO;

+ FUSE_MIGHT_FREEZE(file->f_mapping->host->i_sb, "fuse_direct_io");
+
req = fuse_get_req(fc);
if (IS_ERR(req))
return PTR_ERR(req);
@@ -1315,6 +1323,8 @@ static int fuse_getlk(struct file *file,
struct fuse_lk_out outarg;
int err;

+ FUSE_MIGHT_FREEZE(file->f_mapping->host->i_sb, "fuse_getlk");
+
req = fuse_get_req(fc);
if (IS_ERR(req))
return PTR_ERR(req);
@@ -1350,6 +1360,8 @@ static int fuse_setlk(struct file *file,
if (fl->fl_flags & FL_CLOSE)
return 0;

+ FUSE_MIGHT_FREEZE(file->f_mapping->host->i_sb, "fuse_setlk");
+
req = fuse_get_req(fc);
if (IS_ERR(req))
return PTR_ERR(req);
@@ -1416,6 +1428,8 @@ static sector_t fuse_bmap(struct address
if (!inode->i_sb->s_bdev || fc->no_bmap)
return 0;

+ FUSE_MIGHT_FREEZE(inode->i_sb, "fuse_bmap");
+
req = fuse_get_req(fc);
if (IS_ERR(req))
return 0;
diff -ruNp 700-BRANCH-fuse-freezing.patch-old/fs/fuse/fuse.h 700-BRANCH-fuse-freezing.patch-new/fs/fuse/fuse.h
--- 700-BRANCH-fuse-freezing.patch-old/fs/fuse/fuse.h 1970-01-01 10:00:00.000000000 +1000
+++ 700-BRANCH-fuse-freezing.patch-new/fs/fuse/fuse.h 2008-10-21 13:12:40.000000000 +1100
@@ -0,0 +1,13 @@
+#define FUSE_MIGHT_FREEZE(superblock, desc) \
+do { \
+ int printed = 0; \
+ while (superblock->s_frozen != SB_UNFROZEN) { \
+ if (!printed) { \
+ printk(KERN_INFO "%d frozen in " desc ".\n", \
+ current->pid); \
+ printed = 1; \
+ } \
+ try_to_freeze(); \
+ yield(); \
+ } \
+} while (0)
diff -ruNp 700-BRANCH-fuse-freezing.patch-old/fs/fuse/inode.c 700-BRANCH-fuse-freezing.patch-new/fs/fuse/inode.c
--- 700-BRANCH-fuse-freezing.patch-old/fs/fuse/inode.c 2008-10-19 04:57:22.000000000 +1100
+++ 700-BRANCH-fuse-freezing.patch-new/fs/fuse/inode.c 2008-10-21 13:12:40.000000000 +1100
@@ -914,7 +914,7 @@ static int fuse_get_sb(struct file_syste
static struct file_system_type fuse_fs_type = {
.owner = THIS_MODULE,
.name = "fuse",
- .fs_flags = FS_HAS_SUBTYPE,
+ .fs_flags = FS_HAS_SUBTYPE | FS_IS_FUSE,
.get_sb = fuse_get_sb,
.kill_sb = kill_anon_super,
};
@@ -933,7 +933,7 @@ static struct file_system_type fuseblk_f
.name = "fuseblk",
.get_sb = fuse_get_sb_blk,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV | FS_HAS_SUBTYPE,
+ .fs_flags = FS_REQUIRES_DEV | FS_HAS_SUBTYPE | FS_IS_FUSE,
};

static inline int register_fuseblk(void)
diff -ruNp 700-BRANCH-fuse-freezing.patch-old/fs/namei.c 700-BRANCH-fuse-freezing.patch-new/fs/namei.c
--- 700-BRANCH-fuse-freezing.patch-old/fs/namei.c 2008-10-19 04:57:22.000000000 +1100
+++ 700-BRANCH-fuse-freezing.patch-new/fs/namei.c 2008-10-21 13:12:40.000000000 +1100
@@ -2223,6 +2223,8 @@ int vfs_unlink(struct inode *dir, struct
if (!dir->i_op || !dir->i_op->unlink)
return -EPERM;

+ vfs_check_frozen(dir->i_sb, SB_FREEZE_WRITE);
+
DQUOT_INIT(dir);

mutex_lock(&dentry->d_inode->i_mutex);
diff -ruNp 700-BRANCH-fuse-freezing.patch-old/include/linux/buffer_head.h 700-BRANCH-fuse-freezing.patch-new/include/linux/buffer_head.h
--- 700-BRANCH-fuse-freezing.patch-old/include/linux/buffer_head.h 2008-10-19 04:57:22.000000000 +1100
+++ 700-BRANCH-fuse-freezing.patch-new/include/linux/buffer_head.h 2008-10-21 13:12:40.000000000 +1100
@@ -171,6 +171,11 @@ wait_queue_head_t *bh_waitq_head(struct
int fsync_bdev(struct block_device *);
struct super_block *freeze_bdev(struct block_device *);
void thaw_bdev(struct block_device *, struct super_block *);
+#define FS_FREEZER_FUSE 1
+#define FS_FREEZER_NORMAL 2
+#define FS_FREEZER_ALL (FS_FREEZER_FUSE | FS_FREEZER_NORMAL)
+void freeze_filesystems(int which);
+void thaw_filesystems(int which);
int fsync_super(struct super_block *);
int fsync_no_super(struct block_device *);
struct buffer_head *__find_get_block(struct block_device *bdev, sector_t block,
diff -ruNp 700-BRANCH-fuse-freezing.patch-old/include/linux/freezer.h 700-BRANCH-fuse-freezing.patch-new/include/linux/freezer.h
--- 700-BRANCH-fuse-freezing.patch-old/include/linux/freezer.h 2008-10-19 04:57:22.000000000 +1100
+++ 700-BRANCH-fuse-freezing.patch-new/include/linux/freezer.h 2008-10-21 13:16:08.000000000 +1100
@@ -136,6 +136,19 @@ static inline void set_freezable_with_si
current->flags &= ~(PF_NOFREEZE | PF_FREEZER_NOSIG);
}

+extern int freezer_state;
+#define FREEZER_OFF 0
+#define FREEZER_FILESYSTEMS_FROZEN 1
+#define FREEZER_USERSPACE_FROZEN 2
+#define FREEZER_FULLY_ON 3
+
+static inline int freezer_is_on(void)
+{
+ return freezer_state == FREEZER_FULLY_ON;
+}
+
+extern void thaw_kernel_threads(void);
+
/*
* Freezer-friendly wrappers around wait_event_interruptible() and
* wait_event_interruptible_timeout(), originally defined in <linux/wait.h>
@@ -178,6 +191,8 @@ static inline int freeze_processes(void)
static inline void thaw_processes(void) {}

static inline int try_to_freeze(void) { return 0; }
+static inline int freezer_is_on(void) { return 0; }
+static inline void thaw_kernel_threads(void) { }

static inline void freezer_do_not_count(void) {}
static inline void freezer_count(void) {}
diff -ruNp 700-BRANCH-fuse-freezing.patch-old/include/linux/fs.h 700-BRANCH-fuse-freezing.patch-new/include/linux/fs.h
--- 700-BRANCH-fuse-freezing.patch-old/include/linux/fs.h 2008-10-19 04:57:22.000000000 +1100
+++ 700-BRANCH-fuse-freezing.patch-new/include/linux/fs.h 2008-10-21 13:12:40.000000000 +1100
@@ -8,6 +8,7 @@

#include <linux/limits.h>
#include <linux/ioctl.h>
+#include <linux/freezer.h>

/*
* It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -96,6 +97,7 @@ extern int dir_notify_enable;
#define FS_REQUIRES_DEV 1
#define FS_BINARY_MOUNTDATA 2
#define FS_HAS_SUBTYPE 4
+#define FS_IS_FUSE 8 /* Fuse filesystem - bdev freeze these too */
#define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */
#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move()
* during rename() internally.
@@ -128,6 +130,7 @@ extern int dir_notify_enable;
#define MS_RELATIME (1<<21) /* Update atime relative to mtime/ctime. */
#define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */
#define MS_I_VERSION (1<<23) /* Update inode I_version field */
+#define MS_FROZEN (1<<24) /* Frozen by freeze_filesystems() */
#define MS_ACTIVE (1<<30)
#define MS_NOUSER (1<<31)

@@ -1141,8 +1144,11 @@ enum {
SB_FREEZE_TRANS = 2,
};

-#define vfs_check_frozen(sb, level) \
- wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level)))
+#define vfs_check_frozen(sb, level) do { \
+ freezer_do_not_count(); \
+ wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level))); \
+ freezer_count(); \
+} while (0)

#define get_fs_excl() atomic_inc(&current->fs_excl)
#define put_fs_excl() atomic_dec(&current->fs_excl)
diff -ruNp 700-BRANCH-fuse-freezing.patch-old/kernel/power/process.c 700-BRANCH-fuse-freezing.patch-new/kernel/power/process.c
--- 700-BRANCH-fuse-freezing.patch-old/kernel/power/process.c 2008-10-19 04:57:22.000000000 +1100
+++ 700-BRANCH-fuse-freezing.patch-new/kernel/power/process.c 2008-10-21 13:16:08.000000000 +1100
@@ -13,6 +13,9 @@
#include <linux/module.h>
#include <linux/syscalls.h>
#include <linux/freezer.h>
+#include <linux/buffer_head.h>
+
+int freezer_state;

/*
* Timeout for stopping processes
@@ -201,7 +204,8 @@ static int try_to_freeze_tasks(bool sig_
do_each_thread(g, p) {
task_lock(p);
if (freezing(p) && !freezer_should_skip(p))
- printk(KERN_ERR " %s\n", p->comm);
+ printk(KERN_ERR " %s (%d) failed to freeze.\n",
+ p->comm, p->pid);
cancel_freezing(p);
task_unlock(p);
} while_each_thread(g, p);
@@ -221,17 +225,25 @@ int freeze_processes(void)
{
int error;

- printk("Freezing user space processes ... ");
+ printk(KERN_INFO "Stopping fuse filesystems.\n");
+ freeze_filesystems(FS_FREEZER_FUSE);
+ freezer_state = FREEZER_FILESYSTEMS_FROZEN;
+ printk(KERN_INFO "Freezing user space processes ... ");
error = try_to_freeze_tasks(true);
if (error)
goto Exit;
- printk("done.\n");
+ printk(KERN_INFO "done.\n");

- printk("Freezing remaining freezable tasks ... ");
+ sys_sync();
+ printk(KERN_INFO "Stopping normal filesystems.\n");
+ freeze_filesystems(FS_FREEZER_NORMAL);
+ freezer_state = FREEZER_USERSPACE_FROZEN;
+ printk(KERN_INFO "Freezing remaining freezable tasks ... ");
error = try_to_freeze_tasks(false);
if (error)
goto Exit;
printk("done.");
+ freezer_state = FREEZER_FULLY_ON;
Exit:
BUG_ON(in_atomic());
printk("\n");
@@ -257,11 +269,35 @@ static void thaw_tasks(bool nosig_only)

void thaw_processes(void)
{
- printk("Restarting tasks ... ");
- thaw_tasks(true);
+ int old_state = freezer_state;
+
+ if (old_state == FREEZER_OFF)
+ return;
+
+ /*
+ * Change state beforehand because thawed tasks might submit I/O
+ * immediately.
+ */
+ freezer_state = FREEZER_OFF;
+
+ printk(KERN_INFO "Restarting all filesystems ...\n");
+ thaw_filesystems(FS_FREEZER_ALL);
+
+ printk(KERN_INFO "Restarting tasks ... ");
+
+ if (old_state == FREEZER_FULLY_ON)
+ thaw_tasks(true);
thaw_tasks(false);
schedule();
printk("done.\n");
}

EXPORT_SYMBOL(refrigerator);
+
+void thaw_kernel_threads(void)
+{
+ freezer_state = FREEZER_USERSPACE_FROZEN;
+ printk(KERN_INFO "Restarting normal filesystems.\n");
+ thaw_filesystems(FS_FREEZER_NORMAL);
+ thaw_tasks(true);
+}

2008-10-27 12:39:24

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Mon, 27 Oct 2008, Nigel Cunningham wrote:
> Hi.
>
> On Mon, 2008-10-27 at 12:37 +0100, Rafael J. Wysocki wrote:
> > Well, I guess it's better if you post the entire thing so that we can see
> > what the role of the $subject patch is in it, even if this patch finally gets
> > merged separately.
>
> Ah.. that makes me see how vfs_check_frozen was getting triggered...
> (fs/namei.c, below).

Nigel, thanks for the patch, it makes thinks a lot clearer.

>From a quick look through the patch it seems to solve a bunch of cases
where new fuse requests during the freezing could cause problems. But
it doesn't do anything with requests that are already under way when
the freezing starts, which would still result in all the same
problems.

Take this scenario:

1) process A does rename("/mnt/fuse/a", "/mnt/fuse/b")
2) request goes to process B serving the fuse filesystem
3) filesystems are frozen, no new fuse requests
4) processes are frozen, let's say B first, then A
5) freezing A will fail, since it's still waiting for the request to finish

Several solutions have been posted, none of which really solve the problem:

a) Let's tag fuse server processes and freeze them later. This is
basically impossible, because many processes could be interoperating
and there's no way to know which is depending on which (example:
sshfs uses ssh for communication, which possibly relies on openvpn
process for packet transmission).

b) While waiting for replies to fuse request, allow process to
freeze. Does not fully solve the problem, as VFS might be holding
locks, and other processes waiting for those locks will not be
freezable.

Miklos

2008-10-27 20:59:45

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi.

On Mon, 2008-10-27 at 13:38 +0100, Miklos Szeredi wrote:
> On Mon, 27 Oct 2008, Nigel Cunningham wrote:
> > Hi.
> >
> > On Mon, 2008-10-27 at 12:37 +0100, Rafael J. Wysocki wrote:
> > > Well, I guess it's better if you post the entire thing so that we can see
> > > what the role of the $subject patch is in it, even if this patch finally gets
> > > merged separately.
> >
> > Ah.. that makes me see how vfs_check_frozen was getting triggered...
> > (fs/namei.c, below).
>
> Nigel, thanks for the patch, it makes thinks a lot clearer.
>
> >From a quick look through the patch it seems to solve a bunch of cases
> where new fuse requests during the freezing could cause problems. But
> it doesn't do anything with requests that are already under way when
> the freezing starts, which would still result in all the same
> problems.
>
> Take this scenario:
>
> 1) process A does rename("/mnt/fuse/a", "/mnt/fuse/b")
> 2) request goes to process B serving the fuse filesystem
> 3) filesystems are frozen, no new fuse requests
> 4) processes are frozen, let's say B first, then A
> 5) freezing A will fail, since it's still waiting for the request to finish

I'll have a look at the code again. I deliberately didn't stop existing
requests, but perhaps that's the wrong behaviour.

In the above scenario, process B won't see process A's new request until
post-resume if the filesystem is already frozen, so steps 4 & 5 don't
happen. Process B will also always be frozen before process A if process
A is userspace (most likely in the above scenario) or was mounted after
process B. (I've had this patch distributed as is for almost a year,
with no problems at all, so I believe I'm right here).

> Several solutions have been posted, none of which really solve the problem:
>
> a) Let's tag fuse server processes and freeze them later. This is
> basically impossible, because many processes could be interoperating
> and there's no way to know which is depending on which (example:
> sshfs uses ssh for communication, which possibly relies on openvpn
> process for packet transmission).
>
> b) While waiting for replies to fuse request, allow process to
> freeze. Does not fully solve the problem, as VFS might be holding
> locks, and other processes waiting for those locks will not be
> freezable.

I agree that these solutions won't work.

Regards,

Nigel

2008-10-27 21:09:37

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Tue, 28 Oct 2008, Nigel Cunningham wrote:
> Hi.
>
> On Mon, 2008-10-27 at 13:38 +0100, Miklos Szeredi wrote:
> > On Mon, 27 Oct 2008, Nigel Cunningham wrote:
> > > Hi.
> > >
> > > On Mon, 2008-10-27 at 12:37 +0100, Rafael J. Wysocki wrote:
> > > > Well, I guess it's better if you post the entire thing so that we can see
> > > > what the role of the $subject patch is in it, even if this patch finally gets
> > > > merged separately.
> > >
> > > Ah.. that makes me see how vfs_check_frozen was getting triggered...
> > > (fs/namei.c, below).
> >
> > Nigel, thanks for the patch, it makes thinks a lot clearer.
> >
> > >From a quick look through the patch it seems to solve a bunch of cases
> > where new fuse requests during the freezing could cause problems. But
> > it doesn't do anything with requests that are already under way when
> > the freezing starts, which would still result in all the same
> > problems.
> >
> > Take this scenario:
> >
> > 1) process A does rename("/mnt/fuse/a", "/mnt/fuse/b")
> > 2) request goes to process B serving the fuse filesystem
> > 3) filesystems are frozen, no new fuse requests
> > 4) processes are frozen, let's say B first, then A
> > 5) freezing A will fail, since it's still waiting for the request to finish
>
> I'll have a look at the code again. I deliberately didn't stop existing
> requests, but perhaps that's the wrong behaviour.
>
> In the above scenario, process B won't see process A's new request until
> post-resume if the filesystem is already frozen, so steps 4 & 5 don't
> happen.

No, I mean the filesystem is only frozen at 3 not before, so B is
already processing the request by the time the filesystem gets frozen.

> Process B will also always be frozen before process A if process
> A is userspace (most likely in the above scenario) or was mounted after
> process B. (I've had this patch distributed as is for almost a year,
> with no problems at all, so I believe I'm right here).

Both A and B are userspace processes. The order of freezing userspace
processes is basically random, there's no way to make sure that
processes doing work on behalf of a fuse filesystem (B) are frozen
after processes accessing the fuse filesystem (A).

Miklos

2008-10-27 22:13:41

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi.

On Mon, 2008-10-27 at 22:09 +0100, Miklos Szeredi wrote:
> On Tue, 28 Oct 2008, Nigel Cunningham wrote:
> > Hi.
> >
> > On Mon, 2008-10-27 at 13:38 +0100, Miklos Szeredi wrote:
> > > On Mon, 27 Oct 2008, Nigel Cunningham wrote:
> > > > Hi.
> > > >
> > > > On Mon, 2008-10-27 at 12:37 +0100, Rafael J. Wysocki wrote:
> > > > > Well, I guess it's better if you post the entire thing so that we can see
> > > > > what the role of the $subject patch is in it, even if this patch finally gets
> > > > > merged separately.
> > > >
> > > > Ah.. that makes me see how vfs_check_frozen was getting triggered...
> > > > (fs/namei.c, below).
> > >
> > > Nigel, thanks for the patch, it makes thinks a lot clearer.
> > >
> > > >From a quick look through the patch it seems to solve a bunch of cases
> > > where new fuse requests during the freezing could cause problems. But
> > > it doesn't do anything with requests that are already under way when
> > > the freezing starts, which would still result in all the same
> > > problems.
> > >
> > > Take this scenario:
> > >
> > > 1) process A does rename("/mnt/fuse/a", "/mnt/fuse/b")
> > > 2) request goes to process B serving the fuse filesystem
> > > 3) filesystems are frozen, no new fuse requests
> > > 4) processes are frozen, let's say B first, then A
> > > 5) freezing A will fail, since it's still waiting for the request to finish
> >
> > I'll have a look at the code again. I deliberately didn't stop existing
> > requests, but perhaps that's the wrong behaviour.
> >
> > In the above scenario, process B won't see process A's new request until
> > post-resume if the filesystem is already frozen, so steps 4 & 5 don't
> > happen.
>
> No, I mean the filesystem is only frozen at 3 not before, so B is
> already processing the request by the time the filesystem gets frozen.
>
> > Process B will also always be frozen before process A if process
> > A is userspace (most likely in the above scenario) or was mounted after
> > process B. (I've had this patch distributed as is for almost a year,
> > with no problems at all, so I believe I'm right here).
>
> Both A and B are userspace processes. The order of freezing userspace
> processes is basically random, there's no way to make sure that
> processes doing work on behalf of a fuse filesystem (B) are frozen
> after processes accessing the fuse filesystem (A).

Sorry. You're right - I was confusing things in what I said, but I do
have a (unconfused) solution:

The answer is to freeze the fuse filesystems first, stopping new
requests (freezing the processes making them) before they are passed on
to userspace and allowing existing requests to complete or freeze. Once
that is done, the userspace fuse processes will be idle, at least as far
as fuse activity is concerned. After fuse activity is stopped, userspace
can be frozen without worrying about what processes are fuse and what
aren't. This is what my patch implements so far.

To deal with requests that are already in progress, I'd suggest three
possibilities, in the order I think they should be preferred.

1) Use wait_event_freezeable[_timeout] in fuse code. Probably preferable
to #2, but I thought of it later :)

2) Use freezer_do_not_count as part of FUSE_MIGHT_FREEZE (resetting
before exiting the callers, of course). If the request doesn't complete
during the freezing period, it must be because the userspace thread was
already frozen. If the request does complete, we're counted again during
the normal freezing of userspace that follows the freezing of
filesystems.

3) Adding a means to check whether processes being frozen are fuse
requests. The code could then wait for fc->num_waiting - (say)
fc->num_frozen == 0.

Regards,

Nigel

2008-10-28 20:26:00

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Tue, 28 Oct 2008, Nigel Cunningham wrote:
> The answer is to freeze the fuse filesystems first, stopping new
> requests (freezing the processes making them) before they are passed on
> to userspace and allowing existing requests to complete or freeze. Once
> that is done, the userspace fuse processes will be idle, at least as far
> as fuse activity is concerned. After fuse activity is stopped, userspace
> can be frozen without worrying about what processes are fuse and what
> aren't. This is what my patch implements so far.

OK.

> To deal with requests that are already in progress, I'd suggest three
> possibilities, in the order I think they should be preferred.
>
> 1) Use wait_event_freezeable[_timeout] in fuse code. Probably preferable
> to #2, but I thought of it later :)
>
> 2) Use freezer_do_not_count as part of FUSE_MIGHT_FREEZE (resetting
> before exiting the callers, of course). If the request doesn't complete
> during the freezing period, it must be because the userspace thread was
> already frozen. If the request does complete, we're counted again during
> the normal freezing of userspace that follows the freezing of
> filesystems.
>
> 3) Adding a means to check whether processes being frozen are fuse
> requests. The code could then wait for fc->num_waiting - (say)
> fc->num_frozen == 0.

Yup, these fix the freezing of tasks which have outstanding fuse
requests.

However it does not fix the freezing of tasks which are waiting for
VFS locks (i.e. inode->i_mutex) held by the outstanding fuse requests.
This is the tricky part...

Miklos

2008-10-28 21:30:19

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi.

On Tue, 2008-10-28 at 21:25 +0100, Miklos Szeredi wrote:
> > To deal with requests that are already in progress, I'd suggest three
> > possibilities, in the order I think they should be preferred.
> >
> > 1) Use wait_event_freezeable[_timeout] in fuse code. Probably preferable
> > to #2, but I thought of it later :)
> >
> > 2) Use freezer_do_not_count as part of FUSE_MIGHT_FREEZE (resetting
> > before exiting the callers, of course). If the request doesn't complete
> > during the freezing period, it must be because the userspace thread was
> > already frozen. If the request does complete, we're counted again during
> > the normal freezing of userspace that follows the freezing of
> > filesystems.
> >
> > 3) Adding a means to check whether processes being frozen are fuse
> > requests. The code could then wait for fc->num_waiting - (say)
> > fc->num_frozen == 0.
>
> Yup, these fix the freezing of tasks which have outstanding fuse
> requests.
>
> However it does not fix the freezing of tasks which are waiting for
> VFS locks (i.e. inode->i_mutex) held by the outstanding fuse requests.
> This is the tricky part...

Convert them all to wait_event_freezeable. If you know the locks they're
waiting on definitely aren't going to be free until post-resume and
we're going to be trying to freeze them while they're waiting, that
would be the right behaviour.

Regards,

Nigel

2008-10-28 21:52:23

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > However it does not fix the freezing of tasks which are waiting for
> > VFS locks (i.e. inode->i_mutex) held by the outstanding fuse requests.
> > This is the tricky part...
>
> Convert them all to wait_event_freezeable.

You mean convert mutexes to event queues? Not a very good idea.

I fear we are going down the same path as the last time. I still
don't think rewriting the VFS is the right solution to the freezing
problem. But hey, if you want, sumbit a patch or an RFD and lets see
what others think.

Miklos

2008-10-28 21:55:55

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Tuesday, 28 of October 2008, Miklos Szeredi wrote:
> On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > > However it does not fix the freezing of tasks which are waiting for
> > > VFS locks (i.e. inode->i_mutex) held by the outstanding fuse requests.
> > > This is the tricky part...
> >
> > Convert them all to wait_event_freezeable.
>
> You mean convert mutexes to event queues? Not a very good idea.
>
> I fear we are going down the same path as the last time. I still
> don't think rewriting the VFS is the right solution to the freezing
> problem. But hey, if you want, sumbit a patch or an RFD and lets see
> what others think.

So, what solution would you prefer?

Rafael

2008-10-28 22:02:56

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Tue, 28 Oct 2008, Rafael J. Wysocki wrote:
> On Tuesday, 28 of October 2008, Miklos Szeredi wrote:
> > On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > > > However it does not fix the freezing of tasks which are waiting for
> > > > VFS locks (i.e. inode->i_mutex) held by the outstanding fuse requests.
> > > > This is the tricky part...
> > >
> > > Convert them all to wait_event_freezeable.
> >
> > You mean convert mutexes to event queues? Not a very good idea.
> >
> > I fear we are going down the same path as the last time. I still
> > don't think rewriting the VFS is the right solution to the freezing
> > problem. But hey, if you want, sumbit a patch or an RFD and lets see
> > what others think.
>
> So, what solution would you prefer?

I would prefer a freezer-less solution. Suspend to ram doesn't need
the freezer, and with the kexec approach hibernate could be done
without it also.

I don't think adding hacks to the VFS to work around the issues with
the freezer is the right way to solve this. But this is just my
personal opinion, the VFS maintainers may think otherwise.

Miklos

2008-10-28 22:03:46

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi.

On Tue, 2008-10-28 at 22:51 +0100, Miklos Szeredi wrote:
> On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > > However it does not fix the freezing of tasks which are waiting for
> > > VFS locks (i.e. inode->i_mutex) held by the outstanding fuse requests.
> > > This is the tricky part...
> >
> > Convert them all to wait_event_freezeable.
>
> You mean convert mutexes to event queues? Not a very good idea.

No, I don't. Sorry. I was thinking you were referring to the wait_event
calls in the fuse request code. My bad.

Perhaps it's best to go back to my original position: hold new requests
and allow existing ones to complete. I'll look again at previous
messages to see why you thought that was problematic.

Regards,

Nigel

2008-10-28 22:17:32

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Tuesday, 28 of October 2008, Miklos Szeredi wrote:
> On Tue, 28 Oct 2008, Rafael J. Wysocki wrote:
> > On Tuesday, 28 of October 2008, Miklos Szeredi wrote:
> > > On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > > > > However it does not fix the freezing of tasks which are waiting for
> > > > > VFS locks (i.e. inode->i_mutex) held by the outstanding fuse requests.
> > > > > This is the tricky part...
> > > >
> > > > Convert them all to wait_event_freezeable.
> > >
> > > You mean convert mutexes to event queues? Not a very good idea.
> > >
> > > I fear we are going down the same path as the last time. I still
> > > don't think rewriting the VFS is the right solution to the freezing
> > > problem. But hey, if you want, sumbit a patch or an RFD and lets see
> > > what others think.
> >
> > So, what solution would you prefer?
>
> I would prefer a freezer-less solution. Suspend to ram doesn't need
> the freezer,

Well, yes it does. And it will in forseeable future, AFAICS.

> and with the kexec approach hibernate could be done without it also.

There are problems with that too, although they aren't directly related to
filesystems.

> I don't think adding hacks to the VFS to work around the issues with
> the freezer is the right way to solve this. But this is just my
> personal opinion, the VFS maintainers may think otherwise.

Well, my personal opinion is that we need filesystems to support suspend,
this way or another and the sooner it happens, the better. Still, I'm rather
not going to make that happen myself. :-)

Apparently, Nigel is willing to work in this direction and we can use this as
an opportunity to learn what exactly is necessary for this purpose and _then_
decide if this is reasonable or not instead of dismissing it upfront.

Thanks,
Rafael

2008-10-28 23:04:28

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi again.

On Tue, 2008-10-28 at 21:25 +0100, Miklos Szeredi wrote:
> On Tue, 28 Oct 2008, Nigel Cunningham wrote:
> > The answer is to freeze the fuse filesystems first, stopping new
> > requests (freezing the processes making them) before they are passed on
> > to userspace and allowing existing requests to complete or freeze. Once
> > that is done, the userspace fuse processes will be idle, at least as far
> > as fuse activity is concerned. After fuse activity is stopped, userspace
> > can be frozen without worrying about what processes are fuse and what
> > aren't. This is what my patch implements so far.
>
> OK.
>
> > To deal with requests that are already in progress, I'd suggest three
> > possibilities, in the order I think they should be preferred.
> >
> > 1) Use wait_event_freezeable[_timeout] in fuse code. Probably preferable
> > to #2, but I thought of it later :)
> >
> > 2) Use freezer_do_not_count as part of FUSE_MIGHT_FREEZE (resetting
> > before exiting the callers, of course). If the request doesn't complete
> > during the freezing period, it must be because the userspace thread was
> > already frozen. If the request does complete, we're counted again during
> > the normal freezing of userspace that follows the freezing of
> > filesystems.
> >
> > 3) Adding a means to check whether processes being frozen are fuse
> > requests. The code could then wait for fc->num_waiting - (say)
> > fc->num_frozen == 0.
>
> Yup, these fix the freezing of tasks which have outstanding fuse
> requests.
>
> However it does not fix the freezing of tasks which are waiting for
> VFS locks (i.e. inode->i_mutex) held by the outstanding fuse requests.
> This is the tricky part...

Okay. Looking back on our conversation brought me back to this message,
which I think needs another reply.

If we take the strategy of holding new requests and allowing existing
ones to complete, then am I right in thinking that we only need to worry
about where inode->i_mutex is taken in fs/fuse/file.c (I don't see it
taken in other fs/fuse/*.c files).

If that's correct, dealing with that issue looks relatively straight
forward: we need some more FUSE_MIGHT_FREEZE calls for those functions,
and something done to the vfs_check_frozen call - I'm a bit confused by
that - inode->i_sb will refer to the fuse filesystem, won't it?

Regards,

Nigel

2008-10-28 23:13:18

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > However it does not fix the freezing of tasks which are waiting for
> > VFS locks (i.e. inode->i_mutex) held by the outstanding fuse requests.
> > This is the tricky part...
>
> Okay. Looking back on our conversation brought me back to this message,
> which I think needs another reply.
>
> If we take the strategy of holding new requests and allowing existing
> ones to complete, then am I right in thinking that we only need to worry
> about where inode->i_mutex is taken in fs/fuse/file.c (I don't see it
> taken in other fs/fuse/*.c files).

Nope, i_mutex is usually taken by the VFS not the filesystem. That
means that the filesystem is called with the mutex already held. Try
"grep i_mutex fs/*.c". There's also sb->s_vfs_rename_mutex, for all
the gory details see Documentation/filesystems/Locking.

So it's not just having to fix fuse, it's having to "fix" the VFS as
well.

Miklos

2008-10-28 23:17:53

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi.

On Wed, 2008-10-29 at 00:12 +0100, Miklos Szeredi wrote:
> On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > > However it does not fix the freezing of tasks which are waiting for
> > > VFS locks (i.e. inode->i_mutex) held by the outstanding fuse requests.
> > > This is the tricky part...
> >
> > Okay. Looking back on our conversation brought me back to this message,
> > which I think needs another reply.
> >
> > If we take the strategy of holding new requests and allowing existing
> > ones to complete, then am I right in thinking that we only need to worry
> > about where inode->i_mutex is taken in fs/fuse/file.c (I don't see it
> > taken in other fs/fuse/*.c files).
>
> Nope, i_mutex is usually taken by the VFS not the filesystem. That
> means that the filesystem is called with the mutex already held. Try
> "grep i_mutex fs/*.c". There's also sb->s_vfs_rename_mutex, for all
> the gory details see Documentation/filesystems/Locking.
>
> So it's not just having to fix fuse, it's having to "fix" the VFS as
> well.

Remember, though, that we're only freezing fuse at the moment, and
strictly one filesystem at a time. We can thus happily wait for the
i_mutex taken by some other process to be released.

Regards,

Nigel

2008-10-28 23:22:17

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Tue, 28 Oct 2008, Rafael J. Wysocki wrote:
> On Tuesday, 28 of October 2008, Miklos Szeredi wrote:
> > I would prefer a freezer-less solution. Suspend to ram doesn't need
> > the freezer,
>
> Well, yes it does. And it will in forseeable future, AFAICS.

Umm, OK. Last I remember everybody agreed that there's absolutely no
reason why processes need to be frozen, and the only important thing
is that drivers are not twiddling the hardware during suspend, and
this can usually easily be solved on the subsystem level.

> > I don't think adding hacks to the VFS to work around the issues with
> > the freezer is the right way to solve this. But this is just my
> > personal opinion, the VFS maintainers may think otherwise.
>
> Well, my personal opinion is that we need filesystems to support suspend,
> this way or another and the sooner it happens, the better. Still, I'm rather
> not going to make that happen myself. :-)
>
> Apparently, Nigel is willing to work in this direction and we can use this as
> an opportunity to learn what exactly is necessary for this purpose and _then_
> decide if this is reasonable or not instead of dismissing it upfront.

I haven't dismissed it, just voiced my opinion that I think it's the
wrong direction.

Miklos

2008-10-28 23:24:48

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> Remember, though, that we're only freezing fuse at the moment, and
> strictly one filesystem at a time. We can thus happily wait for the
> i_mutex taken by some other process to be released.

Not going to work: you need to wait for all requests to be finished,
but those might depend on some other fuse filesystem which has already
been frozen.

Miklos

2008-10-28 23:42:02

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi.

On Wed, 2008-10-29 at 00:24 +0100, Miklos Szeredi wrote:
> On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > Remember, though, that we're only freezing fuse at the moment, and
> > strictly one filesystem at a time. We can thus happily wait for the
> > i_mutex taken by some other process to be released.
>
> Not going to work: you need to wait for all requests to be finished,
> but those might depend on some other fuse filesystem which has already
> been frozen.

Okay. In that case, am I right in thinking that the request waiting on
the frozen filesystem will be stuck in request_wait_answer, and the
userspace process that was trying to satisfy the request will be stuck
in the FUSE_MIGHT_FREEZE call that was invoked for the frozen
filesystem?

Regards,

Nigel

2008-10-28 23:46:00

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> Hi.
>
> On Wed, 2008-10-29 at 00:24 +0100, Miklos Szeredi wrote:
> > On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > > Remember, though, that we're only freezing fuse at the moment, and
> > > strictly one filesystem at a time. We can thus happily wait for the
> > > i_mutex taken by some other process to be released.
> >
> > Not going to work: you need to wait for all requests to be finished,
> > but those might depend on some other fuse filesystem which has already
> > been frozen.
>
> Okay. In that case, am I right in thinking that the request waiting on
> the frozen filesystem will be stuck in request_wait_answer,

Yes.

> and the
> userspace process that was trying to satisfy the request will be stuck
> in the FUSE_MIGHT_FREEZE call that was invoked for the frozen
> filesystem?

No, it already passed that, before the filesystem got frozen. But it
doesn't matter, in either case i_mutex will already have been taken by
the VFS and it won't be released until the request completely
finishes.

Miklos

2008-10-28 23:51:18

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Miklos Szeredi wrote:
> On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > Hi.
> >
> > On Wed, 2008-10-29 at 00:24 +0100, Miklos Szeredi wrote:
> > > On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > > > Remember, though, that we're only freezing fuse at the moment, and
> > > > strictly one filesystem at a time. We can thus happily wait for the
> > > > i_mutex taken by some other process to be released.
> > >
> > > Not going to work: you need to wait for all requests to be finished,
> > > but those might depend on some other fuse filesystem which has already
> > > been frozen.
> >
> > Okay. In that case, am I right in thinking that the request waiting on
> > the frozen filesystem will be stuck in request_wait_answer,
>
> Yes.
>
> > and the
> > userspace process that was trying to satisfy the request will be stuck
> > in the FUSE_MIGHT_FREEZE call that was invoked for the frozen
> > filesystem?

Sorry, I misunderstood this. Yes you're right, in the case of one
fuse filesystem relying on another to complete the request the already
frozen one will be stuck in FUSE_MIGHT_FREEZE().

How does that help?

Miklos

2008-10-28 23:54:41

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi Miklos.

On Wed, 2008-10-29 at 00:45 +0100, Miklos Szeredi wrote:
> On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > Hi.
> >
> > On Wed, 2008-10-29 at 00:24 +0100, Miklos Szeredi wrote:
> > > On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > > > Remember, though, that we're only freezing fuse at the moment, and
> > > > strictly one filesystem at a time. We can thus happily wait for the
> > > > i_mutex taken by some other process to be released.
> > >
> > > Not going to work: you need to wait for all requests to be finished,
> > > but those might depend on some other fuse filesystem which has already
> > > been frozen.
> >
> > Okay. In that case, am I right in thinking that the request waiting on
> > the frozen filesystem will be stuck in request_wait_answer,
>
> Yes.
>
> > and the
> > userspace process that was trying to satisfy the request will be stuck
> > in the FUSE_MIGHT_FREEZE call that was invoked for the frozen
> > filesystem?
>
> No, it already passed that, before the filesystem got frozen. But it
> doesn't matter, in either case i_mutex will already have been taken by
> the VFS and it won't be released until the request completely
> finishes.

I think we're making different assumptions. I'm assuming that one of
those solutions we already discussed is implemented, such that we don't
start freezing a new filesystem until all the requests for the current
filesystem are dealt with. Perhaps I should come up with a new version
of the patch that implements that.

Nigel

2008-10-28 23:55:17

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wednesday, 29 of October 2008, Miklos Szeredi wrote:
> On Tue, 28 Oct 2008, Rafael J. Wysocki wrote:
> > On Tuesday, 28 of October 2008, Miklos Szeredi wrote:
> > > I would prefer a freezer-less solution. Suspend to ram doesn't need
> > > the freezer,
> >
> > Well, yes it does. And it will in forseeable future, AFAICS.
>
> Umm, OK. Last I remember everybody agreed that there's absolutely no
> reason why processes need to be frozen, and the only important thing
> is that drivers are not twiddling the hardware during suspend, and
> this can usually easily be solved on the subsystem level.

Well, this turned out not to be the case in the meantime.

In fact to handle that without the freezer we'd have to synchronize every
driver's suspend/resume callbacks with every possible way in which
applications can access the device for regular I/O (for example for PCI devices
this means any I/O other than configuration space accesses).

While this is possible in theory, I don't see this happening any time soon,
especially that we're going to keep the bar for accepting new drivers
relatively low.

Thanks,
Rafael

2008-10-28 23:58:42

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi.

On Wed, 2008-10-29 at 00:50 +0100, Miklos Szeredi wrote:
> On Wed, 29 Oct 2008, Miklos Szeredi wrote:
> > On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > > Hi.
> > >
> > > On Wed, 2008-10-29 at 00:24 +0100, Miklos Szeredi wrote:
> > > > On Wed, 29 Oct 2008, Nigel Cunningham wrote:
> > > > > Remember, though, that we're only freezing fuse at the moment, and
> > > > > strictly one filesystem at a time. We can thus happily wait for the
> > > > > i_mutex taken by some other process to be released.
> > > >
> > > > Not going to work: you need to wait for all requests to be finished,
> > > > but those might depend on some other fuse filesystem which has already
> > > > been frozen.
> > >
> > > Okay. In that case, am I right in thinking that the request waiting on
> > > the frozen filesystem will be stuck in request_wait_answer,
> >
> > Yes.
> >
> > > and the
> > > userspace process that was trying to satisfy the request will be stuck
> > > in the FUSE_MIGHT_FREEZE call that was invoked for the frozen
> > > filesystem?
>
> Sorry, I misunderstood this. Yes you're right, in the case of one
> fuse filesystem relying on another to complete the request the already
> frozen one will be stuck in FUSE_MIGHT_FREEZE().
>
> How does that help?

Well, my next question was going to be: can we find a way to know that
the userspace process we're waiting on was frozen? If we can know that,
then perhaps we can apply that knowledge in this thread to avoid a
freezing failure.

Regards,

Nigel

2008-10-29 08:10:50

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Rafael J. Wysocki wrote:
> On Wednesday, 29 of October 2008, Miklos Szeredi wrote:
> > On Tue, 28 Oct 2008, Rafael J. Wysocki wrote:
> > > On Tuesday, 28 of October 2008, Miklos Szeredi wrote:
> > > > I would prefer a freezer-less solution. Suspend to ram doesn't need
> > > > the freezer,
> > >
> > > Well, yes it does. And it will in forseeable future, AFAICS.
> >
> > Umm, OK. Last I remember everybody agreed that there's absolutely no
> > reason why processes need to be frozen, and the only important thing
> > is that drivers are not twiddling the hardware during suspend, and
> > this can usually easily be solved on the subsystem level.
>
> Well, this turned out not to be the case in the meantime.
>
> In fact to handle that without the freezer we'd have to synchronize
> every driver's suspend/resume callbacks with every possible way in
> which applications can access the device for regular I/O (for
> example for PCI devices this means any I/O other than configuration
> space accesses).

Not all callbacks. I don't know what the current model is but AFAIR
it should be something like this:

1) call drivers to prepare for suspend (allocate space, etc)
2) stop all driver activity (plug queues, disable interrupts, etc)
3) call drivers to actually save state and power down
4) suspend

The part we are concerned is stopping driver activity. It could be
done with a mutex, or it could be done by freezing tasks. Adding a
mutex or other mechanism is the one I most like, but it's probably the
biggest work, so lets look at how to fix the freezing:

Currently the criteria for freezing is that userspace task has to exit
kernelspace, and kernel task has to hit a specific "freeze point".
This causes problems where we want to freeze tasks which are "stuck"
inside filesystems or other non-driver parts of the kernel. We can
fix this two ways:

a) mark additional places to freeze for userspace tasks as
well. This is the direction Nigel seems to be taking.

b) or instead we could allow freezing anywhere in uninterruptible
sleep, _except_ where explicity marked.

Which is easier? I don't know. But I very storgly feel that marking
un-freezable places instead of marking freezable places is a much
cleaner solution. It only affects parts of the kernel which have
something to do with suspend, instead of affecting parts of the kernel
which have absolutely nothing to do with suspend.

Miklos

2008-10-29 13:51:43

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Miklos Szeredi wrote:

> Not all callbacks. I don't know what the current model is but AFAIR
> it should be something like this:
>
> 1) call drivers to prepare for suspend (allocate space, etc)
> 2) stop all driver activity (plug queues, disable interrupts, etc)
> 3) call drivers to actually save state and power down
> 4) suspend
>
> The part we are concerned is stopping driver activity. It could be
> done with a mutex, or it could be done by freezing tasks. Adding a
> mutex or other mechanism is the one I most like, but it's probably the
> biggest work, so lets look at how to fix the freezing:

Not only is adding a mutex the biggest amount of work, it has has the
largest impact. Every I/O pathway would have to acquire the
appropriate mutex. That's a significant additional load on the system.

> Currently the criteria for freezing is that userspace task has to exit
> kernelspace, and kernel task has to hit a specific "freeze point".
> This causes problems where we want to freeze tasks which are "stuck"
> inside filesystems or other non-driver parts of the kernel. We can
> fix this two ways:
>
> a) mark additional places to freeze for userspace tasks as
> well. This is the direction Nigel seems to be taking.
>
> b) or instead we could allow freezing anywhere in uninterruptible
> sleep, _except_ where explicity marked.
>
> Which is easier? I don't know. But I very storgly feel that marking
> un-freezable places instead of marking freezable places is a much
> cleaner solution. It only affects parts of the kernel which have
> something to do with suspend, instead of affecting parts of the kernel
> which have absolutely nothing to do with suspend.

The problem with unrestricted freezing shows up when you freeze tasks
that hold a mutex or other sort of lock. If this mutex is needed later
on for suspending a device then the suspend will hang, because a frozen
task can't release any mutexes.

I suppose you could try to categorize mutexes as "freezable" and
"non-freezable". Ones used by device drives would generally be
non-freezable, whereas others (such as those used by VFS) would be
freezable. Still, it would be pretty difficult. Among other things,
it would be necessary to verify that a task holding a non-freezable
mutex never tries to acquire a freezable mutex.

Alan Stern

2008-10-29 14:50:36

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Alan Stern wrote:
> On Wed, 29 Oct 2008, Miklos Szeredi wrote:
>
> > Not all callbacks. I don't know what the current model is but AFAIR
> > it should be something like this:
> >
> > 1) call drivers to prepare for suspend (allocate space, etc)
> > 2) stop all driver activity (plug queues, disable interrupts, etc)
> > 3) call drivers to actually save state and power down
> > 4) suspend
> >
> > The part we are concerned is stopping driver activity. It could be
> > done with a mutex, or it could be done by freezing tasks. Adding a
> > mutex or other mechanism is the one I most like, but it's probably the
> > biggest work, so lets look at how to fix the freezing:
>
> Not only is adding a mutex the biggest amount of work, it has has the
> largest impact. Every I/O pathway would have to acquire the
> appropriate mutex. That's a significant additional load on the system.

Actually I was thinking of an rw-semaphore, not a mutex. But yeah
that still has scalability problems. But it could be done with custom
locking primitives, optimized for this case:

suspend_disable();
/* driver stuff */
suspend_enable();

> The problem with unrestricted freezing shows up when you freeze tasks
> that hold a mutex or other sort of lock. If this mutex is needed later
> on for suspending a device then the suspend will hang, because a frozen
> task can't release any mutexes.

I did a random sampling of ->suspend() callbacks, and they don't seem
to be taking mutexes. Does that happen at all?

Did anybody ever try modifying the freezer for suspend (not
hibernate), so that it allows tasks not in running state to freeze?
If not, I think that's an experiment worth doing.

Miklos

2008-10-29 15:28:45

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Miklos Szeredi wrote:

> Actually I was thinking of an rw-semaphore, not a mutex. But yeah
> that still has scalability problems. But it could be done with custom
> locking primitives, optimized for this case:
>
> suspend_disable();
> /* driver stuff */
> suspend_enable();

Yes, it could be done. And the overhead could be minimized by using
per-CPU variables. It would still be an awful lot of work, and easy to
get wrong.

> > The problem with unrestricted freezing shows up when you freeze tasks
> > that hold a mutex or other sort of lock. If this mutex is needed later
> > on for suspending a device then the suspend will hang, because a frozen
> > task can't release any mutexes.
>
> I did a random sampling of ->suspend() callbacks, and they don't seem
> to be taking mutexes. Does that happen at all?

It does, particularly among drivers that do runtime PM, which is
becoming more and more important.

Besides, suspend has to synchronize with I/O somehow. Right now that
is handled by making suspend wait until no tasks are doing I/O (because
they are all frozen). If you allow tasks to be frozen at more or less
arbitrary times, while holding arbitrary locks, then you may end up
freezing a task that's in the middle of I/O. That should certainly
block the suspend (not to mention messing up the I/O operation).

> Did anybody ever try modifying the freezer for suspend (not
> hibernate), so that it allows tasks not in running state to freeze?
> If not, I think that's an experiment worth doing.

What happens if the reason the task isn't running is because it's
waiting for I/O to complete? I just don't think this can be made to
work.

Alan Stern

2008-10-29 15:51:12

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Alan Stern wrote:
> On Wed, 29 Oct 2008, Miklos Szeredi wrote:
> > I did a random sampling of ->suspend() callbacks, and they don't seem
> > to be taking mutexes. Does that happen at all?
>
> It does, particularly among drivers that do runtime PM, which is
> becoming more and more important.
>
> Besides, suspend has to synchronize with I/O somehow. Right now that
> is handled by making suspend wait until no tasks are doing I/O (because
> they are all frozen).

What about async I/O?

> If you allow tasks to be frozen at more or less
> arbitrary times, while holding arbitrary locks, then you may end up
> freezing a task that's in the middle of I/O. That should certainly
> block the suspend (not to mention messing up the I/O operation).

What is the middle of I/O? Depending the type of I/O it could be
messed up regardless of whether tasks happen to be in userspace or not
(e.g. printing).

And some types of I/O are already mostly decoupled from userspace
(file I/O, networking), so the userspace freezing shoudln't make any
difference.

> > Did anybody ever try modifying the freezer for suspend (not
> > hibernate), so that it allows tasks not in running state to freeze?
> > If not, I think that's an experiment worth doing.
>
> What happens if the reason the task isn't running is because it's
> waiting for I/O to complete? I just don't think this can be made to
> work.

Don't know. I've never written a driver, and I'm not familiar with
runtime PM, etc. So I can't come up with a detailed design for
solving the freezer issues there.

But I do think that the solution does not lie in "fixing" the VFS.

Miklos

2008-10-29 16:10:44

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Alan Stern wrote:
> On Wed, 29 Oct 2008, Miklos Szeredi wrote:
>
> > Actually I was thinking of an rw-semaphore, not a mutex. But yeah
> > that still has scalability problems. But it could be done with custom
> > locking primitives, optimized for this case:
> >
> > suspend_disable();
> > /* driver stuff */
> > suspend_enable();
>
> Yes, it could be done. And the overhead could be minimized by using
> per-CPU variables. It would still be an awful lot of work, and easy to
> get wrong.

OK, getting back to this, as it seems to be the only way that we agree
is doable.

How about this,

a) identify syscalls that may make drivers do I/O:

- read
- write
- ioctl
???

b) add the suspend_disable/enable() primitives to these syscalls

c) push primitives inside the implementation

c) is slightly tricky, but could be done for example by setting a flag
on open: FMODE_NO_SUSPEND_DISABLE (better name required), saying that
implementation is responsible for getting the suspend disable magic
right.

For starters this flag could be set for all non-device opens (maybe all
non-char-dev opens?), solving the fuse vs. freezer issues without any
complicated trickery.

Miklos

2008-10-29 16:17:43

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Miklos Szeredi wrote:

> On Wed, 29 Oct 2008, Alan Stern wrote:
> > On Wed, 29 Oct 2008, Miklos Szeredi wrote:
> > > I did a random sampling of ->suspend() callbacks, and they don't seem
> > > to be taking mutexes. Does that happen at all?
> >
> > It does, particularly among drivers that do runtime PM, which is
> > becoming more and more important.
> >
> > Besides, suspend has to synchronize with I/O somehow. Right now that
> > is handled by making suspend wait until no tasks are doing I/O (because
> > they are all frozen).
>
> What about async I/O?

I don't know. Maybe it slipped through the cracks. It wouldn't be
surprising to find that async I/O can get messed up during a suspend.

> > If you allow tasks to be frozen at more or less
> > arbitrary times, while holding arbitrary locks, then you may end up
> > freezing a task that's in the middle of I/O. That should certainly
> > block the suspend (not to mention messing up the I/O operation).
>
> What is the middle of I/O? Depending the type of I/O it could be
> messed up regardless of whether tasks happen to be in userspace or not
> (e.g. printing).

I'll wriggle out of this by saying that "in the middle of I/O" means
the task has gotten a device into a state from which it can't
successfully be suspended. An example would be if the system had sent
the device part of a command. Even if you did manage to suspend the
device and resume it later, you wouldn't expect it to work right when
it received the second half of the command.

In general, drivers don't leave devices in this kind of intermediate
state for long. A driver might sleep at such times, but it wouldn't
return to userspace.

> And some types of I/O are already mostly decoupled from userspace
> (file I/O, networking), so the userspace freezing shoudln't make any
> difference.

True. We're mostly talking about character device I/O. There are a
lot of character device drivers in the kernel. :-)

> > > Did anybody ever try modifying the freezer for suspend (not
> > > hibernate), so that it allows tasks not in running state to freeze?
> > > If not, I think that's an experiment worth doing.
> >
> > What happens if the reason the task isn't running is because it's
> > waiting for I/O to complete? I just don't think this can be made to
> > work.
>
> Don't know. I've never written a driver, and I'm not familiar with
> runtime PM, etc. So I can't come up with a detailed design for
> solving the freezer issues there.
>
> But I do think that the solution does not lie in "fixing" the VFS.

I tend to agree. The problems posed by fuse are very difficult. It is
not at all obvious how to attack them.

Alan Stern

2008-10-29 20:37:43

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Miklos Szeredi wrote:

> On Wed, 29 Oct 2008, Alan Stern wrote:
> > On Wed, 29 Oct 2008, Miklos Szeredi wrote:
> >
> > > Actually I was thinking of an rw-semaphore, not a mutex. But yeah
> > > that still has scalability problems. But it could be done with custom
> > > locking primitives, optimized for this case:
> > >
> > > suspend_disable();
> > > /* driver stuff */
> > > suspend_enable();
> >
> > Yes, it could be done. And the overhead could be minimized by using
> > per-CPU variables. It would still be an awful lot of work, and easy to
> > get wrong.
>
> OK, getting back to this, as it seems to be the only way that we agree
> is doable.
>
> How about this,
>
> a) identify syscalls that may make drivers do I/O:
>
> - read
> - write
> - ioctl
> ???
>
> b) add the suspend_disable/enable() primitives to these syscalls
>
> c) push primitives inside the implementation

I discussed this last summer with Rafael. It's a lot harder than it
looks, for all sorts of reasons. For example, what about user tasks
that have access to memory-mapped I/O regions?

> c) is slightly tricky, but could be done for example by setting a flag
> on open: FMODE_NO_SUSPEND_DISABLE (better name required), saying that
> implementation is responsible for getting the suspend disable magic
> right.
>
> For starters this flag could be set for all non-device opens (maybe all
> non-char-dev opens?), solving the fuse vs. freezer issues without any
> complicated trickery.

I don't know. There are other interfaces too, like sysfs attributes,
that would have to be handled specially. On the whole, the freezer
seems much, much simpler.

Regarding fuse, something like Nigel's scheme for preventing new
requests and then waiting for old requests to complete might work out.
Especially if you combine it with a strategy for making the freezer
back and retry after a delay when something goes wrong.

Alan Stern

2008-10-29 21:07:35

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wednesday, 29 of October 2008, Alan Stern wrote:
> On Wed, 29 Oct 2008, Miklos Szeredi wrote:
>
> > On Wed, 29 Oct 2008, Alan Stern wrote:
> > > On Wed, 29 Oct 2008, Miklos Szeredi wrote:
> > >
> > > > Actually I was thinking of an rw-semaphore, not a mutex. But yeah
> > > > that still has scalability problems. But it could be done with custom
> > > > locking primitives, optimized for this case:
> > > >
> > > > suspend_disable();
> > > > /* driver stuff */
> > > > suspend_enable();
> > >
> > > Yes, it could be done. And the overhead could be minimized by using
> > > per-CPU variables. It would still be an awful lot of work, and easy to
> > > get wrong.
> >
> > OK, getting back to this, as it seems to be the only way that we agree
> > is doable.
> >
> > How about this,
> >
> > a) identify syscalls that may make drivers do I/O:
> >
> > - read
> > - write
> > - ioctl
> > ???
> >
> > b) add the suspend_disable/enable() primitives to these syscalls
> >
> > c) push primitives inside the implementation
>
> I discussed this last summer with Rafael. It's a lot harder than it
> looks, for all sorts of reasons. For example, what about user tasks
> that have access to memory-mapped I/O regions?
>
> > c) is slightly tricky, but could be done for example by setting a flag
> > on open: FMODE_NO_SUSPEND_DISABLE (better name required), saying that
> > implementation is responsible for getting the suspend disable magic
> > right.
> >
> > For starters this flag could be set for all non-device opens (maybe all
> > non-char-dev opens?), solving the fuse vs. freezer issues without any
> > complicated trickery.
>
> I don't know. There are other interfaces too, like sysfs attributes,
> that would have to be handled specially. On the whole, the freezer
> seems much, much simpler.
>
> Regarding fuse, something like Nigel's scheme for preventing new
> requests and then waiting for old requests to complete might work out.
> Especially if you combine it with a strategy for making the freezer
> back and retry after a delay when something goes wrong.

Yeah.

The current design of the freezer is rather simplistic and I'm not really sure
it's the best one possible. Perhaps we can redesign the freezer to work
differently and handle the cases like fuse.

I didn't have the time to think about that yet, though.

Thanks,
Rafael

2008-10-29 21:45:42

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi.

On Wed, 2008-10-29 at 22:11 +0100, Rafael J. Wysocki wrote:
> The current design of the freezer is rather simplistic and I'm not really sure
> it's the best one possible. Perhaps we can redesign the freezer to work
> differently and handle the cases like fuse.

Why redo what I've already done? In the full patch, you have the basis
of what you're talking about. I haven't seen a failure to freeze fuse or
anything else in a year of use.

Nigel

2008-10-29 22:03:26

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wednesday, 29 of October 2008, Nigel Cunningham wrote:
> Hi.
>
> On Wed, 2008-10-29 at 22:11 +0100, Rafael J. Wysocki wrote:
> > The current design of the freezer is rather simplistic and I'm not really sure
> > it's the best one possible. Perhaps we can redesign the freezer to work
> > differently and handle the cases like fuse.
>
> Why redo what I've already done? In the full patch, you have the basis
> of what you're talking about. I haven't seen a failure to freeze fuse or
> anything else in a year of use.

Still, Miklos noticed some problems with it.

I'm not talking about doing things on top of the current signal-based
freezing mechanism, but rather about moving the freezer a bit closer towards
the scheduler. Maybe this is the way to go. I don't know.

In any case, the freezing of user space seems to be much simpler than modifying
all drivers to add suspend synchronization.

Thanks,
Rafael

2008-10-29 23:43:28

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Alan Stern wrote:
> I discussed this last summer with Rafael. It's a lot harder than it
> looks, for all sorts of reasons. For example, what about user tasks
> that have access to memory-mapped I/O regions?

What about them? Freezing doesn't seem to help with that.

> > c) is slightly tricky, but could be done for example by setting a flag
> > on open: FMODE_NO_SUSPEND_DISABLE (better name required), saying that
> > implementation is responsible for getting the suspend disable magic
> > right.
> >
> > For starters this flag could be set for all non-device opens (maybe all
> > non-char-dev opens?), solving the fuse vs. freezer issues without any
> > complicated trickery.
>
> I don't know. There are other interfaces too, like sysfs attributes,
> that would have to be handled specially. On the whole, the freezer
> seems much, much simpler.

OK, then non-device files on "regular" filesystems.

> Regarding fuse, something like Nigel's scheme for preventing new
> requests and then waiting for old requests to complete might work out.
> Especially if you combine it with a strategy for making the freezer
> back and retry after a delay when something goes wrong.

I don't think it will work out, because to be able to do this some
ordering between freezing the filesystems must be done. But this is
basically impossible, for all the same reasons it's impossible to
order the freezing of userspace tasks.

Also this is not just a fuse issue: we have userspace network devices,
we have userspace USB drivers, etc, affected by this problem.

If there _is_ a solution with the freezer that does solve all of this,
I haven't yet heard it. But yeah, in the end the simpler solution
should win.

Miklos

2008-10-29 23:48:48

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Thu, 30 Oct 2008, Nigel Cunningham wrote:
> Hi.
>
> On Wed, 2008-10-29 at 22:11 +0100, Rafael J. Wysocki wrote:
> > The current design of the freezer is rather simplistic and I'm not really sure
> > it's the best one possible. Perhaps we can redesign the freezer to work
> > differently and handle the cases like fuse.
>
> Why redo what I've already done? In the full patch, you have the basis
> of what you're talking about. I haven't seen a failure to freeze fuse or
> anything else in a year of use.

Well yeah, your patch handles the straightforward cases. But it
doesn't help with the more tricky cases, where one fuse filesystem is
using another, and as those may become more widespread, this approach
will fail.

Miklos

2008-10-29 23:53:37

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, 29 Oct 2008, Rafael J. Wysocki wrote:
> In any case, the freezing of user space seems to be much simpler
> than modifying all drivers to add suspend synchronization.

As I mentioned, we don't _have_ to modify all drivers. It could start
out by just modifying syscalls that result in calls to drivers. That
would be a 10-liner patch.

Miklos

2008-10-29 23:56:25

by Matthew Garrett

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Wed, Oct 29, 2008 at 04:37:30PM -0400, Alan Stern wrote:

> I discussed this last summer with Rafael. It's a lot harder than it
> looks, for all sorts of reasons. For example, what about user tasks
> that have access to memory-mapped I/O regions?

The only interesting cases of this I know of are X (which ought to stop
doing so in the near future) and certain sound operations (which are
going via a well-defined API anyway and need to handle devices going
away at random, so not a problem). There may be some other niche cases,
but really - if you're mmapping hardware then it's generally because you
haven't written a proper kernel driver. Do that instead. Runtime power
management's already going to make you wildly unhappy.

--
Matthew Garrett | [email protected]

2008-10-30 02:04:21

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi.

On Thu, 2008-10-30 at 00:48 +0100, Miklos Szeredi wrote:
> On Thu, 30 Oct 2008, Nigel Cunningham wrote:
> > Hi.
> >
> > On Wed, 2008-10-29 at 22:11 +0100, Rafael J. Wysocki wrote:
> > > The current design of the freezer is rather simplistic and I'm not really sure
> > > it's the best one possible. Perhaps we can redesign the freezer to work
> > > differently and handle the cases like fuse.
> >
> > Why redo what I've already done? In the full patch, you have the basis
> > of what you're talking about. I haven't seen a failure to freeze fuse or
> > anything else in a year of use.
>
> Well yeah, your patch handles the straightforward cases. But it
> doesn't help with the more tricky cases, where one fuse filesystem is
> using another, and as those may become more widespread, this approach
> will fail.

At the moment, yes. But it's not impossible for us to modify the patch
to handle that as well.

Regards,

Nigel

2008-10-30 13:54:40

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Thu, 30 Oct 2008, Miklos Szeredi wrote:

> On Wed, 29 Oct 2008, Alan Stern wrote:
> > I discussed this last summer with Rafael. It's a lot harder than it
> > looks, for all sorts of reasons. For example, what about user tasks
> > that have access to memory-mapped I/O regions?
>
> What about them? Freezing doesn't seem to help with that.

Sure it does. A frozen process can't touch a memory-mapped I/O region,
whereas a non-frozen process can. Although as Matthew pointed out,
this is rapidly becoming a moot point.

> > > c) is slightly tricky, but could be done for example by setting a flag
> > > on open: FMODE_NO_SUSPEND_DISABLE (better name required), saying that
> > > implementation is responsible for getting the suspend disable magic
> > > right.
> > >
> > > For starters this flag could be set for all non-device opens (maybe all
> > > non-char-dev opens?), solving the fuse vs. freezer issues without any
> > > complicated trickery.
> >
> > I don't know. There are other interfaces too, like sysfs attributes,
> > that would have to be handled specially. On the whole, the freezer
> > seems much, much simpler.
>
> OK, then non-device files on "regular" filesystems.

Would you like to write a first-pass patch? I don't think it will
work.

> > Regarding fuse, something like Nigel's scheme for preventing new
> > requests and then waiting for old requests to complete might work out.
> > Especially if you combine it with a strategy for making the freezer
> > back and retry after a delay when something goes wrong.
>
> I don't think it will work out, because to be able to do this some
> ordering between freezing the filesystems must be done. But this is
> basically impossible, for all the same reasons it's impossible to
> order the freezing of userspace tasks.
>
> Also this is not just a fuse issue: we have userspace network devices,
> we have userspace USB drivers, etc, affected by this problem.
>
> If there _is_ a solution with the freezer that does solve all of this,
> I haven't yet heard it. But yeah, in the end the simpler solution
> should win.

To do this properly, it is necessary to categorize sleeping states.
Some should count as frozen and others shouldn't. So for instance, all
the mutexes and wait queues involved in VFS would count as frozen; a
process waiting on one of them can simply be kept from waking up until
the suspend is over. Others (those involved in device I/O) wouldn't
count as frozen; a task waiting on one of them would have to wake up
and move forward before the system could suspend.

Doing that seems like a lot of work, just as modifying every driver
does. Changing a few kernel entry points is simpler, but I'm pretty
sure it won't work. For instance, tasks can block arbitrarily long on
read calls (waiting for data to arrive); you can't allow such things to
prevent the system from suspending.

Alan Stern

2008-10-30 14:00:40

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Fri, 31 Oct 2008, Nigel Cunningham wrote:

> > Well yeah, your patch handles the straightforward cases. But it
> > doesn't help with the more tricky cases, where one fuse filesystem is
> > using another, and as those may become more widespread, this approach
> > will fail.
>
> At the moment, yes. But it's not impossible for us to modify the patch
> to handle that as well.

It depends on what you mean. The most direct reading of your statement
is simply wrong: It is _theoretically_ impossible to find the correct
order for freezing filesystems. To do so would be equivalent to
solving the halting problem.

Alan Stern

2008-10-30 14:40:02

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Thu, 30 Oct 2008, Alan Stern wrote:
> On Thu, 30 Oct 2008, Miklos Szeredi wrote:
>
> > On Wed, 29 Oct 2008, Alan Stern wrote:
> > > I discussed this last summer with Rafael. It's a lot harder than it
> > > looks, for all sorts of reasons. For example, what about user tasks
> > > that have access to memory-mapped I/O regions?
> >
> > What about them? Freezing doesn't seem to help with that.
>
> Sure it does. A frozen process can't touch a memory-mapped I/O region,
> whereas a non-frozen process can.

But it can be in the middle of I/O by your definition.

> > > I don't know. There are other interfaces too, like sysfs attributes,
> > > that would have to be handled specially. On the whole, the freezer
> > > seems much, much simpler.
> >
> > OK, then non-device files on "regular" filesystems.
>
> Would you like to write a first-pass patch? I don't think it will
> work.

If somebody doesn't beat me to it, I'll do that (first implemented
with a global rw-sem).

> Doing that seems like a lot of work, just as modifying every driver
> does. Changing a few kernel entry points is simpler, but I'm pretty
> sure it won't work. For instance, tasks can block arbitrarily long on
> read calls (waiting for data to arrive); you can't allow such things to
> prevent the system from suspending.

But we already do: either

a) it's in interruptible sleep (I/O on sockets, pipes, etc), and
freezing simply interrupts it, or

b) it's in uninterruptible sleep and suspend will wait it out (or
time out).

In the new scheme we could retain that part of the freezer: interrupt
all tasks which are inside the critical region and wait for them to
exit the critical region.

To put it in another way: it's still the freezer, it does all the same
things as the old freezer, except that the condition for freezing is
not that the task is out of the kernel, rather that it's out of the
disable_supend - enable_suspend region. As such it's not a big change
to the whole suspend system, and so there shouldn't be anything big
going wrong there.

Miklos

2008-10-30 17:07:39

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Thu, 30 Oct 2008, Miklos Szeredi wrote:

> On Thu, 30 Oct 2008, Alan Stern wrote:
> > On Thu, 30 Oct 2008, Miklos Szeredi wrote:
> >
> > > On Wed, 29 Oct 2008, Alan Stern wrote:
> > > > I discussed this last summer with Rafael. It's a lot harder than it
> > > > looks, for all sorts of reasons. For example, what about user tasks
> > > > that have access to memory-mapped I/O regions?
> > >
> > > What about them? Freezing doesn't seem to help with that.
> >
> > Sure it does. A frozen process can't touch a memory-mapped I/O region,
> > whereas a non-frozen process can.
>
> But it can be in the middle of I/O by your definition.

True. Yet another problem...

> > Would you like to write a first-pass patch? I don't think it will
> > work.
>
> If somebody doesn't beat me to it, I'll do that (first implemented
> with a global rw-sem).

Converting it to per-CPU counters later on should be fairly easy.

> > Doing that seems like a lot of work, just as modifying every driver
> > does. Changing a few kernel entry points is simpler, but I'm pretty
> > sure it won't work. For instance, tasks can block arbitrarily long on
> > read calls (waiting for data to arrive); you can't allow such things to
> > prevent the system from suspending.
>
> But we already do: either
>
> a) it's in interruptible sleep (I/O on sockets, pipes, etc), and
> freezing simply interrupts it, or
>
> b) it's in uninterruptible sleep and suspend will wait it out (or
> time out).
>
> In the new scheme we could retain that part of the freezer: interrupt
> all tasks which are inside the critical region and wait for them to
> exit the critical region.
>
> To put it in another way: it's still the freezer, it does all the same
> things as the old freezer, except that the condition for freezing is
> not that the task is out of the kernel, rather that it's out of the
> disable_supend - enable_suspend region. As such it's not a big change
> to the whole suspend system, and so there shouldn't be anything big
> going wrong there.

Okay. Don't forget things like ioctl for sockets -- they often involve
doing I/O directly to the network interface device.

What happens to a task accessing a non-regular file on a fuse
filesystem? :-)

Alan Stern

2008-10-30 17:43:50

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Thu, 30 Oct 2008, Alan Stern wrote:
> Okay. Don't forget things like ioctl for sockets -- they often involve
> doing I/O directly to the network interface device.

Yeah, ioctls should probably just always be protected (at least
initially), regardless of what type of file they are done on.

> What happens to a task accessing a non-regular file on a fuse
> filesystem? :-)

The same as on any other filesystem, i.e. the fs is only involved as
far as calling init_special_inode(), the rest is taken care of by the
VFS.

Tejun Heo recently posted patches to fuse which enable emulating a
char device from userspace. That is another matter, obviously we'd
want to keep the "allow suspend during I/O" property of fuse in that
case, even though there's a char device involved (but no hardware, at
least not on that level).

Miklos

2008-10-30 20:13:38

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Thursday, 30 of October 2008, Miklos Szeredi wrote:
> On Thu, 30 Oct 2008, Alan Stern wrote:
> > On Thu, 30 Oct 2008, Miklos Szeredi wrote:
> >
> > > On Wed, 29 Oct 2008, Alan Stern wrote:
> > > > I discussed this last summer with Rafael. It's a lot harder than it
> > > > looks, for all sorts of reasons. For example, what about user tasks
> > > > that have access to memory-mapped I/O regions?
> > >
> > > What about them? Freezing doesn't seem to help with that.
> >
> > Sure it does. A frozen process can't touch a memory-mapped I/O region,
> > whereas a non-frozen process can.
>
> But it can be in the middle of I/O by your definition.
>
> > > > I don't know. There are other interfaces too, like sysfs attributes,
> > > > that would have to be handled specially. On the whole, the freezer
> > > > seems much, much simpler.
> > >
> > > OK, then non-device files on "regular" filesystems.
> >
> > Would you like to write a first-pass patch? I don't think it will
> > work.
>
> If somebody doesn't beat me to it, I'll do that (first implemented
> with a global rw-sem).
>
> > Doing that seems like a lot of work, just as modifying every driver
> > does. Changing a few kernel entry points is simpler, but I'm pretty
> > sure it won't work. For instance, tasks can block arbitrarily long on
> > read calls (waiting for data to arrive); you can't allow such things to
> > prevent the system from suspending.
>
> But we already do: either
>
> a) it's in interruptible sleep (I/O on sockets, pipes, etc), and
> freezing simply interrupts it, or
>
> b) it's in uninterruptible sleep and suspend will wait it out (or
> time out).
>
> In the new scheme we could retain that part of the freezer: interrupt
> all tasks which are inside the critical region and wait for them to
> exit the critical region.
>
> To put it in another way: it's still the freezer, it does all the same
> things as the old freezer, except that the condition for freezing is
> not that the task is out of the kernel, rather that it's out of the
> disable_supend - enable_suspend region. As such it's not a big change
> to the whole suspend system, and so there shouldn't be anything big
> going wrong there.

I like this idea.

I was thinking about using task flags to mark processes as "not freezable
at the moment", which would make the freezer to wait for such tasks
(currently a task may only be always freezable or not freezable at all,
which is not very flexible).

Unfortunately, I didn't have the time to implement this.

Thanks,
Rafael

2008-10-30 21:44:54

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi Alan.

On Thu, 2008-10-30 at 09:56 -0400, Alan Stern wrote:
> On Fri, 31 Oct 2008, Nigel Cunningham wrote:
>
> > > Well yeah, your patch handles the straightforward cases. But it
> > > doesn't help with the more tricky cases, where one fuse filesystem is
> > > using another, and as those may become more widespread, this approach
> > > will fail.
> >
> > At the moment, yes. But it's not impossible for us to modify the patch
> > to handle that as well.
>
> It depends on what you mean. The most direct reading of your statement
> is simply wrong: It is _theoretically_ impossible to find the correct
> order for freezing filesystems. To do so would be equivalent to
> solving the halting problem.

I'm not sure that's true. You see, I'm thinking of this as not that
different to the problem of unmounting filesystems. There, too, we need
to unmount in a particular order, and let transactions on each
filesystem stop cleanly before we can unmount them. Even if there are
differences, perhaps looking at how we handle unmounting will help with
handling freezing.

Regards,

Nigel

2008-10-31 08:49:27

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Fri, 31 Oct 2008, Nigel Cunningham wrote:
> I'm not sure that's true. You see, I'm thinking of this as not that
> different to the problem of unmounting filesystems. There, too, we need
> to unmount in a particular order, and let transactions on each
> filesystem stop cleanly before we can unmount them. Even if there are
> differences, perhaps looking at how we handle unmounting will help with
> handling freezing.

There's nothing magic about umount, it just uses a refcount on the fs.

But umount changes the namespace, that's the big difference. For
example if a process is accessing path P which has a component inside
the mount, it _will_ get different results before and after the
umount. This is not acceptable for freezing.

For freezing to work with such a refcounting scheme, we'd have to
count _future_ uses of the fs as well, not just current ones, which is
obviously impossible.

Miklos

2008-10-31 09:11:15

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi.

On Fri, 2008-10-31 at 09:49 +0100, Miklos Szeredi wrote:
> On Fri, 31 Oct 2008, Nigel Cunningham wrote:
> > I'm not sure that's true. You see, I'm thinking of this as not that
> > different to the problem of unmounting filesystems. There, too, we need
> > to unmount in a particular order, and let transactions on each
> > filesystem stop cleanly before we can unmount them. Even if there are
> > differences, perhaps looking at how we handle unmounting will help with
> > handling freezing.
>
> There's nothing magic about umount, it just uses a refcount on the fs.
>
> But umount changes the namespace, that's the big difference. For
> example if a process is accessing path P which has a component inside
> the mount, it _will_ get different results before and after the
> umount. This is not acceptable for freezing.
>
> For freezing to work with such a refcounting scheme, we'd have to
> count _future_ uses of the fs as well, not just current ones, which is
> obviously impossible.

I must be missing something. If you're freezing future users of the
filesystem before they can start anything new, doesn't that deal with
this problem?

Regards,

Nigel

2008-10-31 09:16:32

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Fri, 31 Oct 2008, Nigel Cunningham wrote:
> Hi.
>
> On Fri, 2008-10-31 at 09:49 +0100, Miklos Szeredi wrote:
> > On Fri, 31 Oct 2008, Nigel Cunningham wrote:
> > > I'm not sure that's true. You see, I'm thinking of this as not that
> > > different to the problem of unmounting filesystems. There, too, we need
> > > to unmount in a particular order, and let transactions on each
> > > filesystem stop cleanly before we can unmount them. Even if there are
> > > differences, perhaps looking at how we handle unmounting will help with
> > > handling freezing.
> >
> > There's nothing magic about umount, it just uses a refcount on the fs.
> >
> > But umount changes the namespace, that's the big difference. For
> > example if a process is accessing path P which has a component inside
> > the mount, it _will_ get different results before and after the
> > umount. This is not acceptable for freezing.
> >
> > For freezing to work with such a refcounting scheme, we'd have to
> > count _future_ uses of the fs as well, not just current ones, which is
> > obviously impossible.
>
> I must be missing something. If you're freezing future users of the
> filesystem before they can start anything new, doesn't that deal with
> this problem?

How do you determine which are the future users?

Miklos

2008-10-31 11:28:19

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Hi again.

On Fri, 2008-10-31 at 10:16 +0100, Miklos Szeredi wrote:
> On Fri, 31 Oct 2008, Nigel Cunningham wrote:
> > Hi.
> >
> > On Fri, 2008-10-31 at 09:49 +0100, Miklos Szeredi wrote:
> > > On Fri, 31 Oct 2008, Nigel Cunningham wrote:
> > > > I'm not sure that's true. You see, I'm thinking of this as not that
> > > > different to the problem of unmounting filesystems. There, too, we need
> > > > to unmount in a particular order, and let transactions on each
> > > > filesystem stop cleanly before we can unmount them. Even if there are
> > > > differences, perhaps looking at how we handle unmounting will help with
> > > > handling freezing.
> > >
> > > There's nothing magic about umount, it just uses a refcount on the fs.
> > >
> > > But umount changes the namespace, that's the big difference. For
> > > example if a process is accessing path P which has a component inside
> > > the mount, it _will_ get different results before and after the
> > > umount. This is not acceptable for freezing.
> > >
> > > For freezing to work with such a refcounting scheme, we'd have to
> > > count _future_ uses of the fs as well, not just current ones, which is
> > > obviously impossible.
> >
> > I must be missing something. If you're freezing future users of the
> > filesystem before they can start anything new, doesn't that deal with
> > this problem?
>
> How do you determine which are the future users?

You don't. You just use FUSE_MIGHT_FREEZE to stop them before they take
locks that might cause problems. So my suggestion is:

1) Stop new requests at FUSE_MIGHT_FREEZE
2) Handle existing requests by using freezer_do_not_count in
request_send and request_send_nowait before the spin_lock and
freezer_count after the spin_unlock.

With #2, we don't need to care about whether the request is completed
before freezing completes or post-resume.

If the userspace process tries to use an already frozen fuse filesystem
and gets frozen, that's okay because we'll sit waiting for an answer,
not be counted by the freezer and so not contribute to any failure to
freeze processes.

If the userspace process completes its work, we'll either get caught at
the freezer_count (if we've already been told to freeze) or be gotten
later, after exiting the fuse code.

Regards,

Nigel

2008-10-31 12:45:00

[permalink] [raw]

Subject: Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

On Fri, 31 Oct 2008, Nigel Cunningham wrote:
> You don't. You just use FUSE_MIGHT_FREEZE to stop them before they take
> locks that might cause problems. So my suggestion is:

Before? FUSE_MIGHT_FREEZE is called _after_ i_mutex is taken by the
VFS.

> 1) Stop new requests at FUSE_MIGHT_FREEZE
> 2) Handle existing requests by using freezer_do_not_count in
> request_send and request_send_nowait before the spin_lock and
> freezer_count after the spin_unlock.
>
> With #2, we don't need to care about whether the request is completed
> before freezing completes or post-resume.
>
> If the userspace process tries to use an already frozen fuse filesystem
> and gets frozen, that's okay because we'll sit waiting for an answer,
> not be counted by the freezer and so not contribute to any failure to
> freeze processes.
>
> If the userspace process completes its work, we'll either get caught at
> the freezer_count (if we've already been told to freeze) or be gotten
> later, after exiting the fuse code.

Yes, this is the variant of categorizing sleeps to freezing and
non-freezing. The problem is, you need to do that with all
mutex_lock(&inode->i_mutex) instances as well. Try grepping for that
in fs/*.c!

It _is_ possible, it's just not practical.

Miklos

2008-10-31 21:11:51