Darrick J. Wong poked me about the status of the fs freez work, he's
right, it's been too long since the last spin. The last v2 attempt happened
in April 2021 [0], this just takes the feedback from Christoph and spins it
again. I've only done basic build tests on x86_64, and haven't yet run time
tested the stuff, but given the size of this set its better to review early
before getting stuck on details. So this is what I've ended up with so far.
Please help me paint the bike shed, and figure out the stuff perhaps
I had not considered yet. The locking stuff is really the important thing
here.
I'd like to re-iterate that tons of areas of the kernel are using the
kthread freezer stuff for things it probably has no reason to use it, so
once we remove this from the fs, it should be easy to start trimming this
from other parts of the kernel. The kthread freezer stuff was put in place
originally stop IO in flight for fs. Other parts of the kernels should
have no business using this stuff after all this work is done.
[0] https://lore.kernel.org/all/[email protected]/
Changes since the last v2:
* instead of having different semantics for lock / unlocked freeze
and thaw calls, this unifies the semantics by requiring the lock
prior to freeze / thaw
* uses grab_active_super() now in all all places which need to freeze
or thaw, this includes filesystems, this is to match the locking
requirements, and so to not add new heuristics over defining if the
superblock might be in a good state for freeze/thaw.
* drops SB_FREEZE_COMPLETE_AUTO in favor of just checking for a flag
to be able to determine if userspace initiated the freeze or if its
auto (by the kernel pm)
* folded the pm calls for the VFS so that instead of one call which
has a one-liner with two routines, we use the same one-liner on the
pm side of things.
* split the FS stuff by using a enw temporary flag, so to enable
easier review of the FS changes
* more filesystems use the freezer API now so this also converts them
over
* adjusted the coccinelle rule to use the new flag and in the end
removes it
This is all here too:
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20231010-fs-freeze-v5
Luis Chamberlain (24):
fs: unify locking semantics for fs freeze / thaw
fs: add frozen sb state helpers
fs: distinguish between user initiated freeze and kernel initiated
freeze
fs: add iterate_supers_excl() and iterate_supers_reverse_excl()
fs: add automatic kernel fs freeze / thaw and remove kthread freezing
xfs: replace kthread freezing with auto fs freezing
btrfs: replace kthread freezing with auto fs freezing
ext4: replace kthread freezing with auto fs freezing
f2fs: replace kthread freezing with auto fs freezing
cifs: replace kthread freezing with auto fs freezing
gfs2: replace kthread freezing with auto fs freezing
jfs: replace kthread freezing with auto fs freezing
nilfs2: replace kthread freezing with auto fs freezing
nfs: replace kthread freezing with auto fs freezing
nfsd: replace kthread freezing with auto fs freezing
ubifs: replace kthread freezing with auto fs freezing
ksmbd: replace kthread freezing with auto fs freezing
jffs2: replace kthread freezing with auto fs freezing
jbd2: replace kthread freezing with auto fs freezing
coredump: drop freezer usage
ecryptfs: replace kthread freezing with auto fs freezing
fscache: replace kthread freezing with auto fs freezing
lockd: replace kthread freezing with auto fs freezing
fs: remove FS_AUTOFREEZE
block/bdev.c | 9 +-
fs/btrfs/disk-io.c | 4 +-
fs/btrfs/scrub.c | 2 +-
fs/cifs/cifsfs.c | 10 +-
fs/cifs/connect.c | 8 --
fs/cifs/dfs_cache.c | 2 +-
fs/coredump.c | 2 +-
fs/ecryptfs/kthread.c | 1 -
fs/ext4/ext4_jbd2.c | 2 +-
fs/ext4/super.c | 3 -
fs/f2fs/gc.c | 12 +-
fs/f2fs/segment.c | 6 +-
fs/fscache/main.c | 2 +-
fs/gfs2/glock.c | 6 +-
fs/gfs2/glops.c | 2 +-
fs/gfs2/log.c | 2 -
fs/gfs2/main.c | 4 +-
fs/gfs2/quota.c | 2 -
fs/gfs2/super.c | 11 +-
fs/gfs2/sys.c | 12 +-
fs/gfs2/util.c | 7 +-
fs/ioctl.c | 14 ++-
fs/jbd2/journal.c | 54 ++++-----
fs/jffs2/background.c | 3 +-
fs/jfs/jfs_logmgr.c | 11 +-
fs/jfs/jfs_txnmgr.c | 31 ++----
fs/ksmbd/connection.c | 3 -
fs/ksmbd/transport_tcp.c | 2 -
fs/lockd/clntproc.c | 1 -
fs/lockd/svc.c | 3 -
fs/nfs/callback.c | 4 -
fs/nfsd/nfssvc.c | 2 -
fs/nilfs2/segment.c | 48 ++++----
fs/quota/quota.c | 4 +-
fs/super.c | 232 ++++++++++++++++++++++++++++++++-------
fs/ubifs/commit.c | 4 -
fs/xfs/xfs_log.c | 3 +-
fs/xfs/xfs_log_cil.c | 2 +-
fs/xfs/xfs_mru_cache.c | 2 +-
fs/xfs/xfs_pwork.c | 2 +-
fs/xfs/xfs_super.c | 14 +--
fs/xfs/xfs_trans.c | 3 +-
fs/xfs/xfs_trans_ail.c | 3 -
include/linux/fs.h | 53 ++++++++-
kernel/power/process.c | 15 ++-
45 files changed, 393 insertions(+), 229 deletions(-)
--
2.35.1
The kernel power management now supports allowing the VFS
to handle filesystem freezing freezes and thawing. Take advantage
of that and remove the kthread freezing. This is needed so that we
properly really stop IO in flight without races after userspace
has been frozen. Without this we rely on kthread freezing and
its semantics are loose and error prone.
The filesystem therefore is in charge of properly dealing with
quiescing of the filesystem through its callbacks if it thinks
it knows better than how the VFS handles it.
The following Coccinelle rule was used as to remove the now superflous
freezer calls:
spatch --sp-file fs-freeze-cleanup.cocci --in-place --timeout 120 --dir fs/ext4 --jobs 12 --use-gitgrep
@ remove_set_freezable @
expression time;
statement S, S2;
expression task, current;
@@
(
- set_freezable();
|
- if (try_to_freeze())
- continue;
|
- try_to_freeze();
|
- freezable_schedule();
+ schedule();
|
- freezable_schedule_timeout(time);
+ schedule_timeout(time);
|
- if (freezing(task)) { S }
|
- if (freezing(task)) { S }
- else
{ S2 }
|
- freezing(current)
)
@ remove_wq_freezable @
expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4;
identifier fs_wq_fn;
@@
(
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE,
+ WQ_ARG2,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4,
+ WQ_ARG2 | WQ_ARG3 | WQ_ARG4,
...);
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE
+ WQ_ARG1
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3
+ WQ_ARG1 | WQ_ARG3
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3
+ WQ_ARG2 | WQ_ARG3
)
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2
+ WQ_ARG2
)
|
fs_wq_fn(
- WQ_FREEZABLE
+ 0
)
)
@ add_auto_flag @
expression E1;
identifier fs_type;
@@
struct file_system_type fs_type = {
.fs_flags = E1
+ | FS_AUTOFREEZE
,
};
Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain <[email protected]>
---
fs/ext4/super.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b31db521d6bf..0ae6f13c7fa4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -136,7 +136,7 @@ static struct file_system_type ext2_fs_type = {
.init_fs_context = ext4_init_fs_context,
.parameters = ext4_param_specs,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_AUTOFREEZE,
};
MODULE_ALIAS_FS("ext2");
MODULE_ALIAS("ext2");
@@ -152,7 +152,7 @@ static struct file_system_type ext3_fs_type = {
.init_fs_context = ext4_init_fs_context,
.parameters = ext4_param_specs,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_AUTOFREEZE,
};
MODULE_ALIAS_FS("ext3");
MODULE_ALIAS("ext3");
@@ -3734,7 +3734,6 @@ static int ext4_lazyinit_thread(void *arg)
unsigned long next_wakeup, cur;
BUG_ON(NULL == eli);
- set_freezable();
cont_thread:
while (true) {
@@ -3786,8 +3785,6 @@ static int ext4_lazyinit_thread(void *arg)
}
mutex_unlock(&eli->li_list_mtx);
- try_to_freeze();
-
cur = jiffies;
if ((time_after_eq(cur, next_wakeup)) ||
(MAX_JIFFY_OFFSET == next_wakeup)) {
@@ -7192,7 +7189,7 @@ static struct file_system_type ext4_fs_type = {
.init_fs_context = ext4_init_fs_context,
.parameters = ext4_param_specs,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
+ .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_AUTOFREEZE,
};
MODULE_ALIAS_FS("ext4");
--
2.35.1
The kernel power management now supports allowing the VFS
to handle filesystem freezing freezes and thawing. Take advantage
of that and remove the kthread freezing. This is needed so that we
properly really stop IO in flight without races after userspace
has been frozen. Without this we rely on kthread freezing and
its semantics are loose and error prone.
The filesystem therefore is in charge of properly dealing with
quiescing of the filesystem through its callbacks if it thinks
it knows better than how the VFS handles it.
The following Coccinelle rule was used as to remove the now superflous
freezer calls:
spatch --sp-file fs-freeze-cleanup.cocci --in-place --timeout 120 --dir fs/btrfs --jobs 12 --use-gitgrep
@ remove_set_freezable @
expression time;
statement S, S2;
expression task, current;
@@
(
- set_freezable();
|
- if (try_to_freeze())
- continue;
|
- try_to_freeze();
|
- freezable_schedule();
+ schedule();
|
- freezable_schedule_timeout(time);
+ schedule_timeout(time);
|
- if (freezing(task)) { S }
|
- if (freezing(task)) { S }
- else
{ S2 }
|
- freezing(current)
)
@ remove_wq_freezable @
expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4;
identifier fs_wq_fn;
@@
(
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE,
+ WQ_ARG2,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4,
+ WQ_ARG2 | WQ_ARG3 | WQ_ARG4,
...);
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE
+ WQ_ARG1
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3
+ WQ_ARG1 | WQ_ARG3
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3
+ WQ_ARG2 | WQ_ARG3
)
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2
+ WQ_ARG2
)
|
fs_wq_fn(
- WQ_FREEZABLE
+ 0
)
)
@ add_auto_flag @
expression E1;
identifier fs_type;
@@
struct file_system_type fs_type = {
.fs_flags = E1
+ | FS_AUTOFREEZE
,
};
Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain <[email protected]>
---
fs/btrfs/disk-io.c | 4 ++--
fs/btrfs/scrub.c | 2 +-
fs/btrfs/super.c | 4 ++--
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f330dfa066c0..bf7ad1f34e21 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2354,7 +2354,7 @@ static void btrfs_init_qgroup(struct btrfs_fs_info *fs_info)
static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
{
u32 max_active = fs_info->thread_pool_size;
- unsigned int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
+ unsigned int flags = WQ_MEM_RECLAIM | WQ_UNBOUND;
fs_info->workers =
btrfs_alloc_workqueue(fs_info, "worker", flags, max_active, 16);
@@ -2395,7 +2395,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
fs_info->qgroup_rescan_workers =
btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);
fs_info->discard_ctl.discard_workers =
- alloc_workqueue("btrfs_discard", WQ_UNBOUND | WQ_FREEZABLE, 1);
+ alloc_workqueue("btrfs_discard", WQ_UNBOUND, 1);
if (!(fs_info->workers && fs_info->hipri_workers &&
fs_info->delalloc_workers && fs_info->flush_workers &&
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 52b346795f66..d32d7308c3a1 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -4207,7 +4207,7 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
struct workqueue_struct *scrub_workers = NULL;
struct workqueue_struct *scrub_wr_comp = NULL;
struct workqueue_struct *scrub_parity = NULL;
- unsigned int flags = WQ_FREEZABLE | WQ_UNBOUND;
+ unsigned int flags = WQ_UNBOUND;
int max_active = fs_info->thread_pool_size;
int ret = -ENOMEM;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 433ce221dc5c..35059fe276ac 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2138,7 +2138,7 @@ static struct file_system_type btrfs_fs_type = {
.name = "btrfs",
.mount = btrfs_mount,
.kill_sb = btrfs_kill_super,
- .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_AUTOFREEZE,
};
static struct file_system_type btrfs_root_fs_type = {
@@ -2146,7 +2146,7 @@ static struct file_system_type btrfs_root_fs_type = {
.name = "btrfs",
.mount = btrfs_mount_root,
.kill_sb = btrfs_kill_super,
- .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_ALLOW_IDMAP,
+ .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_ALLOW_IDMAP | FS_AUTOFREEZE,
};
MODULE_ALIAS_FS("btrfs");
--
2.35.1
The kernel power management now supports allowing the VFS
to handle filesystem freezing freezes and thawing. Take advantage
of that and remove the kthread freezing. This is needed so that we
properly really stop IO in flight without races after userspace
has been frozen. Without this we rely on kthread freezing and
its semantics are loose and error prone.
The filesystem therefore is in charge of properly dealing with
quiescing of the filesystem through its callbacks if it thinks
it knows better than how the VFS handles it.
The following Coccinelle rule was used as to remove the now superflous
freezer calls:
spatch --sp-file fs-freeze-cleanup.cocci --in-place --timeout 120 --dir fs/jffs2 --jobs 12 --use-gitgrep
@ remove_set_freezable @
expression time;
statement S, S2;
expression task, current;
@@
(
- set_freezable();
|
- if (try_to_freeze())
- continue;
|
- try_to_freeze();
|
- freezable_schedule();
+ schedule();
|
- freezable_schedule_timeout(time);
+ schedule_timeout(time);
|
- if (freezing(task)) { S }
|
- if (freezing(task)) { S }
- else
{ S2 }
|
- freezing(current)
)
@ remove_wq_freezable @
expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4;
identifier fs_wq_fn;
@@
(
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE,
+ WQ_ARG2,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4,
+ WQ_ARG2 | WQ_ARG3 | WQ_ARG4,
...);
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE
+ WQ_ARG1
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3
+ WQ_ARG1 | WQ_ARG3
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3
+ WQ_ARG2 | WQ_ARG3
)
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2
+ WQ_ARG2
)
|
fs_wq_fn(
- WQ_FREEZABLE
+ 0
)
)
@ add_auto_flag @
expression E1;
identifier fs_type;
@@
struct file_system_type fs_type = {
.fs_flags = E1
+ | FS_AUTOFREEZE
,
};
Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain <[email protected]>
---
fs/jffs2/background.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/jffs2/background.c b/fs/jffs2/background.c
index 6da92ecaf66d..e29fdf1ed878 100644
--- a/fs/jffs2/background.c
+++ b/fs/jffs2/background.c
@@ -87,7 +87,6 @@ static int jffs2_garbage_collect_thread(void *_c)
set_user_nice(current, 10);
- set_freezable();
for (;;) {
sigprocmask(SIG_UNBLOCK, &hupmask, NULL);
again:
@@ -119,7 +118,7 @@ static int jffs2_garbage_collect_thread(void *_c)
/* Put_super will send a SIGKILL and then wait on the sem.
*/
- while (signal_pending(current) || freezing(current)) {
+ while (signal_pending(current) ||) {
unsigned long signr;
if (try_to_freeze())
--
2.35.1
Add support to automatically handle freezing and thawing filesystems
during the kernel's suspend/resume cycle.
This is needed so that we properly really stop IO in flight without
races after userspace has been frozen. Without this we rely on
kthread freezing and its semantics are loose and error prone.
For instance, even though a kthread may use try_to_freeze() and end
up being frozen we have no way of being sure that everything that
has been spawned asynchronously from it (such as timers) have also
been stopped as well.
A long term advantage of also adding filesystem freeze / thawing
supporting during suspend / hibernation is that long term we may
be able to eventually drop the kernel's thread freezing completely
as it was originally added to stop disk IO in flight as we hibernate
or suspend.
This does not remove the superflous freezer calls on all filesystems.
Each filesystem must remove all the kthread freezer stuff and peg
the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE
flag.
Subsequent patches remove the kthread freezer usage from each
filesystem, one at a time to make all this work bisectable.
Once all filesystems remove the usage of the kthread freezer we
can remove the FS_AUTOFREEZE flag.
Signed-off-by: Luis Chamberlain <[email protected]>
---
fs/super.c | 69 ++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 14 +++++++++
kernel/power/process.c | 15 ++++++++-
3 files changed, 97 insertions(+), 1 deletion(-)
diff --git a/fs/super.c b/fs/super.c
index 2f77fcb6e555..e8af4c8269ad 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1853,3 +1853,72 @@ int thaw_super(struct super_block *sb, bool usercall)
return 0;
}
EXPORT_SYMBOL(thaw_super);
+
+#ifdef CONFIG_PM_SLEEP
+static bool super_should_freeze(struct super_block *sb)
+{
+ if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
+ return false;
+ /*
+ * We don't freeze virtual filesystems, we skip those filesystems with
+ * no backing device.
+ */
+ if (sb->s_bdi == &noop_backing_dev_info)
+ return false;
+
+ return true;
+}
+
+int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
+{
+ int error = 0;
+
+ if (!grab_lock_super(sb)) {
+ pr_err("%s (%s): freezing failed to grab_super()\n",
+ sb->s_type->name, sb->s_id);
+ return -ENOTTY;
+ }
+
+ if (!super_should_freeze(sb))
+ goto out;
+
+ pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
+
+ error = freeze_super(sb, false);
+ if (!error)
+ lockdep_sb_freeze_release(sb);
+ else if (error != -EBUSY)
+ pr_notice("%s (%s): Unable to freeze, error=%d",
+ sb->s_type->name, sb->s_id, error);
+
+out:
+ deactivate_locked_super(sb);
+ return error;
+}
+
+int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
+{
+ int error = 0;
+
+ if (!grab_lock_super(sb)) {
+ pr_err("%s (%s): thawing failed to grab_super()\n",
+ sb->s_type->name, sb->s_id);
+ return -ENOTTY;
+ }
+
+ if (!super_should_freeze(sb))
+ goto out;
+
+ pr_info("%s (%s): thawing\n", sb->s_type->name, sb->s_id);
+
+ error = thaw_super(sb, false);
+ if (error && error != -EBUSY)
+ pr_notice("%s (%s): Unable to unfreeze, error=%d",
+ sb->s_type->name, sb->s_id, error);
+
+out:
+ deactivate_locked_super(sb);
+ return error;
+}
+
+#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f168e72f6ca1..e5bee359e804 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2231,6 +2231,7 @@ struct file_system_type {
#define FS_DISALLOW_NOTIFY_PERM 16 /* Disable fanotify permission events */
#define FS_ALLOW_IDMAP 32 /* FS has been updated to handle vfs idmappings. */
#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
+#define FS_AUTOFREEZE (1<<16) /* temporary as we phase kthread freezer out */
int (*init_fs_context)(struct fs_context *);
const struct fs_parameter_spec *parameters;
struct dentry *(*mount) (struct file_system_type *, int,
@@ -2306,6 +2307,19 @@ extern int user_statfs(const char __user *, struct kstatfs *);
extern int fd_statfs(int, struct kstatfs *);
extern int freeze_super(struct super_block *super, bool usercall);
extern int thaw_super(struct super_block *super, bool usercall);
+#ifdef CONFIG_PM_SLEEP
+int fs_suspend_freeze_sb(struct super_block *sb, void *priv);
+int fs_suspend_thaw_sb(struct super_block *sb, void *priv);
+#else
+static inline int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
+{
+ return 0;
+}
+static inline int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
+{
+ return 0;
+}
+#endif
extern __printf(2, 3)
int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
extern int super_setup_bdi(struct super_block *sb);
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 6c1c7e566d35..1dd6b0b6b4e5 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -140,6 +140,16 @@ int freeze_processes(void)
BUG_ON(in_atomic());
+ pr_info("Freezing filesystems ... ");
+ error = iterate_supers_reverse_excl(fs_suspend_freeze_sb, NULL);
+ if (error) {
+ pr_cont("failed\n");
+ iterate_supers_excl(fs_suspend_thaw_sb, NULL);
+ thaw_processes();
+ return error;
+ }
+ pr_cont("done.\n");
+
/*
* Now that the whole userspace is frozen we need to disable
* the OOM killer to disallow any further interference with
@@ -149,8 +159,10 @@ int freeze_processes(void)
if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs)))
error = -EBUSY;
- if (error)
+ if (error) {
+ iterate_supers_excl(fs_suspend_thaw_sb, NULL);
thaw_processes();
+ }
return error;
}
@@ -188,6 +200,7 @@ void thaw_processes(void)
pm_nosig_freezing = false;
oom_killer_enable();
+ iterate_supers_excl(fs_suspend_thaw_sb, NULL);
pr_info("Restarting tasks ... ");
--
2.35.1
On Fri, Jan 13, 2023 at 04:33:50PM -0800, Luis Chamberlain wrote:
> +#ifdef CONFIG_PM_SLEEP
> +static bool super_should_freeze(struct super_block *sb)
> +{
> + if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
> + return false;
This is used.
> + /*
> + * We don't freeze virtual filesystems, we skip those filesystems with
> + * no backing device.
> + */
> + if (sb->s_bdi == &noop_backing_dev_info)
> + return false;
I however had dropped this and forgot to update my branch.
> +
> + return true;
> +}
So the call to super_should_freeze() is removed and the check for the
flag is open coded.
Luis
The kernel power management now supports allowing the VFS
to handle filesystem freezing freezes and thawing. Take advantage
of that and remove the kthread freezing. This is needed so that we
properly really stop IO in flight without races after userspace
has been frozen. Without this we rely on kthread freezing and
its semantics are loose and error prone.
The filesystem therefore is in charge of properly dealing with
quiescing of the filesystem through its callbacks if it thinks
it knows better than how the VFS handles it.
The following Coccinelle rule was used as to remove the now superflous
freezer calls:
spatch --sp-file fs-freeze-cleanup.cocci --in-place --timeout 120 --dir fs/nfs --jobs 12 --use-gitgrep
@ remove_set_freezable @
expression time;
statement S, S2;
expression task, current;
@@
(
- set_freezable();
|
- if (try_to_freeze())
- continue;
|
- try_to_freeze();
|
- freezable_schedule();
+ schedule();
|
- freezable_schedule_timeout(time);
+ schedule_timeout(time);
|
- if (freezing(task)) { S }
|
- if (freezing(task)) { S }
- else
{ S2 }
|
- freezing(current)
)
@ remove_wq_freezable @
expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4;
identifier fs_wq_fn;
@@
(
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE,
+ WQ_ARG2,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4,
+ WQ_ARG2 | WQ_ARG3 | WQ_ARG4,
...);
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE
+ WQ_ARG1
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3
+ WQ_ARG1 | WQ_ARG3
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3
+ WQ_ARG2 | WQ_ARG3
)
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2
+ WQ_ARG2
)
|
fs_wq_fn(
- WQ_FREEZABLE
+ 0
)
)
@ add_auto_flag @
expression E1;
identifier fs_type;
@@
struct file_system_type fs_type = {
.fs_flags = E1
+ | FS_AUTOFREEZE
,
};
Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain <[email protected]>
---
fs/ksmbd/connection.c | 3 ---
fs/ksmbd/transport_tcp.c | 2 --
2 files changed, 5 deletions(-)
diff --git a/fs/ksmbd/connection.c b/fs/ksmbd/connection.c
index fd0a288af299..4ed17de1423e 100644
--- a/fs/ksmbd/connection.c
+++ b/fs/ksmbd/connection.c
@@ -292,9 +292,6 @@ int ksmbd_conn_handler_loop(void *p)
conn->last_active = jiffies;
while (ksmbd_conn_alive(conn)) {
- if (try_to_freeze())
- continue;
-
kvfree(conn->request_buf);
conn->request_buf = NULL;
diff --git a/fs/ksmbd/transport_tcp.c b/fs/ksmbd/transport_tcp.c
index 4c6bd0b69979..dadb4f306428 100644
--- a/fs/ksmbd/transport_tcp.c
+++ b/fs/ksmbd/transport_tcp.c
@@ -305,8 +305,6 @@ static int ksmbd_tcp_readv(struct tcp_transport *t, struct kvec *iov_orig,
ksmbd_msg.msg_controllen = 0;
for (total_read = 0; to_read; total_read += length, to_read -= length) {
- try_to_freeze();
-
if (!ksmbd_conn_alive(conn)) {
total_read = -ESHUTDOWN;
break;
--
2.35.1
The kernel power management now supports allowing the VFS
to handle filesystem freezing freezes and thawing. Take advantage
of that and remove the kthread freezing. This is needed so that we
properly really stop IO in flight without races after userspace
has been frozen. Without this we rely on kthread freezing and
its semantics are loose and error prone.
The filesystem therefore is in charge of properly dealing with
quiescing of the filesystem through its callbacks if it thinks
it knows better than how the VFS handles it.
The following Coccinelle rule was used as to remove the now superflous
freezer calls:
spatch --sp-file fs-freeze-cleanup.cocci --in-place --timeout 120 --dir fs/nfs --jobs 12 --use-gitgrep
@ remove_set_freezable @
expression time;
statement S, S2;
expression task, current;
@@
(
- set_freezable();
|
- if (try_to_freeze())
- continue;
|
- try_to_freeze();
|
- freezable_schedule();
+ schedule();
|
- freezable_schedule_timeout(time);
+ schedule_timeout(time);
|
- if (freezing(task)) { S }
|
- if (freezing(task)) { S }
- else
{ S2 }
|
- freezing(current)
)
@ remove_wq_freezable @
expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4;
identifier fs_wq_fn;
@@
(
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE,
+ WQ_ARG2,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4,
+ WQ_ARG2 | WQ_ARG3 | WQ_ARG4,
...);
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE
+ WQ_ARG1
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3
+ WQ_ARG1 | WQ_ARG3
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3
+ WQ_ARG2 | WQ_ARG3
)
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2
+ WQ_ARG2
)
|
fs_wq_fn(
- WQ_FREEZABLE
+ 0
)
)
@ add_auto_flag @
expression E1;
identifier fs_type;
@@
struct file_system_type fs_type = {
.fs_flags = E1
+ | FS_AUTOFREEZE
,
};
Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain <[email protected]>
---
fs/nfs/callback.c | 4 ----
fs/nfs/fs_context.c | 4 ++--
2 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 456af7d230cf..f5ba4d6bf2a7 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -77,8 +77,6 @@ nfs4_callback_svc(void *vrqstp)
int err;
struct svc_rqst *rqstp = vrqstp;
- set_freezable();
-
while (!kthread_freezable_should_stop(NULL)) {
if (signal_pending(current))
@@ -109,8 +107,6 @@ nfs41_callback_svc(void *vrqstp)
int error;
DEFINE_WAIT(wq);
- set_freezable();
-
while (!kthread_freezable_should_stop(NULL)) {
if (signal_pending(current))
diff --git a/fs/nfs/fs_context.c b/fs/nfs/fs_context.c
index 9bcd53d5c7d4..04753962db9a 100644
--- a/fs/nfs/fs_context.c
+++ b/fs/nfs/fs_context.c
@@ -1583,7 +1583,7 @@ struct file_system_type nfs_fs_type = {
.init_fs_context = nfs_init_fs_context,
.parameters = nfs_fs_parameters,
.kill_sb = nfs_kill_super,
- .fs_flags = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA | FS_AUTOFREEZE,
};
MODULE_ALIAS_FS("nfs");
EXPORT_SYMBOL_GPL(nfs_fs_type);
@@ -1595,7 +1595,7 @@ struct file_system_type nfs4_fs_type = {
.init_fs_context = nfs_init_fs_context,
.parameters = nfs_fs_parameters,
.kill_sb = nfs_kill_super,
- .fs_flags = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA | FS_AUTOFREEZE,
};
MODULE_ALIAS_FS("nfs4");
MODULE_ALIAS("nfs4");
--
2.35.1
Right now freeze_super() and thaw_super() are called with
different locking contexts. To expand on this is messy, so
just unify the requirement to require grabbing an active
reference and keep the superblock locked.
Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
block/bdev.c | 5 ++++-
fs/f2fs/gc.c | 5 +++++
fs/gfs2/super.c | 9 +++++++--
fs/gfs2/sys.c | 6 ++++++
fs/gfs2/util.c | 5 +++++
fs/ioctl.c | 12 ++++++++++--
fs/super.c | 51 ++++++++++++++-----------------------------------
7 files changed, 51 insertions(+), 42 deletions(-)
diff --git a/block/bdev.c b/block/bdev.c
index edc110d90df4..8fd3a7991c02 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -251,7 +251,7 @@ int freeze_bdev(struct block_device *bdev)
error = sb->s_op->freeze_super(sb);
else
error = freeze_super(sb);
- deactivate_super(sb);
+ deactivate_locked_super(sb);
if (error) {
bdev->bd_fsfreeze_count--;
@@ -289,6 +289,8 @@ int thaw_bdev(struct block_device *bdev)
sb = bdev->bd_fsfreeze_sb;
if (!sb)
goto out;
+ if (!get_active_super(bdev))
+ goto out;
if (sb->s_op->thaw_super)
error = sb->s_op->thaw_super(sb);
@@ -298,6 +300,7 @@ int thaw_bdev(struct block_device *bdev)
bdev->bd_fsfreeze_count++;
else
bdev->bd_fsfreeze_sb = NULL;
+ deactivate_locked_super(sb);
out:
mutex_unlock(&bdev->bd_fsfreeze_mutex);
return error;
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 7444c392eab1..4c681fe487ee 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -2139,7 +2139,10 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
if (err)
return err;
+ if (!get_active_super(sbi->sb->s_bdev))
+ return -ENOTTY;
freeze_super(sbi->sb);
+
f2fs_down_write(&sbi->gc_lock);
f2fs_down_write(&sbi->cp_global_sem);
@@ -2190,6 +2193,8 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
out_err:
f2fs_up_write(&sbi->cp_global_sem);
f2fs_up_write(&sbi->gc_lock);
+ /* We use the same active reference from freeze */
thaw_super(sbi->sb);
+ deactivate_locked_super(sbi->sb);
return err;
}
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 999cc146d708..48df7b276b64 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -661,7 +661,12 @@ void gfs2_freeze_func(struct work_struct *work)
struct gfs2_sbd *sdp = container_of(work, struct gfs2_sbd, sd_freeze_work);
struct super_block *sb = sdp->sd_vfs;
- atomic_inc(&sb->s_active);
+ if (!get_active_super(sb->s_bdev)) {
+ fs_info(sdp, "GFS2: couldn't grap super for thaw for filesystem\n");
+ gfs2_assert_withdraw(sdp, 0);
+ return;
+ }
+
error = gfs2_freeze_lock(sdp, &freeze_gh, 0);
if (error) {
gfs2_assert_withdraw(sdp, 0);
@@ -675,7 +680,7 @@ void gfs2_freeze_func(struct work_struct *work)
}
gfs2_freeze_unlock(&freeze_gh);
}
- deactivate_super(sb);
+ deactivate_locked_super(sb);
clear_bit_unlock(SDF_FS_FROZEN, &sdp->sd_flags);
wake_up_bit(&sdp->sd_flags, SDF_FS_FROZEN);
return;
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index d87ea98cf535..d0b80552a678 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -162,6 +162,9 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
+ if (!get_active_super(sb->s_bdev))
+ return -ENOTTY;
+
switch (n) {
case 0:
error = thaw_super(sdp->sd_vfs);
@@ -170,9 +173,12 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
error = freeze_super(sdp->sd_vfs);
break;
default:
+ deactivate_locked_super(sb);
return -EINVAL;
}
+ deactivate_locked_super(sb);
+
if (error) {
fs_warn(sdp, "freeze %d error %d\n", n, error);
return error;
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 7a6aeffcdf5c..3a0cd5e9ad84 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -345,10 +345,15 @@ int gfs2_withdraw(struct gfs2_sbd *sdp)
set_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags);
if (sdp->sd_args.ar_errors == GFS2_ERRORS_WITHDRAW) {
+ if (!get_active_super(sb->s_bdev)) {
+ fs_err(sdp, "could not grab super on withdraw for file system\n");
+ return -1;
+ }
fs_err(sdp, "about to withdraw this file system\n");
BUG_ON(sdp->sd_args.ar_debug);
signal_our_withdraw(sdp);
+ deactivate_locked_super(sb);
kobject_uevent(&sdp->sd_kobj, KOBJ_OFFLINE);
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 80ac36aea913..3d2536e1ea58 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -386,6 +386,7 @@ static int ioctl_fioasync(unsigned int fd, struct file *filp,
static int ioctl_fsfreeze(struct file *filp)
{
struct super_block *sb = file_inode(filp)->i_sb;
+ int ret;
if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
return -EPERM;
@@ -394,10 +395,17 @@ static int ioctl_fsfreeze(struct file *filp)
if (sb->s_op->freeze_fs == NULL && sb->s_op->freeze_super == NULL)
return -EOPNOTSUPP;
+ if (!get_active_super(sb->s_bdev))
+ return -ENOTTY;
+
/* Freeze */
if (sb->s_op->freeze_super)
- return sb->s_op->freeze_super(sb);
- return freeze_super(sb);
+ ret = sb->s_op->freeze_super(sb);
+ ret = freeze_super(sb);
+
+ deactivate_locked_super(sb);
+
+ return ret;
}
static int ioctl_fsthaw(struct file *filp)
diff --git a/fs/super.c b/fs/super.c
index 12c08cb20405..a31a41b313f3 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -39,8 +39,6 @@
#include <uapi/linux/mount.h>
#include "internal.h"
-static int thaw_super_locked(struct super_block *sb);
-
static LIST_HEAD(super_blocks);
static DEFINE_SPINLOCK(sb_lock);
@@ -830,7 +828,6 @@ struct super_block *get_active_super(struct block_device *bdev)
if (sb->s_bdev == bdev) {
if (!grab_super(sb))
goto restart;
- up_write(&sb->s_umount);
return sb;
}
}
@@ -1003,13 +1000,13 @@ void emergency_remount(void)
static void do_thaw_all_callback(struct super_block *sb)
{
- down_write(&sb->s_umount);
+ if (!get_active_super(sb->s_bdev))
+ return;
if (sb->s_root && sb->s_flags & SB_BORN) {
emergency_thaw_bdev(sb);
- thaw_super_locked(sb);
- } else {
- up_write(&sb->s_umount);
+ thaw_super(sb);
}
+ deactivate_locked_super(sb);
}
static void do_thaw_all(struct work_struct *work)
@@ -1651,22 +1648,15 @@ int freeze_super(struct super_block *sb)
{
int ret;
- atomic_inc(&sb->s_active);
- down_write(&sb->s_umount);
- if (sb->s_writers.frozen != SB_UNFROZEN) {
- deactivate_locked_super(sb);
+ if (sb->s_writers.frozen != SB_UNFROZEN)
return -EBUSY;
- }
- if (!(sb->s_flags & SB_BORN)) {
- up_write(&sb->s_umount);
+ if (!(sb->s_flags & SB_BORN))
return 0; /* sic - it's "nothing to do" */
- }
if (sb_rdonly(sb)) {
/* Nothing to do really... */
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
- up_write(&sb->s_umount);
return 0;
}
@@ -1686,7 +1676,6 @@ int freeze_super(struct super_block *sb)
sb->s_writers.frozen = SB_UNFROZEN;
sb_freeze_unlock(sb, SB_FREEZE_PAGEFAULT);
wake_up(&sb->s_writers.wait_unfrozen);
- deactivate_locked_super(sb);
return ret;
}
@@ -1702,7 +1691,6 @@ int freeze_super(struct super_block *sb)
sb->s_writers.frozen = SB_UNFROZEN;
sb_freeze_unlock(sb, SB_FREEZE_FS);
wake_up(&sb->s_writers.wait_unfrozen);
- deactivate_locked_super(sb);
return ret;
}
}
@@ -1712,19 +1700,22 @@ int freeze_super(struct super_block *sb)
*/
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
lockdep_sb_freeze_release(sb);
- up_write(&sb->s_umount);
return 0;
}
EXPORT_SYMBOL(freeze_super);
-static int thaw_super_locked(struct super_block *sb)
+/**
+ * thaw_super -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ */
+int thaw_super(struct super_block *sb)
{
int error;
- if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) {
- up_write(&sb->s_umount);
+ if (sb->s_writers.frozen != SB_FREEZE_COMPLETE)
return -EINVAL;
- }
if (sb_rdonly(sb)) {
sb->s_writers.frozen = SB_UNFROZEN;
@@ -1739,7 +1730,6 @@ static int thaw_super_locked(struct super_block *sb)
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
lockdep_sb_freeze_release(sb);
- up_write(&sb->s_umount);
return error;
}
}
@@ -1748,19 +1738,6 @@ static int thaw_super_locked(struct super_block *sb)
sb_freeze_unlock(sb, SB_FREEZE_FS);
out:
wake_up(&sb->s_writers.wait_unfrozen);
- deactivate_locked_super(sb);
return 0;
}
-
-/**
- * thaw_super -- unlock filesystem
- * @sb: the super to thaw
- *
- * Unlocks the filesystem and marks it writeable again after freeze_super().
- */
-int thaw_super(struct super_block *sb)
-{
- down_write(&sb->s_umount);
- return thaw_super_locked(sb);
-}
EXPORT_SYMBOL(thaw_super);
--
2.35.1
The kernel power management now supports allowing the VFS
to handle filesystem freezing freezes and thawing. Take advantage
of that and remove the kthread freezing. This is needed so that we
properly really stop IO in flight without races after userspace
has been frozen. Without this we rely on kthread freezing and
its semantics are loose and error prone.
The filesystem therefore is in charge of properly dealing with
quiescing of the filesystem through its callbacks if it thinks
it knows better than how the VFS handles it.
The following Coccinelle rule was used as to remove the now superflous
freezer calls:
spatch --sp-file fs-freeze-cleanup.cocci --in-place --timeout 120 --dir fs/nilfs2 --jobs 12 --use-gitgrep
@ remove_set_freezable @
expression time;
statement S, S2;
expression task, current;
@@
(
- set_freezable();
|
- if (try_to_freeze())
- continue;
|
- try_to_freeze();
|
- freezable_schedule();
+ schedule();
|
- freezable_schedule_timeout(time);
+ schedule_timeout(time);
|
- if (freezing(task)) { S }
|
- if (freezing(task)) { S }
- else
{ S2 }
|
- freezing(current)
)
@ remove_wq_freezable @
expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4;
identifier fs_wq_fn;
@@
(
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE,
+ WQ_ARG2,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4,
+ WQ_ARG2 | WQ_ARG3 | WQ_ARG4,
...);
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE
+ WQ_ARG1
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3
+ WQ_ARG1 | WQ_ARG3
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3
+ WQ_ARG2 | WQ_ARG3
)
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2
+ WQ_ARG2
)
|
fs_wq_fn(
- WQ_FREEZABLE
+ 0
)
)
@ add_auto_flag @
expression E1;
identifier fs_type;
@@
struct file_system_type fs_type = {
.fs_flags = E1
+ | FS_AUTOFREEZE
,
};
Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain <[email protected]>
---
fs/nilfs2/segment.c | 48 +++++++++++++++++++--------------------------
1 file changed, 20 insertions(+), 28 deletions(-)
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index f7a14ed12a66..1c48aa9c7f56 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2541,6 +2541,8 @@ static int nilfs_segctor_thread(void *arg)
struct nilfs_sc_info *sci = (struct nilfs_sc_info *)arg;
struct the_nilfs *nilfs = sci->sc_super->s_fs_info;
int timeout = 0;
+ DEFINE_WAIT(wait);
+ int should_sleep = 1;
sci->sc_timer_task = current;
@@ -2572,38 +2574,28 @@ static int nilfs_segctor_thread(void *arg)
timeout = 0;
}
+ prepare_to_wait(&sci->sc_wait_daemon, &wait,
+ TASK_INTERRUPTIBLE);
- if (freezing(current)) {
+ if (sci->sc_seq_request != sci->sc_seq_done)
+ should_sleep = 0;
+ else if (sci->sc_flush_request)
+ should_sleep = 0;
+ else if (sci->sc_state & NILFS_SEGCTOR_COMMIT)
+ should_sleep = time_before(jiffies,
+ sci->sc_timer.expires);
+
+ if (should_sleep) {
spin_unlock(&sci->sc_state_lock);
- try_to_freeze();
+ schedule();
spin_lock(&sci->sc_state_lock);
- } else {
- DEFINE_WAIT(wait);
- int should_sleep = 1;
-
- prepare_to_wait(&sci->sc_wait_daemon, &wait,
- TASK_INTERRUPTIBLE);
-
- if (sci->sc_seq_request != sci->sc_seq_done)
- should_sleep = 0;
- else if (sci->sc_flush_request)
- should_sleep = 0;
- else if (sci->sc_state & NILFS_SEGCTOR_COMMIT)
- should_sleep = time_before(jiffies,
- sci->sc_timer.expires);
-
- if (should_sleep) {
- spin_unlock(&sci->sc_state_lock);
- schedule();
- spin_lock(&sci->sc_state_lock);
- }
- finish_wait(&sci->sc_wait_daemon, &wait);
- timeout = ((sci->sc_state & NILFS_SEGCTOR_COMMIT) &&
- time_after_eq(jiffies, sci->sc_timer.expires));
-
- if (nilfs_sb_dirty(nilfs) && nilfs_sb_need_update(nilfs))
- set_nilfs_discontinued(nilfs);
}
+ finish_wait(&sci->sc_wait_daemon, &wait);
+ timeout = ((sci->sc_state & NILFS_SEGCTOR_COMMIT) &&
+ time_after_eq(jiffies, sci->sc_timer.expires));
+
+ if (nilfs_sb_dirty(nilfs) && nilfs_sb_need_update(nilfs))
+ set_nilfs_discontinued(nilfs);
goto loop;
end_thread:
--
2.35.1
There are use cases where we wish to traverse the superblock list
but also capture errors, and in which case we want to avoid having
our callers issue a lock themselves since we can do the locking for
the callers. Provide a iterate_supers_excl() which calls a function
with the write lock held. If an error occurs we capture it and
propagate it.
Likewise there are use cases where we wish to traverse the superblock
list but in reverse order. The new iterate_supers_reverse_excl() helpers
does this but also also captures any errors encountered.
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
fs/super.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 2 +
2 files changed, 93 insertions(+)
diff --git a/fs/super.c b/fs/super.c
index 0d6b4de8da88..2f77fcb6e555 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -732,6 +732,97 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
spin_unlock(&sb_lock);
}
+/**
+ * iterate_supers_excl - exclusively call func for all active superblocks
+ * @f: function to call
+ * @arg: argument to pass to it
+ *
+ * Scans the superblock list and calls given function, passing it
+ * locked superblock and given argument. Returns 0 unless an error
+ * occurred on calling the function on any superblock.
+ */
+int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg)
+{
+ struct super_block *sb, *p = NULL;
+ int error = 0;
+
+ spin_lock(&sb_lock);
+ list_for_each_entry(sb, &super_blocks, s_list) {
+ if (hlist_unhashed(&sb->s_instances))
+ continue;
+ sb->s_count++;
+ spin_unlock(&sb_lock);
+
+ down_write(&sb->s_umount);
+ if (sb->s_root && (sb->s_flags & SB_BORN)) {
+ error = f(sb, arg);
+ if (error) {
+ up_write(&sb->s_umount);
+ spin_lock(&sb_lock);
+ __put_super(sb);
+ break;
+ }
+ }
+ up_write(&sb->s_umount);
+
+ spin_lock(&sb_lock);
+ if (p)
+ __put_super(p);
+ p = sb;
+ }
+ if (p)
+ __put_super(p);
+ spin_unlock(&sb_lock);
+
+ return error;
+}
+
+/**
+ * iterate_supers_reverse_excl - exclusively calls func in reverse order
+ * @f: function to call
+ * @arg: argument to pass to it
+ *
+ * Scans the superblock list and calls given function, passing it
+ * locked superblock and given argument, in reverse order, and holding
+ * the s_umount write lock. Returns if an error occurred.
+ */
+int iterate_supers_reverse_excl(int (*f)(struct super_block *, void *),
+ void *arg)
+{
+ struct super_block *sb, *p = NULL;
+ int error = 0;
+
+ spin_lock(&sb_lock);
+ list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+ if (hlist_unhashed(&sb->s_instances))
+ continue;
+ sb->s_count++;
+ spin_unlock(&sb_lock);
+
+ down_write(&sb->s_umount);
+ if (sb->s_root && (sb->s_flags & SB_BORN)) {
+ error = f(sb, arg);
+ if (error) {
+ up_write(&sb->s_umount);
+ spin_lock(&sb_lock);
+ __put_super(sb);
+ break;
+ }
+ }
+ up_write(&sb->s_umount);
+
+ spin_lock(&sb_lock);
+ if (p)
+ __put_super(p);
+ p = sb;
+ }
+ if (p)
+ __put_super(p);
+ spin_unlock(&sb_lock);
+
+ return error;
+}
+
/**
* iterate_supers_type - call function for superblocks of given type
* @type: fs type
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3b2586de4364..f168e72f6ca1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2916,6 +2916,8 @@ extern struct super_block *get_active_super(struct block_device *bdev);
extern void drop_super(struct super_block *sb);
extern void drop_super_exclusive(struct super_block *sb);
extern void iterate_supers(void (*)(struct super_block *, void *), void *);
+extern int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg);
+extern int iterate_supers_reverse_excl(int (*)(struct super_block *, void *), void *);
extern void iterate_supers_type(struct file_system_type *,
void (*)(struct super_block *, void *), void *);
--
2.35.1
The kernel power management now supports allowing the VFS
to handle filesystem freezing freezes and thawing. Take advantage
of that and remove the kthread freezing. This is needed so that we
properly really stop IO in flight without races after userspace
has been frozen. Without this we rely on kthread freezing and
its semantics are loose and error prone.
The filesystem therefore is in charge of properly dealing with
quiescing of the filesystem through its callbacks if it thinks
it knows better than how the VFS handles it.
The following Coccinelle rule was used as to remove the now superflous
freezer calls:
spatch --sp-file fs-freeze-cleanup.cocci --in-place --timeout 120 --dir fs/gfs2 --jobs 12 --use-gitgrep
@ remove_set_freezable @
expression time;
statement S, S2;
expression task, current;
@@
(
- set_freezable();
|
- if (try_to_freeze())
- continue;
|
- try_to_freeze();
|
- freezable_schedule();
+ schedule();
|
- freezable_schedule_timeout(time);
+ schedule_timeout(time);
|
- if (freezing(task)) { S }
|
- if (freezing(task)) { S }
- else
{ S2 }
|
- freezing(current)
)
@ remove_wq_freezable @
expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4;
identifier fs_wq_fn;
@@
(
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE,
+ WQ_ARG2,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4,
+ WQ_ARG2 | WQ_ARG3 | WQ_ARG4,
...);
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE
+ WQ_ARG1
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3
+ WQ_ARG1 | WQ_ARG3
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3
+ WQ_ARG2 | WQ_ARG3
)
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2
+ WQ_ARG2
)
|
fs_wq_fn(
- WQ_FREEZABLE
+ 0
)
)
@ add_auto_flag @
expression E1;
identifier fs_type;
@@
struct file_system_type fs_type = {
.fs_flags = E1
+ | FS_AUTOFREEZE
,
};
Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain <[email protected]>
---
fs/gfs2/glock.c | 6 +++---
fs/gfs2/log.c | 2 --
fs/gfs2/main.c | 4 ++--
fs/gfs2/ops_fstype.c | 4 ++--
fs/gfs2/quota.c | 2 --
5 files changed, 7 insertions(+), 11 deletions(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 524f3c96b9a4..7ad1a1229ae3 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -2459,14 +2459,14 @@ int __init gfs2_glock_init(void)
if (ret < 0)
return ret;
- glock_workqueue = alloc_workqueue("glock_workqueue", WQ_MEM_RECLAIM |
- WQ_HIGHPRI | WQ_FREEZABLE, 0);
+ glock_workqueue = alloc_workqueue("glock_workqueue",
+ WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
if (!glock_workqueue) {
rhashtable_destroy(&gl_hash_table);
return -ENOMEM;
}
gfs2_delete_workqueue = alloc_workqueue("delete_workqueue",
- WQ_MEM_RECLAIM | WQ_FREEZABLE,
+ WQ_MEM_RECLAIM,
0);
if (!gfs2_delete_workqueue) {
destroy_workqueue(glock_workqueue);
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 1fcc829f02ab..213fafc367f4 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -1330,8 +1330,6 @@ int gfs2_logd(void *data)
t = gfs2_tune_get(sdp, gt_logd_secs) * HZ;
- try_to_freeze();
-
do {
prepare_to_wait(&sdp->sd_logd_waitq, &wait,
TASK_INTERRUPTIBLE);
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index afcb32854f14..43d4748ad183 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -153,12 +153,12 @@ static int __init init_gfs2_fs(void)
error = -ENOMEM;
gfs_recovery_wq = alloc_workqueue("gfs_recovery",
- WQ_MEM_RECLAIM | WQ_FREEZABLE, 0);
+ WQ_MEM_RECLAIM, 0);
if (!gfs_recovery_wq)
goto fail_wq1;
gfs2_control_wq = alloc_workqueue("gfs2_control",
- WQ_UNBOUND | WQ_FREEZABLE, 0);
+ WQ_UNBOUND, 0);
if (!gfs2_control_wq)
goto fail_wq2;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index c0cf1d2d0ef5..8f5a63148eaf 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1740,7 +1740,7 @@ static void gfs2_kill_sb(struct super_block *sb)
struct file_system_type gfs2_fs_type = {
.name = "gfs2",
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_AUTOFREEZE,
.init_fs_context = gfs2_init_fs_context,
.parameters = gfs2_fs_parameters,
.kill_sb = gfs2_kill_sb,
@@ -1750,7 +1750,7 @@ MODULE_ALIAS_FS("gfs2");
struct file_system_type gfs2meta_fs_type = {
.name = "gfs2meta",
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_AUTOFREEZE,
.init_fs_context = gfs2_meta_init_fs_context,
.owner = THIS_MODULE,
};
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 1ed17226d9ed..710764af9d04 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1555,8 +1555,6 @@ int gfs2_quotad(void *data)
quotad_check_timeo(sdp, "sync", gfs2_quota_sync, t,
"ad_timeo, &tune->gt_quota_quantum);
- try_to_freeze();
-
bypass:
t = min(quotad_timeo, statfs_timeo);
--
2.35.1
On Fri 13-01-23 16:33:46, Luis Chamberlain wrote:
> Right now freeze_super() and thaw_super() are called with
> different locking contexts. To expand on this is messy, so
> just unify the requirement to require grabbing an active
> reference and keep the superblock locked.
>
> Suggested-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
The cleanup is nice but now freeze_super() does not increment s_active
anymore so nothing prevents the superblock from being torn down while it is
frozen. This is a behavioral change that needs documenting in the changelog
if nothing else but I think it may be actually problematic if the
filesystem's ->put_super method gets called on frozen filesystem. So I
think we may need to also block attempts to unmount frozen filesystem -
actually GFS2 needs this as well [1].
Honza
[1] lore.kernel.org/r/[email protected]
> ---
> block/bdev.c | 5 ++++-
> fs/f2fs/gc.c | 5 +++++
> fs/gfs2/super.c | 9 +++++++--
> fs/gfs2/sys.c | 6 ++++++
> fs/gfs2/util.c | 5 +++++
> fs/ioctl.c | 12 ++++++++++--
> fs/super.c | 51 ++++++++++++++-----------------------------------
> 7 files changed, 51 insertions(+), 42 deletions(-)
>
> diff --git a/block/bdev.c b/block/bdev.c
> index edc110d90df4..8fd3a7991c02 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -251,7 +251,7 @@ int freeze_bdev(struct block_device *bdev)
> error = sb->s_op->freeze_super(sb);
> else
> error = freeze_super(sb);
> - deactivate_super(sb);
> + deactivate_locked_super(sb);
>
> if (error) {
> bdev->bd_fsfreeze_count--;
> @@ -289,6 +289,8 @@ int thaw_bdev(struct block_device *bdev)
> sb = bdev->bd_fsfreeze_sb;
> if (!sb)
> goto out;
> + if (!get_active_super(bdev))
> + goto out;
>
> if (sb->s_op->thaw_super)
> error = sb->s_op->thaw_super(sb);
> @@ -298,6 +300,7 @@ int thaw_bdev(struct block_device *bdev)
> bdev->bd_fsfreeze_count++;
> else
> bdev->bd_fsfreeze_sb = NULL;
> + deactivate_locked_super(sb);
> out:
> mutex_unlock(&bdev->bd_fsfreeze_mutex);
> return error;
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 7444c392eab1..4c681fe487ee 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -2139,7 +2139,10 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> if (err)
> return err;
>
> + if (!get_active_super(sbi->sb->s_bdev))
> + return -ENOTTY;
> freeze_super(sbi->sb);
> +
> f2fs_down_write(&sbi->gc_lock);
> f2fs_down_write(&sbi->cp_global_sem);
>
> @@ -2190,6 +2193,8 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> out_err:
> f2fs_up_write(&sbi->cp_global_sem);
> f2fs_up_write(&sbi->gc_lock);
> + /* We use the same active reference from freeze */
> thaw_super(sbi->sb);
> + deactivate_locked_super(sbi->sb);
> return err;
> }
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 999cc146d708..48df7b276b64 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -661,7 +661,12 @@ void gfs2_freeze_func(struct work_struct *work)
> struct gfs2_sbd *sdp = container_of(work, struct gfs2_sbd, sd_freeze_work);
> struct super_block *sb = sdp->sd_vfs;
>
> - atomic_inc(&sb->s_active);
> + if (!get_active_super(sb->s_bdev)) {
> + fs_info(sdp, "GFS2: couldn't grap super for thaw for filesystem\n");
> + gfs2_assert_withdraw(sdp, 0);
> + return;
> + }
> +
> error = gfs2_freeze_lock(sdp, &freeze_gh, 0);
> if (error) {
> gfs2_assert_withdraw(sdp, 0);
> @@ -675,7 +680,7 @@ void gfs2_freeze_func(struct work_struct *work)
> }
> gfs2_freeze_unlock(&freeze_gh);
> }
> - deactivate_super(sb);
> + deactivate_locked_super(sb);
> clear_bit_unlock(SDF_FS_FROZEN, &sdp->sd_flags);
> wake_up_bit(&sdp->sd_flags, SDF_FS_FROZEN);
> return;
> diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
> index d87ea98cf535..d0b80552a678 100644
> --- a/fs/gfs2/sys.c
> +++ b/fs/gfs2/sys.c
> @@ -162,6 +162,9 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + if (!get_active_super(sb->s_bdev))
> + return -ENOTTY;
> +
> switch (n) {
> case 0:
> error = thaw_super(sdp->sd_vfs);
> @@ -170,9 +173,12 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
> error = freeze_super(sdp->sd_vfs);
> break;
> default:
> + deactivate_locked_super(sb);
> return -EINVAL;
> }
>
> + deactivate_locked_super(sb);
> +
> if (error) {
> fs_warn(sdp, "freeze %d error %d\n", n, error);
> return error;
> diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
> index 7a6aeffcdf5c..3a0cd5e9ad84 100644
> --- a/fs/gfs2/util.c
> +++ b/fs/gfs2/util.c
> @@ -345,10 +345,15 @@ int gfs2_withdraw(struct gfs2_sbd *sdp)
> set_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags);
>
> if (sdp->sd_args.ar_errors == GFS2_ERRORS_WITHDRAW) {
> + if (!get_active_super(sb->s_bdev)) {
> + fs_err(sdp, "could not grab super on withdraw for file system\n");
> + return -1;
> + }
> fs_err(sdp, "about to withdraw this file system\n");
> BUG_ON(sdp->sd_args.ar_debug);
>
> signal_our_withdraw(sdp);
> + deactivate_locked_super(sb);
>
> kobject_uevent(&sdp->sd_kobj, KOBJ_OFFLINE);
>
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 80ac36aea913..3d2536e1ea58 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -386,6 +386,7 @@ static int ioctl_fioasync(unsigned int fd, struct file *filp,
> static int ioctl_fsfreeze(struct file *filp)
> {
> struct super_block *sb = file_inode(filp)->i_sb;
> + int ret;
>
> if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
> return -EPERM;
> @@ -394,10 +395,17 @@ static int ioctl_fsfreeze(struct file *filp)
> if (sb->s_op->freeze_fs == NULL && sb->s_op->freeze_super == NULL)
> return -EOPNOTSUPP;
>
> + if (!get_active_super(sb->s_bdev))
> + return -ENOTTY;
> +
> /* Freeze */
> if (sb->s_op->freeze_super)
> - return sb->s_op->freeze_super(sb);
> - return freeze_super(sb);
> + ret = sb->s_op->freeze_super(sb);
> + ret = freeze_super(sb);
> +
> + deactivate_locked_super(sb);
> +
> + return ret;
> }
>
> static int ioctl_fsthaw(struct file *filp)
> diff --git a/fs/super.c b/fs/super.c
> index 12c08cb20405..a31a41b313f3 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -39,8 +39,6 @@
> #include <uapi/linux/mount.h>
> #include "internal.h"
>
> -static int thaw_super_locked(struct super_block *sb);
> -
> static LIST_HEAD(super_blocks);
> static DEFINE_SPINLOCK(sb_lock);
>
> @@ -830,7 +828,6 @@ struct super_block *get_active_super(struct block_device *bdev)
> if (sb->s_bdev == bdev) {
> if (!grab_super(sb))
> goto restart;
> - up_write(&sb->s_umount);
> return sb;
> }
> }
> @@ -1003,13 +1000,13 @@ void emergency_remount(void)
>
> static void do_thaw_all_callback(struct super_block *sb)
> {
> - down_write(&sb->s_umount);
> + if (!get_active_super(sb->s_bdev))
> + return;
> if (sb->s_root && sb->s_flags & SB_BORN) {
> emergency_thaw_bdev(sb);
> - thaw_super_locked(sb);
> - } else {
> - up_write(&sb->s_umount);
> + thaw_super(sb);
> }
> + deactivate_locked_super(sb);
> }
>
> static void do_thaw_all(struct work_struct *work)
> @@ -1651,22 +1648,15 @@ int freeze_super(struct super_block *sb)
> {
> int ret;
>
> - atomic_inc(&sb->s_active);
> - down_write(&sb->s_umount);
> - if (sb->s_writers.frozen != SB_UNFROZEN) {
> - deactivate_locked_super(sb);
> + if (sb->s_writers.frozen != SB_UNFROZEN)
> return -EBUSY;
> - }
>
> - if (!(sb->s_flags & SB_BORN)) {
> - up_write(&sb->s_umount);
> + if (!(sb->s_flags & SB_BORN))
> return 0; /* sic - it's "nothing to do" */
> - }
>
> if (sb_rdonly(sb)) {
> /* Nothing to do really... */
> sb->s_writers.frozen = SB_FREEZE_COMPLETE;
> - up_write(&sb->s_umount);
> return 0;
> }
>
> @@ -1686,7 +1676,6 @@ int freeze_super(struct super_block *sb)
> sb->s_writers.frozen = SB_UNFROZEN;
> sb_freeze_unlock(sb, SB_FREEZE_PAGEFAULT);
> wake_up(&sb->s_writers.wait_unfrozen);
> - deactivate_locked_super(sb);
> return ret;
> }
>
> @@ -1702,7 +1691,6 @@ int freeze_super(struct super_block *sb)
> sb->s_writers.frozen = SB_UNFROZEN;
> sb_freeze_unlock(sb, SB_FREEZE_FS);
> wake_up(&sb->s_writers.wait_unfrozen);
> - deactivate_locked_super(sb);
> return ret;
> }
> }
> @@ -1712,19 +1700,22 @@ int freeze_super(struct super_block *sb)
> */
> sb->s_writers.frozen = SB_FREEZE_COMPLETE;
> lockdep_sb_freeze_release(sb);
> - up_write(&sb->s_umount);
> return 0;
> }
> EXPORT_SYMBOL(freeze_super);
>
> -static int thaw_super_locked(struct super_block *sb)
> +/**
> + * thaw_super -- unlock filesystem
> + * @sb: the super to thaw
> + *
> + * Unlocks the filesystem and marks it writeable again after freeze_super().
> + */
> +int thaw_super(struct super_block *sb)
> {
> int error;
>
> - if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) {
> - up_write(&sb->s_umount);
> + if (sb->s_writers.frozen != SB_FREEZE_COMPLETE)
> return -EINVAL;
> - }
>
> if (sb_rdonly(sb)) {
> sb->s_writers.frozen = SB_UNFROZEN;
> @@ -1739,7 +1730,6 @@ static int thaw_super_locked(struct super_block *sb)
> printk(KERN_ERR
> "VFS:Filesystem thaw failed\n");
> lockdep_sb_freeze_release(sb);
> - up_write(&sb->s_umount);
> return error;
> }
> }
> @@ -1748,19 +1738,6 @@ static int thaw_super_locked(struct super_block *sb)
> sb_freeze_unlock(sb, SB_FREEZE_FS);
> out:
> wake_up(&sb->s_writers.wait_unfrozen);
> - deactivate_locked_super(sb);
> return 0;
> }
> -
> -/**
> - * thaw_super -- unlock filesystem
> - * @sb: the super to thaw
> - *
> - * Unlocks the filesystem and marks it writeable again after freeze_super().
> - */
> -int thaw_super(struct super_block *sb)
> -{
> - down_write(&sb->s_umount);
> - return thaw_super_locked(sb);
> -}
> EXPORT_SYMBOL(thaw_super);
> --
> 2.35.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Fri 13-01-23 16:33:50, Luis Chamberlain wrote:
> Add support to automatically handle freezing and thawing filesystems
> during the kernel's suspend/resume cycle.
>
> This is needed so that we properly really stop IO in flight without
> races after userspace has been frozen. Without this we rely on
> kthread freezing and its semantics are loose and error prone.
> For instance, even though a kthread may use try_to_freeze() and end
> up being frozen we have no way of being sure that everything that
> has been spawned asynchronously from it (such as timers) have also
> been stopped as well.
>
> A long term advantage of also adding filesystem freeze / thawing
> supporting during suspend / hibernation is that long term we may
> be able to eventually drop the kernel's thread freezing completely
> as it was originally added to stop disk IO in flight as we hibernate
> or suspend.
>
> This does not remove the superflous freezer calls on all filesystems.
> Each filesystem must remove all the kthread freezer stuff and peg
> the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE
> flag.
>
> Subsequent patches remove the kthread freezer usage from each
> filesystem, one at a time to make all this work bisectable.
> Once all filesystems remove the usage of the kthread freezer we
> can remove the FS_AUTOFREEZE flag.
>
> Signed-off-by: Luis Chamberlain <[email protected]>
Looks good to me. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> fs/super.c | 69 ++++++++++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 14 +++++++++
> kernel/power/process.c | 15 ++++++++-
> 3 files changed, 97 insertions(+), 1 deletion(-)
>
> diff --git a/fs/super.c b/fs/super.c
> index 2f77fcb6e555..e8af4c8269ad 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1853,3 +1853,72 @@ int thaw_super(struct super_block *sb, bool usercall)
> return 0;
> }
> EXPORT_SYMBOL(thaw_super);
> +
> +#ifdef CONFIG_PM_SLEEP
> +static bool super_should_freeze(struct super_block *sb)
> +{
> + if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
> + return false;
> + /*
> + * We don't freeze virtual filesystems, we skip those filesystems with
> + * no backing device.
> + */
> + if (sb->s_bdi == &noop_backing_dev_info)
> + return false;
> +
> + return true;
> +}
> +
> +int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
> +{
> + int error = 0;
> +
> + if (!grab_lock_super(sb)) {
> + pr_err("%s (%s): freezing failed to grab_super()\n",
> + sb->s_type->name, sb->s_id);
> + return -ENOTTY;
> + }
> +
> + if (!super_should_freeze(sb))
> + goto out;
> +
> + pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
> +
> + error = freeze_super(sb, false);
> + if (!error)
> + lockdep_sb_freeze_release(sb);
> + else if (error != -EBUSY)
> + pr_notice("%s (%s): Unable to freeze, error=%d",
> + sb->s_type->name, sb->s_id, error);
> +
> +out:
> + deactivate_locked_super(sb);
> + return error;
> +}
> +
> +int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
> +{
> + int error = 0;
> +
> + if (!grab_lock_super(sb)) {
> + pr_err("%s (%s): thawing failed to grab_super()\n",
> + sb->s_type->name, sb->s_id);
> + return -ENOTTY;
> + }
> +
> + if (!super_should_freeze(sb))
> + goto out;
> +
> + pr_info("%s (%s): thawing\n", sb->s_type->name, sb->s_id);
> +
> + error = thaw_super(sb, false);
> + if (error && error != -EBUSY)
> + pr_notice("%s (%s): Unable to unfreeze, error=%d",
> + sb->s_type->name, sb->s_id, error);
> +
> +out:
> + deactivate_locked_super(sb);
> + return error;
> +}
> +
> +#endif
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index f168e72f6ca1..e5bee359e804 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2231,6 +2231,7 @@ struct file_system_type {
> #define FS_DISALLOW_NOTIFY_PERM 16 /* Disable fanotify permission events */
> #define FS_ALLOW_IDMAP 32 /* FS has been updated to handle vfs idmappings. */
> #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
> +#define FS_AUTOFREEZE (1<<16) /* temporary as we phase kthread freezer out */
> int (*init_fs_context)(struct fs_context *);
> const struct fs_parameter_spec *parameters;
> struct dentry *(*mount) (struct file_system_type *, int,
> @@ -2306,6 +2307,19 @@ extern int user_statfs(const char __user *, struct kstatfs *);
> extern int fd_statfs(int, struct kstatfs *);
> extern int freeze_super(struct super_block *super, bool usercall);
> extern int thaw_super(struct super_block *super, bool usercall);
> +#ifdef CONFIG_PM_SLEEP
> +int fs_suspend_freeze_sb(struct super_block *sb, void *priv);
> +int fs_suspend_thaw_sb(struct super_block *sb, void *priv);
> +#else
> +static inline int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
> +{
> + return 0;
> +}
> +static inline int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
> +{
> + return 0;
> +}
> +#endif
> extern __printf(2, 3)
> int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
> extern int super_setup_bdi(struct super_block *sb);
> diff --git a/kernel/power/process.c b/kernel/power/process.c
> index 6c1c7e566d35..1dd6b0b6b4e5 100644
> --- a/kernel/power/process.c
> +++ b/kernel/power/process.c
> @@ -140,6 +140,16 @@ int freeze_processes(void)
>
> BUG_ON(in_atomic());
>
> + pr_info("Freezing filesystems ... ");
> + error = iterate_supers_reverse_excl(fs_suspend_freeze_sb, NULL);
> + if (error) {
> + pr_cont("failed\n");
> + iterate_supers_excl(fs_suspend_thaw_sb, NULL);
> + thaw_processes();
> + return error;
> + }
> + pr_cont("done.\n");
> +
> /*
> * Now that the whole userspace is frozen we need to disable
> * the OOM killer to disallow any further interference with
> @@ -149,8 +159,10 @@ int freeze_processes(void)
> if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs)))
> error = -EBUSY;
>
> - if (error)
> + if (error) {
> + iterate_supers_excl(fs_suspend_thaw_sb, NULL);
> thaw_processes();
> + }
> return error;
> }
>
> @@ -188,6 +200,7 @@ void thaw_processes(void)
> pm_nosig_freezing = false;
>
> oom_killer_enable();
> + iterate_supers_excl(fs_suspend_thaw_sb, NULL);
>
> pr_info("Restarting tasks ... ");
>
> --
> 2.35.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Fri, Jan 13, 2023 at 04:33:50PM -0800, Luis Chamberlain wrote:
> Add support to automatically handle freezing and thawing filesystems
> during the kernel's suspend/resume cycle.
>
> This is needed so that we properly really stop IO in flight without
> races after userspace has been frozen. Without this we rely on
> kthread freezing and its semantics are loose and error prone.
> For instance, even though a kthread may use try_to_freeze() and end
> up being frozen we have no way of being sure that everything that
> has been spawned asynchronously from it (such as timers) have also
> been stopped as well.
>
> A long term advantage of also adding filesystem freeze / thawing
> supporting during suspend / hibernation is that long term we may
> be able to eventually drop the kernel's thread freezing completely
> as it was originally added to stop disk IO in flight as we hibernate
> or suspend.
Hooray!
One evil question though --
Say you have dm devices A and B. Each has a distinct fs on it.
If you mount A and then B and initiate a suspend, that should result in
first B and then A freezing, right?
After resuming, you then change A's dm-table definition to point it
at a loop device backed by a file on B.
What happens now when you initiate a suspend? B freezes, then A tries
to flush data to the loop-mounted file on B, but it's too late for that.
That sounds like a deadlock?
Though I don't know how much we care about this corner case, since (a)
freezing has been busted on xfs for years and (b) one can think up all
sorts of horrid ouroborous scenarios like:
Change A's dm-table to point to a loop-mounted file on B, and changing B
to point to a loop-mounted file on A. Then try to write to either
filesystem and see what kind of storm you get.
Anyway, just wondering if you'd thought about that kind of doomsday
scenario that a nutty sysadmin could set up.
The only way I can think of to solve that kind of thing would be to hook
filesystems and loop devices into the device model, make fs "device"
suspend actually freeze, hope the suspend code suspends from the leaves
inward, and hope I actually understand how the device model works (I
don't.)
--D
> This does not remove the superflous freezer calls on all filesystems.
> Each filesystem must remove all the kthread freezer stuff and peg
> the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE
> flag.
>
> Subsequent patches remove the kthread freezer usage from each
> filesystem, one at a time to make all this work bisectable.
> Once all filesystems remove the usage of the kthread freezer we
> can remove the FS_AUTOFREEZE flag.
>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> fs/super.c | 69 ++++++++++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 14 +++++++++
> kernel/power/process.c | 15 ++++++++-
> 3 files changed, 97 insertions(+), 1 deletion(-)
>
> diff --git a/fs/super.c b/fs/super.c
> index 2f77fcb6e555..e8af4c8269ad 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1853,3 +1853,72 @@ int thaw_super(struct super_block *sb, bool usercall)
> return 0;
> }
> EXPORT_SYMBOL(thaw_super);
> +
> +#ifdef CONFIG_PM_SLEEP
> +static bool super_should_freeze(struct super_block *sb)
> +{
> + if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
> + return false;
> + /*
> + * We don't freeze virtual filesystems, we skip those filesystems with
> + * no backing device.
> + */
> + if (sb->s_bdi == &noop_backing_dev_info)
> + return false;
> +
> + return true;
> +}
> +
> +int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
> +{
> + int error = 0;
> +
> + if (!grab_lock_super(sb)) {
> + pr_err("%s (%s): freezing failed to grab_super()\n",
> + sb->s_type->name, sb->s_id);
> + return -ENOTTY;
> + }
> +
> + if (!super_should_freeze(sb))
> + goto out;
> +
> + pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
> +
> + error = freeze_super(sb, false);
> + if (!error)
> + lockdep_sb_freeze_release(sb);
> + else if (error != -EBUSY)
> + pr_notice("%s (%s): Unable to freeze, error=%d",
> + sb->s_type->name, sb->s_id, error);
> +
> +out:
> + deactivate_locked_super(sb);
> + return error;
> +}
> +
> +int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
> +{
> + int error = 0;
> +
> + if (!grab_lock_super(sb)) {
> + pr_err("%s (%s): thawing failed to grab_super()\n",
> + sb->s_type->name, sb->s_id);
> + return -ENOTTY;
> + }
> +
> + if (!super_should_freeze(sb))
> + goto out;
> +
> + pr_info("%s (%s): thawing\n", sb->s_type->name, sb->s_id);
> +
> + error = thaw_super(sb, false);
> + if (error && error != -EBUSY)
> + pr_notice("%s (%s): Unable to unfreeze, error=%d",
> + sb->s_type->name, sb->s_id, error);
> +
> +out:
> + deactivate_locked_super(sb);
> + return error;
> +}
> +
> +#endif
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index f168e72f6ca1..e5bee359e804 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2231,6 +2231,7 @@ struct file_system_type {
> #define FS_DISALLOW_NOTIFY_PERM 16 /* Disable fanotify permission events */
> #define FS_ALLOW_IDMAP 32 /* FS has been updated to handle vfs idmappings. */
> #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
> +#define FS_AUTOFREEZE (1<<16) /* temporary as we phase kthread freezer out */
> int (*init_fs_context)(struct fs_context *);
> const struct fs_parameter_spec *parameters;
> struct dentry *(*mount) (struct file_system_type *, int,
> @@ -2306,6 +2307,19 @@ extern int user_statfs(const char __user *, struct kstatfs *);
> extern int fd_statfs(int, struct kstatfs *);
> extern int freeze_super(struct super_block *super, bool usercall);
> extern int thaw_super(struct super_block *super, bool usercall);
> +#ifdef CONFIG_PM_SLEEP
> +int fs_suspend_freeze_sb(struct super_block *sb, void *priv);
> +int fs_suspend_thaw_sb(struct super_block *sb, void *priv);
> +#else
> +static inline int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
> +{
> + return 0;
> +}
> +static inline int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
> +{
> + return 0;
> +}
> +#endif
> extern __printf(2, 3)
> int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
> extern int super_setup_bdi(struct super_block *sb);
> diff --git a/kernel/power/process.c b/kernel/power/process.c
> index 6c1c7e566d35..1dd6b0b6b4e5 100644
> --- a/kernel/power/process.c
> +++ b/kernel/power/process.c
> @@ -140,6 +140,16 @@ int freeze_processes(void)
>
> BUG_ON(in_atomic());
>
> + pr_info("Freezing filesystems ... ");
> + error = iterate_supers_reverse_excl(fs_suspend_freeze_sb, NULL);
> + if (error) {
> + pr_cont("failed\n");
> + iterate_supers_excl(fs_suspend_thaw_sb, NULL);
> + thaw_processes();
> + return error;
> + }
> + pr_cont("done.\n");
> +
> /*
> * Now that the whole userspace is frozen we need to disable
> * the OOM killer to disallow any further interference with
> @@ -149,8 +159,10 @@ int freeze_processes(void)
> if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs)))
> error = -EBUSY;
>
> - if (error)
> + if (error) {
> + iterate_supers_excl(fs_suspend_thaw_sb, NULL);
> thaw_processes();
> + }
> return error;
> }
>
> @@ -188,6 +200,7 @@ void thaw_processes(void)
> pm_nosig_freezing = false;
>
> oom_killer_enable();
> + iterate_supers_excl(fs_suspend_thaw_sb, NULL);
>
> pr_info("Restarting tasks ... ");
>
> --
> 2.35.1
>
On Mon, Jan 16, 2023 at 04:14:55PM +0100, Jan Kara wrote:
> So I think we may need to also block attempts to unmount frozen filesystem -
> actually GFS2 needs this as well [1].
>
> [1] lore.kernel.org/r/[email protected]
Yes, I reviewed Andreas's patch and I think we end up just complicating
things by allowing us to continue to "support" unmounting frozen
filesystems. Instead it is easier for us to just block that insanity.
Current attempt / non-boot tested or anything.
From: Luis Chamberlain <[email protected]>
Date: Sat, 6 May 2023 20:13:49 -0700
Subject: [RFC] fs: prevent mount / umount of frozen filesystems
Today you can unmount a frozen filesystem. Doing that turns it into
a zombie filesystem, you cannot shut it down until first you remounting
it and then unthawing it.
Enabling this sort of behaviour is madness.
Simplify this by instead just preventing us to unmount frozen
filesystems, and likewise prevent mounting frozen filesystems.
Suggested-by: Jan Kara <[email protected]>
Reported-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
fs/namespace.c | 3 +++
fs/super.c | 14 ++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/fs/namespace.c b/fs/namespace.c
index 54847db5b819..9c21d8662fc8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1636,6 +1636,9 @@ static int do_umount(struct mount *mnt, int flags)
if (retval)
return retval;
+ if (!(sb_is_unfrozen(sb)))
+ return -EBUSY;
+
/*
* Allow userspace to request a mountpoint be expired rather than
* unmounting unconditionally. Unmount only happens if:
diff --git a/fs/super.c b/fs/super.c
index 34afe411cf2b..55f5728f5090 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -441,6 +441,7 @@ void retire_super(struct super_block *sb)
{
WARN_ON(!sb->s_bdev);
down_write(&sb->s_umount);
+ WARN_ON_ONCE(!(sb_is_unfrozen(sb)));
if (sb->s_iflags & SB_I_PERSB_BDI) {
bdi_unregister(sb->s_bdi);
sb->s_iflags &= ~SB_I_PERSB_BDI;
@@ -468,6 +469,7 @@ void generic_shutdown_super(struct super_block *sb)
{
const struct super_operations *sop = sb->s_op;
+ WARN_ON_ONCE(!(sb_is_unfrozen(sb)));
if (sb->s_root) {
shrink_dcache_for_umount(sb);
sync_filesystem(sb);
@@ -1354,6 +1356,12 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
if (IS_ERR(s))
goto error_s;
+ if (!(sb_is_unfrozen(sb))) {
+ deactivate_locked_super(s);
+ error = -EBUSY;
+ goto error_bdev;
+ }
+
if (s->s_root) {
if ((flags ^ s->s_flags) & SB_RDONLY) {
deactivate_locked_super(s);
@@ -1473,6 +1481,10 @@ struct dentry *mount_single(struct file_system_type *fs_type,
s = sget(fs_type, compare_single, set_anon_super, flags, NULL);
if (IS_ERR(s))
return ERR_CAST(s);
+ if (!(sb_is_unfrozen(sb))) {
+ deactivate_locked_super(s);
+ return ERR_PTR(-EBUSY);
+ }
if (!s->s_root) {
error = fill_super(s, data, flags & SB_SILENT ? 1 : 0);
if (!error)
@@ -1522,6 +1534,8 @@ int vfs_get_tree(struct fs_context *fc)
sb = fc->root->d_sb;
WARN_ON(!sb->s_bdi);
+ if (!(sb_is_unfrozen(sb)))
+ return -EBUSY;
/*
* Write barrier is for super_cache_count(). We place it before setting
--
2.39.2
On Thu, Feb 23, 2023 at 07:08:37PM -0800, Darrick J. Wong wrote:
> On Fri, Jan 13, 2023 at 04:33:50PM -0800, Luis Chamberlain wrote:
> > Add support to automatically handle freezing and thawing filesystems
> > during the kernel's suspend/resume cycle.
> >
> > This is needed so that we properly really stop IO in flight without
> > races after userspace has been frozen. Without this we rely on
> > kthread freezing and its semantics are loose and error prone.
> > For instance, even though a kthread may use try_to_freeze() and end
> > up being frozen we have no way of being sure that everything that
> > has been spawned asynchronously from it (such as timers) have also
> > been stopped as well.
> >
> > A long term advantage of also adding filesystem freeze / thawing
> > supporting during suspend / hibernation is that long term we may
> > be able to eventually drop the kernel's thread freezing completely
> > as it was originally added to stop disk IO in flight as we hibernate
> > or suspend.
>
> Hooray!
>
> One evil question though --
>
> Say you have dm devices A and B. Each has a distinct fs on it.
> If you mount A and then B and initiate a suspend, that should result in
> first B and then A freezing, right?
>
> After resuming, you then change A's dm-table definition to point it
> at a loop device backed by a file on B.
>
> What happens now when you initiate a suspend? B freezes, then A tries
> to flush data to the loop-mounted file on B, but it's too late for that.
> That sounds like a deadlock?
>
> Though I don't know how much we care about this corner case,
As you suggest this is not the only corner case that one could draw
upon. There was that evil ioctl added years ago to allow flipping an
installed system bootted from a USB or ISO over to the real freshly
installed root mount point. To make this bullet-proof we'll need to
eventually add a simple graph implementation to keep tags on ordering
requirements for the super blocks. I have some C code which tries to
implement a graph Linux style but since these are all corner cases at
this time, I think it's best we fix first suspend for most and later
address a proper graph solution.
> Anyway, just wondering if you'd thought about that kind of doomsday
> scenario that a nutty sysadmin could set up.
>
> The only way I can think of to solve that kind of thing would be to hook
> filesystems and loop devices into the device model, make fs "device"
> suspend actually freeze, hope the suspend code suspends from the leaves
> inward, and hope I actually understand how the device model works (I
> don't.)
There's probably really odd things one can do, and one thing I think
we can later do is simply annotate those cases and *not* allow auto-freeze
with time for those horrible situations.
A real long term solution I think will involve a graph.
Luis