The following series is for to address bugs in the emergency thawing code, as
well as mismatcheѕ with freezing at the block layer and the superblock that
break the freeze/thaw nesting order.
The first two patches fix the emergency thaw infinite loop reported by Jeff
Merkey and the deadlock on sb->s_umount that the infinite loop hid. These may
be stable kernel candidates.
The remainder of the patches address the bdev/sb mismatch and the fact that sb
level freezing does not nest correctly. For all the places that the bdev
interfaces are used, we need a superblock anyway so we may as well make
freeze/thaw work only at the sb level. As such, this series moves all the
nesting code to the sb from the bdev level and removes the
freeze_bdev/thaw_bdev interfaces completely. It also converts the emergency
thaw to work at the superblock level such that it will now thaw manually frozen
filesystems.
A *big* outstanding problem still exists - freezing takes an active reference
to the superblock, so unmounting an frozen filesystem has some nasty and
unexpected side effects. The existing code results in an unmountable block
device:
# mount /dev/vda /mnt/test
# xfs_freeze -f /mnt/test
# umount /mnt/test
# grep test /proc/mounts
# mkfs.xfs -f -l size=128m /dev/vda
mkfs.xfs: /dev/vda contains a mounted filesystem
Usage: mkfs.xfs
....
# mount /dev/vda /mnt/test
mount: /dev/vda already mounted or /mnt/test busy
#
At this point I can't get access to /dev/vda and needs a reboot to
get it and /mnt/test back.
This patch series results in the block device being mountable, but
remains frozen across unmount/mount:
# mount /dev/vda /mnt/test
# xfs_freeze -f /mnt/test
# umount /mnt/test
# grep test /proc/mounts
# mkfs.xfs -f -l size=128m /dev/vda
mkfs.xfs: /dev/vda contains a mounted filesystem
Usage: mkfs.xfs
....
# mount /dev/vda /mnt/test
# touch /mnt/test/foo &
[1] 2647
#
# xfs_freeze -u /mnt/test
[1]+ Done sudo touch /mnt/test/foo
# umount /mnt/test
# mkfs.xfs -f -l size=128m /dev/vda
meta-data=/dev/vda isize=256 agcount=4, agsize=262144 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=1048576, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=32768, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
#
This behaviour is only marginally better than the existing behaviour
(at least you can release the references). However, I don't really
like either option - we used to disallow umount on a frozen
filesystems to avoid this problem.
So What is really supposed to happen when we unmount a frozen
superblock? Should unmount return EBUSY? Should it be automatically
thawed so it doesn't affect block device behaviour after unmount?
Something else?
Cheers,
Dave.
From: Dave Chinner <[email protected]>
thaw_bdev() has re-enterency guards to allow freezes to nest
together. That is, it ensures that the filesystem is not thawed
until the last thaw command is issued. This is needed to prevent the
filesystem from being unfrozen while an existing freezer is still
operating on the filesystem in a frozen state (e.g. dm-snapshot).
Currently, freeze_super() and thaw_super() bypasses these guards,
and as a result manual freezing and unfreezing via the ioctl methods
do not nest correctly. hence mixing userspace directed freezes with
block device level freezes result in inconsistency due to premature
thawing of the filesystem.
Move the re-enterency guards to the superblock and into freeze_super
and thaw_super() so that userspace directed freezes nest correctly
again.
Signed-off-by: Dave Chinner <[email protected]>
---
fs/block_dev.c | 59 ++++----------------------------------
fs/gfs2/ops_fstype.c | 12 --------
fs/super.c | 77 ++++++++++++++++++++++++++++++-------------------
include/linux/fs.h | 8 ++---
4 files changed, 56 insertions(+), 100 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index a8c8224..84899b3 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -230,71 +230,22 @@ struct super_block *freeze_bdev(struct block_device *bdev)
struct super_block *sb;
int error = 0;
- mutex_lock(&bdev->bd_fsfreeze_mutex);
- if (++bdev->bd_fsfreeze_count > 1) {
- /*
- * We don't even need to grab a reference - the first call
- * to freeze_bdev grab an active reference and only the last
- * thaw_bdev drops it.
- */
- sb = get_super(bdev);
- drop_super(sb);
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return sb;
- }
-
sb = get_active_super(bdev);
if (!sb)
goto out;
error = freeze_super(sb);
if (error) {
deactivate_super(sb);
- bdev->bd_fsfreeze_count--;
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
return ERR_PTR(error);
}
deactivate_super(sb);
out:
sync_blockdev(bdev);
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
return sb; /* thaw_bdev releases s->s_umount */
}
EXPORT_SYMBOL(freeze_bdev);
/**
- * __thaw_bdev -- unlock filesystem
- * @bdev: blockdevice to unlock
- * @sb: associated superblock
- * @emergency: emergency thaw
- *
- * Unlocks the filesystem and marks it writeable again after freeze_bdev().
- */
-static int __thaw_bdev(struct block_device *bdev, struct super_block *sb, int emergency)
-{
- int error = -EINVAL;
-
- mutex_lock(&bdev->bd_fsfreeze_mutex);
- if (!bdev->bd_fsfreeze_count)
- goto out;
-
- if (!sb)
- goto out;
-
- error = 0;
- if (--bdev->bd_fsfreeze_count > 0)
- goto out;
-
- if (emergency)
- error = thaw_super_emergency(sb);
- else
- error = thaw_super(sb);
- if (error)
- bdev->bd_fsfreeze_count++;
-out:
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return error;
-}
-/**
* thaw_bdev -- unlock filesystem
* @bdev: blockdevice to unlock
* @sb: associated superblock
@@ -303,13 +254,17 @@ out:
*/
int thaw_bdev(struct block_device *bdev, struct super_block *sb)
{
- return __thaw_bdev(bdev, sb, 0);
+ if (!sb)
+ return -EINVAL;
+ return thaw_super(sb);
}
EXPORT_SYMBOL(thaw_bdev);
int thaw_bdev_emergency(struct block_device *bdev, struct super_block *sb)
{
- return __thaw_bdev(bdev, sb, 1);
+ if (!sb)
+ return -EINVAL;
+ return thaw_super_emergency(sb);
}
static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
@@ -434,8 +389,6 @@ static void init_once(void *foo)
INIT_LIST_HEAD(&bdev->bd_holder_list);
#endif
inode_init_once(&ei->vfs_inode);
- /* Initialize mutex for freeze. */
- mutex_init(&bdev->bd_fsfreeze_mutex);
}
static inline void __bd_forget(struct inode *inode)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 3593b3a..5acc907 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1304,19 +1304,7 @@ static int gfs2_get_sb(struct file_system_type *fs_type, int flags,
if (IS_ERR(bdev))
return PTR_ERR(bdev);
- /*
- * once the super is inserted into the list by sget, s_umount
- * will protect the lockfs code from trying to start a snapshot
- * while we are mounting
- */
- mutex_lock(&bdev->bd_fsfreeze_mutex);
- if (bdev->bd_fsfreeze_count > 0) {
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
- error = -EBUSY;
- goto error_bdev;
- }
s = sget(fs_type, test_gfs2_super, set_gfs2_super, bdev);
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
error = PTR_ERR(s);
if (IS_ERR(s))
goto error_bdev;
diff --git a/fs/super.c b/fs/super.c
index 76ed922..5f13431 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -95,6 +95,7 @@ static struct super_block *alloc_super(struct file_system_type *type)
s->s_maxbytes = MAX_NON_LFS;
s->s_op = &default_op;
s->s_time_gran = 1000000000;
+ mutex_init(&s->s_freeze_mutex);
}
out:
return s;
@@ -744,19 +745,7 @@ int get_sb_bdev(struct file_system_type *fs_type,
if (IS_ERR(bdev))
return PTR_ERR(bdev);
- /*
- * once the super is inserted into the list by sget, s_umount
- * will protect the lockfs code from trying to start a snapshot
- * while we are mounting
- */
- mutex_lock(&bdev->bd_fsfreeze_mutex);
- if (bdev->bd_fsfreeze_count > 0) {
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
- error = -EBUSY;
- goto error_bdev;
- }
s = sget(fs_type, test_bdev_super, set_bdev_super, bdev);
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
if (IS_ERR(s))
goto error_s;
@@ -941,25 +930,31 @@ EXPORT_SYMBOL_GPL(vfs_kern_mount);
* @sb: the super to lock
*
* Syncs the super to make sure the filesystem is consistent and calls the fs's
- * freeze_fs. Subsequent calls to this without first thawing the fs will return
- * -EBUSY.
+ * freeze_fs. The reference counter (s_freeze_count) guarantees that only the
+ * last unfreeze process can unfreeze the frozen filesystem actually when
+ * multiple freeze requests arrive simultaneously. It counts up in
+ * freeze_super() and count down in thaw_super(). When it becomes 0,
+ * thaw_super() will execute the unfreeze.
*/
int freeze_super(struct super_block *sb)
{
- int ret;
+ int ret = 0;
atomic_inc(&sb->s_active);
down_write(&sb->s_umount);
+ mutex_lock(&sb->s_freeze_mutex);
+ if (++sb->s_freeze_count > 1)
+ goto out_deactivate;
+
if (sb->s_frozen) {
- deactivate_locked_super(sb);
- return -EBUSY;
+ ret = -EBUSY;
+ goto out_deactivate;
}
if (sb->s_flags & MS_RDONLY) {
sb->s_frozen = SB_FREEZE_TRANS;
smp_wmb();
- up_write(&sb->s_umount);
- return 0;
+ goto out_active;
}
sb->s_frozen = SB_FREEZE_WRITE;
@@ -977,12 +972,18 @@ int freeze_super(struct super_block *sb)
printk(KERN_ERR
"VFS:Filesystem freeze failed\n");
sb->s_frozen = SB_UNFROZEN;
- deactivate_locked_super(sb);
- return ret;
+ goto out_deactivate;
}
}
+out_active:
up_write(&sb->s_umount);
- return 0;
+out_unlock:
+ mutex_unlock(&sb->s_freeze_mutex);
+ return ret;
+
+out_deactivate:
+ deactivate_locked_super(sb);
+ goto out_unlock;
}
EXPORT_SYMBOL(freeze_super);
@@ -993,22 +994,34 @@ EXPORT_SYMBOL(freeze_super);
*
* Unlocks the filesystem and marks it writeable again after freeze_super().
* If we are doing an emergency thaw, we don't need to grab the sb->s_umount
- * lock as it is already held.
+ * lock as it is already held. Return -EINVAL if @sb is not frozen, 0 if the
+ * unfreeze was executed and succeeded or the error if it failed. If the
+ * unfreeze fails, then leave @sb in the frozen state.
*/
static int __thaw_super(struct super_block *sb, int emergency)
{
- int error = 0;
+ int error = -EINVAL;
- if (!emergency)
+ if (!emergency) {
down_write(&sb->s_umount);
+ mutex_lock(&sb->s_freeze_mutex);
+ if (!sb->s_freeze_count)
+ goto out_unlock;
- if (sb->s_frozen == SB_UNFROZEN) {
- error = -EINVAL;
- goto out_unlock;
+ if (--sb->s_freeze_count > 0) {
+ error = 0;
+ goto out_unlock;
+ }
+ } else {
+ /* already under down_read(&sb->s_umount) */
+ mutex_lock(&sb->s_freeze_mutex);
}
+ if (sb->s_frozen == SB_UNFROZEN)
+ goto out_unlock;
+
if (sb->s_flags & MS_RDONLY)
- goto out;
+ goto out_unfreeze;
if (sb->s_op->unfreeze_fs) {
error = sb->s_op->unfreeze_fs(sb);
@@ -1020,10 +1033,11 @@ static int __thaw_super(struct super_block *sb, int emergency)
}
}
-out:
+out_unfreeze:
sb->s_frozen = SB_UNFROZEN;
smp_wmb();
wake_up(&sb->s_wait_unfrozen);
+ mutex_unlock(&sb->s_freeze_mutex);
/*
* When called from emergency scope, we cannot grab the s_umount lock
* so we cannot deactivate the superblock. This may leave unbalanced
@@ -1035,6 +1049,9 @@ out:
return 0;
out_unlock:
+ if (error && error != -EINVAL)
+ sb->s_freeze_count++;
+ mutex_unlock(&sb->s_freeze_mutex);
if (!emergency)
up_write(&sb->s_umount);
return error;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e246389..f92b077 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -672,11 +672,6 @@ struct block_device {
* care to not mess up bd_private for that case.
*/
unsigned long bd_private;
-
- /* The counter of freeze processes */
- int bd_fsfreeze_count;
- /* Mutex for freeze */
- struct mutex bd_fsfreeze_mutex;
};
/*
@@ -1382,6 +1377,9 @@ struct super_block {
* generic_show_options()
*/
char *s_options;
+
+ int s_freeze_count; /* nr of nested freezes */
+ struct mutex s_freeze_mutex; /* nesting lock */
};
extern struct timespec current_fs_time(struct super_block *sb);
--
1.7.1
From: Dave Chinner <[email protected]>
It makes no sense having the emergency thaw code in fs/buffer.c when all of
it's operations are one superblocks and the code it executes is all in
fs/super.c. Move the code there and clean it up.
Signed-off-by: Dave Chinner <[email protected]>
---
fs/buffer.c | 31 -------------------------------
fs/super.c | 37 +++++++++++++++++++++++++++++++++----
include/linux/fs.h | 1 -
3 files changed, 33 insertions(+), 36 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index b095fc1..61bd994 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -561,37 +561,6 @@ repeat:
return err;
}
-static void do_thaw_one(struct super_block *sb, void *unused)
-{
- char b[BDEVNAME_SIZE];
- while (sb->s_bdev && !thaw_super_emergency(sb))
- printk(KERN_WARNING "Emergency Thaw on %s\n",
- bdevname(sb->s_bdev, b));
-}
-
-static void do_thaw_all(struct work_struct *work)
-{
- iterate_supers(do_thaw_one, NULL);
- kfree(work);
- printk(KERN_WARNING "Emergency Thaw complete\n");
-}
-
-/**
- * emergency_thaw_all -- forcibly thaw every frozen filesystem
- *
- * Used for emergency unfreeze of all filesystems via SysRq
- */
-void emergency_thaw_all(void)
-{
- struct work_struct *work;
-
- work = kmalloc(sizeof(*work), GFP_ATOMIC);
- if (work) {
- INIT_WORK(work, do_thaw_all);
- schedule_work(work);
- }
-}
-
/**
* sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
* @mapping: the mapping which wants those buffers written
diff --git a/fs/super.c b/fs/super.c
index 81a4034..680b8d5 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1059,7 +1059,7 @@ out_unlock:
}
/**
- * thaw_super -- unlock filesystem
+ * __thaw_super -- unlock filesystem
* @sb: the super to thaw
*
* Unlocks the filesystem and marks it writeable again after freeze_super().
@@ -1075,11 +1075,40 @@ EXPORT_SYMBOL(thaw_super);
* @sb: the super to thaw
*
* Unlocks the filesystem and marks it writeable again after freeze_super().
- * This avoids taking the s_umount lock if it is already held.
+ * This avoids taking the s_umount lock because it is already held.
+ */
+static void thaw_super_emergency(struct super_block *sb, void *unused)
+{
+
+ if (sb->s_bdev) {
+ char b[BDEVNAME_SIZE];
+ printk(KERN_WARNING "Emergency Thaw on %s.\n",
+ bdevname(sb->s_bdev, b));
+ }
+ while (!__thaw_super(sb, 1));
+}
+
+static void do_thaw_all(struct work_struct *work)
+{
+ iterate_supers(thaw_super_emergency, NULL);
+ kfree(work);
+ printk(KERN_WARNING "Emergency Thaw complete\n");
+}
+
+/**
+ * emergency_thaw_all -- forcibly thaw every frozen filesystem
+ *
+ * Used for emergency unfreeze of all filesystems via SysRq
*/
-int thaw_super_emergency(struct super_block *sb)
+void emergency_thaw_all(void)
{
- return __thaw_super(sb, 1);
+ struct work_struct *work;
+
+ work = kmalloc(sizeof(*work), GFP_ATOMIC);
+ if (work) {
+ INIT_WORK(work, do_thaw_all);
+ schedule_work(work);
+ }
}
static struct vfsmount *fs_set_subtype(struct vfsmount *mnt, const char *fstype)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 39bf4ac..a704062 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1801,7 +1801,6 @@ extern int iterate_mounts(int (*)(struct vfsmount *, void *), void *,
extern int vfs_statfs(struct dentry *, struct kstatfs *);
extern int freeze_super(struct super_block *super);
extern int thaw_super(struct super_block *super);
-extern int thaw_super_emergency(struct super_block *super);
extern int current_umask(void);
--
1.7.1
From: Dave Chinner <[email protected]>
The emergency thaw process uses iterate_super() which holds the
sb->s_umount lock in read mode. The current thaw_super() code takes
the sb->s_umount lock in write mode, hence leading to an instant
deadlock.
Pass the emergency state into the thaw_bdev/thaw_super code to avoid
taking the s_umount lock in this case. We are running under the bdev
freeze mutex, so this is still serialised against freeze despite
only having a read lock on the sb->s_umount. Hence it should be safe
to execute in this manner, especially given that emergency thaw is a
rarely executed "get-out-of-jail" feature.
Signed-off-by: Dave Chinner <[email protected]>
---
fs/block_dev.c | 26 ++++++++++++++++++++--
fs/buffer.c | 2 +-
fs/super.c | 58 +++++++++++++++++++++++++++++++++++++++++++---------
include/linux/fs.h | 9 ++++++++
4 files changed, 81 insertions(+), 14 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 366ac38..a8c8224 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -262,13 +262,14 @@ struct super_block *freeze_bdev(struct block_device *bdev)
EXPORT_SYMBOL(freeze_bdev);
/**
- * thaw_bdev -- unlock filesystem
+ * __thaw_bdev -- unlock filesystem
* @bdev: blockdevice to unlock
* @sb: associated superblock
+ * @emergency: emergency thaw
*
* Unlocks the filesystem and marks it writeable again after freeze_bdev().
*/
-int thaw_bdev(struct block_device *bdev, struct super_block *sb)
+static int __thaw_bdev(struct block_device *bdev, struct super_block *sb, int emergency)
{
int error = -EINVAL;
@@ -283,15 +284,34 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
if (--bdev->bd_fsfreeze_count > 0)
goto out;
- error = thaw_super(sb);
+ if (emergency)
+ error = thaw_super_emergency(sb);
+ else
+ error = thaw_super(sb);
if (error)
bdev->bd_fsfreeze_count++;
out:
mutex_unlock(&bdev->bd_fsfreeze_mutex);
return error;
}
+/**
+ * thaw_bdev -- unlock filesystem
+ * @bdev: blockdevice to unlock
+ * @sb: associated superblock
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_bdev().
+ */
+int thaw_bdev(struct block_device *bdev, struct super_block *sb)
+{
+ return __thaw_bdev(bdev, sb, 0);
+}
EXPORT_SYMBOL(thaw_bdev);
+int thaw_bdev_emergency(struct block_device *bdev, struct super_block *sb)
+{
+ return __thaw_bdev(bdev, sb, 1);
+}
+
static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
{
return block_write_full_page(page, blkdev_get_block, wbc);
diff --git a/fs/buffer.c b/fs/buffer.c
index d54812b..f0c55d9 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -564,7 +564,7 @@ repeat:
static void do_thaw_one(struct super_block *sb, void *unused)
{
char b[BDEVNAME_SIZE];
- while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
+ while (sb->s_bdev && !thaw_bdev_emergency(sb->s_bdev, sb))
printk(KERN_WARNING "Emergency Thaw on %s\n",
bdevname(sb->s_bdev, b));
}
diff --git a/fs/super.c b/fs/super.c
index 5c35bc7..76ed922 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -987,19 +987,24 @@ int freeze_super(struct super_block *sb)
EXPORT_SYMBOL(freeze_super);
/**
- * thaw_super -- unlock filesystem
+ * __thaw_super -- unlock filesystem
* @sb: the super to thaw
+ * @emergency: emergency thaw
*
* Unlocks the filesystem and marks it writeable again after freeze_super().
+ * If we are doing an emergency thaw, we don't need to grab the sb->s_umount
+ * lock as it is already held.
*/
-int thaw_super(struct super_block *sb)
+static int __thaw_super(struct super_block *sb, int emergency)
{
- int error;
+ int error = 0;
+
+ if (!emergency)
+ down_write(&sb->s_umount);
- down_write(&sb->s_umount);
if (sb->s_frozen == SB_UNFROZEN) {
- up_write(&sb->s_umount);
- return -EINVAL;
+ error = -EINVAL;
+ goto out_unlock;
}
if (sb->s_flags & MS_RDONLY)
@@ -1011,8 +1016,7 @@ int thaw_super(struct super_block *sb)
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
sb->s_frozen = SB_FREEZE_TRANS;
- up_write(&sb->s_umount);
- return error;
+ goto out_unlock;
}
}
@@ -1020,12 +1024,46 @@ out:
sb->s_frozen = SB_UNFROZEN;
smp_wmb();
wake_up(&sb->s_wait_unfrozen);
- deactivate_locked_super(sb);
-
+ /*
+ * When called from emergency scope, we cannot grab the s_umount lock
+ * so we cannot deactivate the superblock. This may leave unbalanced
+ * superblock references which could prevent unmount, but given this is
+ * an emergency operation....
+ */
+ if (!emergency)
+ deactivate_locked_super(sb);
return 0;
+
+out_unlock:
+ if (!emergency)
+ up_write(&sb->s_umount);
+ return error;
+}
+
+/**
+ * thaw_super -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ */
+int thaw_super(struct super_block *sb)
+{
+ return __thaw_super(sb, 0);
}
EXPORT_SYMBOL(thaw_super);
+/**
+ * thaw_super_emergency -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * This avoids taking the s_umount lock if it is already held.
+ */
+int thaw_super_emergency(struct super_block *sb)
+{
+ return __thaw_super(sb, 1);
+}
+
static struct vfsmount *fs_set_subtype(struct vfsmount *mnt, const char *fstype)
{
int err;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 471e1ff..e246389 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1803,6 +1803,7 @@ extern int iterate_mounts(int (*)(struct vfsmount *, void *), void *,
extern int vfs_statfs(struct dentry *, struct kstatfs *);
extern int freeze_super(struct super_block *super);
extern int thaw_super(struct super_block *super);
+extern int thaw_super_emergency(struct super_block *super);
extern int current_umask(void);
@@ -1953,6 +1954,8 @@ extern int sync_blockdev(struct block_device *bdev);
extern struct super_block *freeze_bdev(struct block_device *);
extern void emergency_thaw_all(void);
extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
+extern int thaw_bdev_emergency(struct block_device *bdev,
+ struct super_block *sb);
extern int fsync_bdev(struct block_device *);
#else
static inline void bd_forget(struct inode *inode) {}
@@ -1968,6 +1971,12 @@ static inline int thaw_bdev(struct block_device *bdev, struct super_block *sb)
{
return 0;
}
+
+static inline int thaw_bdev_emergency(struct block_device *bdev,
+ struct super_block *sb)
+{
+ return 0;
+}
#endif
extern int sync_filesystem(struct super_block *);
extern const struct file_operations def_blk_fops;
--
1.7.1
From: Dave Chinner <[email protected]>
The thawing of a filesystem through sysrq-j loops infinitely as it
incorrectly detects a thawed filesytsem as frozen and tries to
unfreeze repeatedly. This is a regression caused by
4504230a71566785a05d3e6b53fa1ee071b864eb ("freeze_bdev: grab active
reference to frozen superblocks") in that it no longer returned
-EINVAL for superblocks that were not frozen.
Return -EINVAL when the filesystem is already unfrozen to avoid this
problem.
Signed-off-by: Dave Chinner <[email protected]>
---
fs/block_dev.c | 13 +++++--------
1 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7346c96..366ac38 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -276,22 +276,19 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
if (!bdev->bd_fsfreeze_count)
goto out;
- error = 0;
- if (--bdev->bd_fsfreeze_count > 0)
+ if (!sb)
goto out;
- if (!sb)
+ error = 0;
+ if (--bdev->bd_fsfreeze_count > 0)
goto out;
error = thaw_super(sb);
- if (error) {
+ if (error)
bdev->bd_fsfreeze_count++;
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return error;
- }
out:
mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return 0;
+ return error;
}
EXPORT_SYMBOL(thaw_bdev);
--
1.7.1
From: Dave Chinner <[email protected]>
freeze/thaw_bdev are now just trivial wrappers around
freeze/thaw_super(). Convert all users of the bdev interfaces to use
the superblock interfaces instead, and remove the bdev interfaces.
Signed-off-by: Dave Chinner <[email protected]>
---
drivers/md/dm.c | 13 +++++++----
fs/block_dev.c | 54 ----------------------------------------------------
fs/buffer.c | 2 +-
fs/super.c | 1 +
fs/xfs/xfs_fsops.c | 10 ++------
include/linux/fs.h | 19 ------------------
6 files changed, 13 insertions(+), 86 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index d21e128..70d5fd6 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2427,15 +2427,18 @@ static int lock_fs(struct mapped_device *md)
WARN_ON(md->frozen_sb);
- md->frozen_sb = freeze_bdev(md->bdev);
- if (IS_ERR(md->frozen_sb)) {
- r = PTR_ERR(md->frozen_sb);
+ md->frozen_sb = get_active_super(md->bdev);
+ if (!md->frozen_sb)
+ return -EIO;
+ r = freeze_super(md->frozen_sb);
+ if (r) {
+ deactivate_super(md->frozen_sb);
md->frozen_sb = NULL;
return r;
}
+ deactivate_super(md->frozen_sb);
set_bit(DMF_FROZEN, &md->flags);
-
return 0;
}
@@ -2444,7 +2447,7 @@ static void unlock_fs(struct mapped_device *md)
if (!test_bit(DMF_FROZEN, &md->flags))
return;
- thaw_bdev(md->bdev, md->frozen_sb);
+ thaw_super(md->frozen_sb);
md->frozen_sb = NULL;
clear_bit(DMF_FROZEN, &md->flags);
}
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 84899b3..3c3d1fe 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -213,60 +213,6 @@ int fsync_bdev(struct block_device *bdev)
}
EXPORT_SYMBOL(fsync_bdev);
-/**
- * freeze_bdev -- lock a filesystem and force it into a consistent state
- * @bdev: blockdevice to lock
- *
- * If a superblock is found on this device, we take the s_umount semaphore
- * on it to make sure nobody unmounts until the snapshot creation is done.
- * The reference counter (bd_fsfreeze_count) guarantees that only the last
- * unfreeze process can unfreeze the frozen filesystem actually when multiple
- * freeze requests arrive simultaneously. It counts up in freeze_bdev() and
- * count down in thaw_bdev(). When it becomes 0, thaw_bdev() will unfreeze
- * actually.
- */
-struct super_block *freeze_bdev(struct block_device *bdev)
-{
- struct super_block *sb;
- int error = 0;
-
- sb = get_active_super(bdev);
- if (!sb)
- goto out;
- error = freeze_super(sb);
- if (error) {
- deactivate_super(sb);
- return ERR_PTR(error);
- }
- deactivate_super(sb);
- out:
- sync_blockdev(bdev);
- return sb; /* thaw_bdev releases s->s_umount */
-}
-EXPORT_SYMBOL(freeze_bdev);
-
-/**
- * thaw_bdev -- unlock filesystem
- * @bdev: blockdevice to unlock
- * @sb: associated superblock
- *
- * Unlocks the filesystem and marks it writeable again after freeze_bdev().
- */
-int thaw_bdev(struct block_device *bdev, struct super_block *sb)
-{
- if (!sb)
- return -EINVAL;
- return thaw_super(sb);
-}
-EXPORT_SYMBOL(thaw_bdev);
-
-int thaw_bdev_emergency(struct block_device *bdev, struct super_block *sb)
-{
- if (!sb)
- return -EINVAL;
- return thaw_super_emergency(sb);
-}
-
static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
{
return block_write_full_page(page, blkdev_get_block, wbc);
diff --git a/fs/buffer.c b/fs/buffer.c
index f0c55d9..b095fc1 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -564,7 +564,7 @@ repeat:
static void do_thaw_one(struct super_block *sb, void *unused)
{
char b[BDEVNAME_SIZE];
- while (sb->s_bdev && !thaw_bdev_emergency(sb->s_bdev, sb))
+ while (sb->s_bdev && !thaw_super_emergency(sb))
printk(KERN_WARNING "Emergency Thaw on %s\n",
bdevname(sb->s_bdev, b));
}
diff --git a/fs/super.c b/fs/super.c
index 5f13431..81a4034 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -976,6 +976,7 @@ int freeze_super(struct super_block *sb)
}
}
out_active:
+ sync_blockdev(sb->s_bdev);
up_write(&sb->s_umount);
out_unlock:
mutex_unlock(&sb->s_freeze_mutex);
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 37a6f62..cc993a5 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -642,16 +642,12 @@ xfs_fs_goingdown(
__uint32_t inflags)
{
switch (inflags) {
- case XFS_FSOP_GOING_FLAGS_DEFAULT: {
- struct super_block *sb = freeze_bdev(mp->m_super->s_bdev);
-
- if (sb && !IS_ERR(sb)) {
+ case XFS_FSOP_GOING_FLAGS_DEFAULT:
+ if (!freeze_super(mp->m_super)) {
xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
- thaw_bdev(sb->s_bdev, sb);
+ thaw_super(mp->m_super);
}
-
break;
- }
case XFS_FSOP_GOING_FLAGS_LOGFLUSH:
xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
break;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f92b077..39bf4ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1951,30 +1951,11 @@ extern void invalidate_bdev(struct block_device *);
extern int sync_blockdev(struct block_device *bdev);
extern struct super_block *freeze_bdev(struct block_device *);
extern void emergency_thaw_all(void);
-extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
-extern int thaw_bdev_emergency(struct block_device *bdev,
- struct super_block *sb);
extern int fsync_bdev(struct block_device *);
#else
static inline void bd_forget(struct inode *inode) {}
static inline int sync_blockdev(struct block_device *bdev) { return 0; }
static inline void invalidate_bdev(struct block_device *bdev) {}
-
-static inline struct super_block *freeze_bdev(struct block_device *sb)
-{
- return NULL;
-}
-
-static inline int thaw_bdev(struct block_device *bdev, struct super_block *sb)
-{
- return 0;
-}
-
-static inline int thaw_bdev_emergency(struct block_device *bdev,
- struct super_block *sb)
-{
- return 0;
-}
#endif
extern int sync_filesystem(struct super_block *);
extern const struct file_operations def_blk_fops;
--
1.7.1
On Thu, Jun 10, 2010 at 05:19:49PM +1000, Dave Chinner wrote:
> The following series is for to address bugs in the emergency thawing code, as
> well as mismatcheѕ with freezing at the block layer and the superblock that
> break the freeze/thaw nesting order.
>
> The first two patches fix the emergency thaw infinite loop reported by Jeff
> Merkey and the deadlock on sb->s_umount that the infinite loop hid. These may
> be stable kernel candidates.
>
> The remainder of the patches address the bdev/sb mismatch and the fact that sb
> level freezing does not nest correctly. For all the places that the bdev
> interfaces are used, we need a superblock anyway so we may as well make
> freeze/thaw work only at the sb level. As such, this series moves all the
> nesting code to the sb from the bdev level and removes the
> freeze_bdev/thaw_bdev interfaces completely. It also converts the emergency
> thaw to work at the superblock level such that it will now thaw manually frozen
> filesystems.
>
> A *big* outstanding problem still exists - freezing takes an active reference
> to the superblock, so unmounting an frozen filesystem has some nasty and
> unexpected side effects. The existing code results in an unmountable block
> device:
>
> # mount /dev/vda /mnt/test
> # xfs_freeze -f /mnt/test
> # umount /mnt/test
> # grep test /proc/mounts
> # mkfs.xfs -f -l size=128m /dev/vda
> mkfs.xfs: /dev/vda contains a mounted filesystem
> Usage: mkfs.xfs
> ....
> # mount /dev/vda /mnt/test
> mount: /dev/vda already mounted or /mnt/test busy
> #
>
> At this point I can't get access to /dev/vda and needs a reboot to
> get it and /mnt/test back.
>
> This patch series results in the block device being mountable, but
> remains frozen across unmount/mount:
>
> # mount /dev/vda /mnt/test
> # xfs_freeze -f /mnt/test
> # umount /mnt/test
> # grep test /proc/mounts
> # mkfs.xfs -f -l size=128m /dev/vda
> mkfs.xfs: /dev/vda contains a mounted filesystem
> Usage: mkfs.xfs
> ....
> # mount /dev/vda /mnt/test
> # touch /mnt/test/foo &
> [1] 2647
> #
> # xfs_freeze -u /mnt/test
> [1]+ Done sudo touch /mnt/test/foo
> # umount /mnt/test
> # mkfs.xfs -f -l size=128m /dev/vda
> meta-data=/dev/vda isize=256 agcount=4, agsize=262144 blks
> = sectsz=512 attr=2
> data = bsize=4096 blocks=1048576, imaxpct=25
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0
> log =internal log bsize=4096 blocks=32768, version=2
> = sectsz=512 sunit=0 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
> #
>
> This behaviour is only marginally better than the existing behaviour
> (at least you can release the references). However, I don't really
> like either option - we used to disallow umount on a frozen
> filesystems to avoid this problem.
>
> So What is really supposed to happen when we unmount a frozen
> superblock? Should unmount return EBUSY? Should it be automatically
> thawed so it doesn't affect block device behaviour after unmount?
> Something else?
>
My vote is for EBUSY, that way we don't get this weird unexpected behavior
thing. Plus if dm is doing a snapshot or whatever, unfreezing the fs to let it
unmount in the middle of it doing its thing could be unfun. Thanks,
Josef
On Thu, Jun 10, 2010 at 05:19:50PM +1000, Dave Chinner wrote:
> Return -EINVAL when the filesystem is already unfrozen to avoid this
> problem.
This includes some additional changes in addition to the description,
and at least one of them seems incorrect.
> - error = 0;
> - if (--bdev->bd_fsfreeze_count > 0)
> + if (!sb)
> goto out;
>
> - if (!sb)
> + error = 0;
> + if (--bdev->bd_fsfreeze_count > 0)
> goto out;
Here you reorder the sb check to be before the counter decrement. But
we do support calling freeze_bdev on a device without a superblock, and
you would leak bd_fsfreeze_count for that case and wrongly return
-EINVAL on unthaw for these now.
> error = thaw_super(sb);
> - if (error) {
> + if (error)
> bdev->bd_fsfreeze_count++;
> - mutex_unlock(&bdev->bd_fsfreeze_mutex);
> - return error;
> - }
Ok, useful cleanup.
> out:
> mutex_unlock(&bdev->bd_fsfreeze_mutex);
> - return 0;
> + return error;
And this is the actual fix of course, also looks good.
On Thu, Jun 10, 2010 at 05:19:51PM +1000, Dave Chinner wrote:
> The emergency thaw process uses iterate_super() which holds the
> sb->s_umount lock in read mode. The current thaw_super() code takes
> the sb->s_umount lock in write mode, hence leading to an instant
> deadlock.
>
> Pass the emergency state into the thaw_bdev/thaw_super code to avoid
> taking the s_umount lock in this case. We are running under the bdev
> freeze mutex, so this is still serialised against freeze despite
> only having a read lock on the sb->s_umount. Hence it should be safe
> to execute in this manner, especially given that emergency thaw is a
> rarely executed "get-out-of-jail" feature.
This is correct as long as no one calls thaw_super directly, which
is not the case currently.
This breaks the "feature" that we can freeze a block device that doesn't
have a filesystem mounted yet. For filesystems using get_sb_bdev that
prevents a new filesystem to be mounted on them.
I'm not sure it's a particularly useful feature, but it's been there
since day 1 of the freeze support. The easiest way to not break it
would be to keep the per-sb freeze count only for that case and only
check it during mount.
On Thu, Jun 10, 2010 at 05:19:53PM +1000, Dave Chinner wrote:
> From: Dave Chinner <[email protected]>
>
> freeze/thaw_bdev are now just trivial wrappers around
> freeze/thaw_super(). Convert all users of the bdev interfaces to use
> the superblock interfaces instead, and remove the bdev interfaces.
This requires exporting get_active_super. Buit if we want to keep suppoting
freezing of unmounted block devices we need to keep the helpers anyway.
On Thu, Jun 10, 2010 at 05:19:54PM +1000, Dave Chinner wrote:
> From: Dave Chinner <[email protected]>
>
> It makes no sense having the emergency thaw code in fs/buffer.c when all of
> it's operations are one superblocks and the code it executes is all in
> fs/super.c. Move the code there and clean it up.
> +static void thaw_super_emergency(struct super_block *sb, void *unused)
> +{
> +
> + if (sb->s_bdev) {
> + char b[BDEVNAME_SIZE];
> + printk(KERN_WARNING "Emergency Thaw on %s.\n",
> + bdevname(sb->s_bdev, b));
> + }
> + while (!__thaw_super(sb, 1));
Please move the ; to a separate line to make the loop better redable.
On Mon, Jun 14, 2010 at 11:18:15AM -0400, Christoph Hellwig wrote:
> On Thu, Jun 10, 2010 at 05:19:50PM +1000, Dave Chinner wrote:
> > Return -EINVAL when the filesystem is already unfrozen to avoid this
> > problem.
>
>
> This includes some additional changes in addition to the description,
> and at least one of them seems incorrect.
>
> > - error = 0;
> > - if (--bdev->bd_fsfreeze_count > 0)
> > + if (!sb)
> > goto out;
> >
> > - if (!sb)
> > + error = 0;
> > + if (--bdev->bd_fsfreeze_count > 0)
> > goto out;
>
> Here you reorder the sb check to be before the counter decrement. But
> we do support calling freeze_bdev on a device without a superblock, and
> you would leak bd_fsfreeze_count for that case and wrongly return
> -EINVAL on unthaw for these now.
Ok, will fix it.
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Mon, Jun 14, 2010 at 11:20:11AM -0400, Christoph Hellwig wrote:
> On Thu, Jun 10, 2010 at 05:19:51PM +1000, Dave Chinner wrote:
> > The emergency thaw process uses iterate_super() which holds the
> > sb->s_umount lock in read mode. The current thaw_super() code takes
> > the sb->s_umount lock in write mode, hence leading to an instant
> > deadlock.
> >
> > Pass the emergency state into the thaw_bdev/thaw_super code to avoid
> > taking the s_umount lock in this case. We are running under the bdev
> > freeze mutex, so this is still serialised against freeze despite
> > only having a read lock on the sb->s_umount. Hence it should be safe
> > to execute in this manner, especially given that emergency thaw is a
> > rarely executed "get-out-of-jail" feature.
>
> This is correct as long as no one calls thaw_super directly, which
> is not the case currently.
Yeah, the idea of the first two patches is that they can be applied
to a current tree and backported and prevent the infinite loop or
deadlock. The problem of thaw_bdev/thaw_super is what the rest of
the patches are supposed to address.
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Mon, Jun 14, 2010 at 11:22:19AM -0400, Christoph Hellwig wrote:
> This breaks the "feature" that we can freeze a block device that doesn't
> have a filesystem mounted yet. For filesystems using get_sb_bdev that
> prevents a new filesystem to be mounted on them.
>
> I'm not sure it's a particularly useful feature, but it's been there
> since day 1 of the freeze support. The easiest way to not break it
> would be to keep the per-sb freeze count only for that case and only
> check it during mount.
You mean the per-bdev freeze count, right? So freeze/thaw_bdev would
have to remain?
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Tue, Jun 15, 2010 at 10:01:07AM +1000, Dave Chinner wrote:
> On Mon, Jun 14, 2010 at 11:22:19AM -0400, Christoph Hellwig wrote:
> > This breaks the "feature" that we can freeze a block device that doesn't
> > have a filesystem mounted yet. For filesystems using get_sb_bdev that
> > prevents a new filesystem to be mounted on them.
> >
> > I'm not sure it's a particularly useful feature, but it's been there
> > since day 1 of the freeze support. The easiest way to not break it
> > would be to keep the per-sb freeze count only for that case and only
> > check it during mount.
>
> You mean the per-bdev freeze count, right? So freeze/thaw_bdev would
> have to remain?
Yes.
On Mon, Jun 14, 2010 at 11:20:11AM -0400, Christoph Hellwig wrote:
> On Thu, Jun 10, 2010 at 05:19:51PM +1000, Dave Chinner wrote:
> > The emergency thaw process uses iterate_super() which holds the
> > sb->s_umount lock in read mode. The current thaw_super() code takes
> > the sb->s_umount lock in write mode, hence leading to an instant
> > deadlock.
> >
> > Pass the emergency state into the thaw_bdev/thaw_super code to avoid
> > taking the s_umount lock in this case. We are running under the bdev
> > freeze mutex, so this is still serialised against freeze despite
> > only having a read lock on the sb->s_umount. Hence it should be safe
> > to execute in this manner, especially given that emergency thaw is a
> > rarely executed "get-out-of-jail" feature.
>
> This is correct as long as no one calls thaw_super directly, which
> is not the case currently.
This patch doesn't try to deal with the bdev/super mismatches; all
it does is prevent an obvious deadlock. Calling freeze/thaw_super
directly will serialise on the s_umount lock, calling
freeze/thaw_bdev() will serialise on the bdev freeze mutex, and if
we mix the two they'll serialise on the s_umount lock. So I think
with this patch serialisation will still occur correctly but avoid
the current deadlock.
I'll change the commit message to explain this better.
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Mon, Jun 21, 2010 at 11:57:31AM +1000, Dave Chinner wrote:
> This patch doesn't try to deal with the bdev/super mismatches; all
> it does is prevent an obvious deadlock. Calling freeze/thaw_super
> directly will serialise on the s_umount lock, calling
> freeze/thaw_bdev() will serialise on the bdev freeze mutex, and if
> we mix the two they'll serialise on the s_umount lock. So I think
> with this patch serialisation will still occur correctly but avoid
> the current deadlock.
>
> I'll change the commit message to explain this better.
I don;t think the explanation alone is enough.
Right now thaw_super itself is only serialized by exclusive shared
s_umount. thaw_bdev it also serialized by bd_fsfreeze_mutex, but
there are callers of thaw_super that do not go through thaw_bdev, so
our locking is not enough here.