Here's version 2 of providing the subvolume name and ID in /proc/mounts.
It turns out that getting the name of a subvolume reliably is a bit
trickier than it would seem because of how mounting subvolumes by ID is
implemented. In particular, in that case, the dentry we get for the root
of the mount is not necessarily attached to the dentry tree, which means
that the obvious solution of just dumping the dentry does not work. The
solution I put together makes the tradeoff of churning a bit more code
in order to avoid implementing this with weird hacks.
Changes from v1 (https://lkml.org/lkml/2015/4/8/16):
- Put subvol= last in show_options
- Change commit log to remove comment about userspace having no way to
know which subvolume is mounted, as David pointed out you can use
btrfs inspect-internal rootid <mountpoint>
- Split up patch 2
- Minor coding style fixes
This still applies to v4.0-rc7. Tested manually and with the script
below (updated from v1).
Thanks!
Omar Sandoval (6):
Btrfs: lock superblock before remounting for rw subvol
Btrfs: remove all subvol options before mounting top-level
Btrfs: clean up error handling in mount_subvol()
Btrfs: fail on mismatched subvol and subvolid mount options
Btrfs: unify subvol= and subvolid= mounting
Btrfs: show subvol= and subvolid= in /proc/mounts
fs/btrfs/super.c | 376 ++++++++++++++++++++++++++++++++++++-------------------
fs/seq_file.c | 1 +
2 files changed, 251 insertions(+), 126 deletions(-)
Testing script:
----
#!/bin/sh
set -e
check_subvol () {
NAME="$1"
ID="$2"
# Mount by name.
mount -osubvol="$NAME" /dev/vdb /mnt
if ! findmnt -fno OPTIONS /mnt | grep -F ",subvolid=$ID,subvol=$NAME"; then
echo "Failed $NAME" >&2
umount /mnt
exit 1
fi
umount /mnt
# Mount by ID.
mount -osubvolid="$ID" /dev/vdb /mnt
if ! findmnt -fno OPTIONS /mnt | grep -F ",subvolid=$ID,subvol=$NAME"; then
echo "Failed $ID" >&2
umount /mnt
exit 1
fi
umount /mnt
}
check_default_subvol () {
NAME="$1"
ID="$2"
mount /dev/vdb /mnt
if ! findmnt -fno OPTIONS /mnt | grep -F ",subvolid=$ID,subvol=$NAME"; then
echo "Failed default" >&2
umount /mnt
exit 1
fi
umount /mnt
}
mkfs.btrfs -f /dev/vdb
mount /dev/vdb /mnt
btrfs subvolume create /mnt/vol
btrfs subvolume create /mnt/vol/nestedvol
mkdir /mnt/dir
btrfs subvolume create /mnt/dir/dirvol
btrfs subvolume create /mnt/dir/dirvol/nesteddirvol
mkdir /mnt/vol/voldir
btrfs subvolume create /mnt/vol/voldir/voldirvol
btrfs subvolume list /mnt
umount /mnt
check_subvol /vol 257
check_subvol /vol/nestedvol 258
check_subvol /dir/dirvol 259
check_subvol /dir/dirvol/nesteddirvol 260
check_subvol /vol/voldir/voldirvol 261
check_default_subvol / 5
mount /dev/vdb /mnt
btrfs subvolume set-default 257 /mnt
umount /mnt
check_default_subvol /vol 257
if mount -osubvolid=258,subvol=/vol /dev/vdb /mnt 2>/dev/null; then
umount /mnt
echo "Mount of mismatched subvol and subvolid should have failed" >&2
exit 1
fi
----
--
2.3.5
Since commit 0723a0473fb4 ("btrfs: allow mounting btrfs subvolumes with
different ro/rw options"), when mounting a subvolume read/write when
another subvolume has previously been mounted read-only, we first do a
remount. However, this should be done with the superblock locked, as per
sync_filesystem():
/*
* We need to be protected against the filesystem going from
* r/o to r/w or vice versa.
*/
WARN_ON(!rwsem_is_locked(&sb->s_umount));
This WARN_ON can easily be hit with:
mkfs.btrfs -f /dev/vdb
mount /dev/vdb /mnt
btrfs subvol create /mnt/vol1
btrfs subvol create /mnt/vol2
umount /mnt
mount -oro,subvol=/vol1 /dev/vdb /mnt
mount -orw,subvol=/vol2 /dev/vdb /mnt2
Fixes: 0723a0473fb4 ("btrfs: allow mounting btrfs subvolumes with different ro/rw options")
Signed-off-by: Omar Sandoval <[email protected]>
---
fs/btrfs/super.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 05fef19..d38be09 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1205,7 +1205,9 @@ static struct dentry *mount_subvol(const char *subvol_name, int flags,
return ERR_CAST(mnt);
}
+ down_write(&mnt->mnt_sb->s_umount);
r = btrfs_remount(mnt->mnt_sb, &flags, NULL);
+ up_write(&mnt->mnt_sb->s_umount);
if (r < 0) {
/* FIXME: release vfsmount mnt ??*/
kfree(newargs);
--
2.3.5
Currently, setup_root_args() substitutes 's/subvol=[^,]*/subvolid=0/'.
But, this means that if the user passes both a subvol and subvolid for
some reason, we won't actually mount the top-level when we recursively
mount. For example, consider:
mkfs.btrfs -f /dev/sdb
mount /dev/sdb /mnt
btrfs subvol create /mnt/subvol1 # subvolid=257
btrfs subvol create /mnt/subvol2 # subvolid=258
umount /mnt
mount -osubvol=/subvol1,subvolid=258 /dev/sdb /mnt
In the final mount, subvol=/subvol1,subvolid=258 becomes
subvolid=0,subvolid=258, and the last option takes precedence, so we
mount subvol2 and try to look up subvol1 inside of it, which fails.
So, instead, do a thorough scan through the argument list and remove any
subvol= and subvolid= options, then append subvolid=0 to the end. This
implicitly makes subvol= take precedence over subvolid=, but we're about
to add a stricter check for that. This also makes setup_root_args() more
generic, which we'll need soon.
Signed-off-by: Omar Sandoval <[email protected]>
---
fs/btrfs/super.c | 56 ++++++++++++++++++++------------------------------------
1 file changed, 20 insertions(+), 36 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d38be09..bac3c9a 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1129,52 +1129,36 @@ static inline int is_subvolume_inode(struct inode *inode)
}
/*
- * This will strip out the subvol=%s argument for an argument string and add
- * subvolid=0 to make sure we get the actual tree root for path walking to the
- * subvol we want.
+ * This will add subvolid=0 to the argument string while removing any subvol=
+ * and subvolid= arguments to make sure we get the top-level root for path
+ * walking to the subvol we want.
*/
static char *setup_root_args(char *args)
{
- unsigned len = strlen(args) + 2 + 1;
- char *src, *dst, *buf;
+ char *buf, *dst, *sep;
- /*
- * We need the same args as before, but with this substitution:
- * s!subvol=[^,]+!subvolid=0!
- *
- * Since the replacement string is up to 2 bytes longer than the
- * original, allocate strlen(args) + 2 + 1 bytes.
- */
-
- src = strstr(args, "subvol=");
- /* This shouldn't happen, but just in case.. */
- if (!src)
- return NULL;
+ if (!args)
+ return kstrdup("subvolid=0", GFP_NOFS);
- buf = dst = kmalloc(len, GFP_NOFS);
+ /* The worst case is that we add ",subvolid=0" to the end. */
+ buf = dst = kmalloc(strlen(args) + strlen(",subvolid=0") + 1, GFP_NOFS);
if (!buf)
return NULL;
- /*
- * If the subvol= arg is not at the start of the string,
- * copy whatever precedes it into buf.
- */
- if (src != args) {
- *src++ = '\0';
- strcpy(buf, args);
- dst += strlen(args);
+ while (1) {
+ sep = strchrnul(args, ',');
+ if (!strstarts(args, "subvol=") &&
+ !strstarts(args, "subvolid=")) {
+ memcpy(dst, args, sep - args);
+ dst += sep - args;
+ *dst++ = ',';
+ }
+ if (*sep)
+ args = sep + 1;
+ else
+ break;
}
-
strcpy(dst, "subvolid=0");
- dst += strlen("subvolid=0");
-
- /*
- * If there is a "," after the original subvol=... string,
- * copy that suffix into our buffer. Otherwise, we're done.
- */
- src = strchr(src, ',');
- if (src)
- strcpy(dst, src);
return buf;
}
--
2.3.5
In preparation for new functionality in mount_subvol(), give it
ownership of subvol_name and tidy up the error paths.
Signed-off-by: Omar Sandoval <[email protected]>
---
fs/btrfs/super.c | 61 ++++++++++++++++++++++++++++++--------------------------
1 file changed, 33 insertions(+), 28 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index bac3c9a..ab100e5 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1167,55 +1167,61 @@ static struct dentry *mount_subvol(const char *subvol_name, int flags,
const char *device_name, char *data)
{
struct dentry *root;
- struct vfsmount *mnt;
+ struct vfsmount *mnt = NULL;
char *newargs;
+ int ret;
newargs = setup_root_args(data);
- if (!newargs)
- return ERR_PTR(-ENOMEM);
- mnt = vfs_kern_mount(&btrfs_fs_type, flags, device_name,
- newargs);
+ if (!newargs) {
+ root = ERR_PTR(-ENOMEM);
+ goto out;
+ }
- if (PTR_RET(mnt) == -EBUSY) {
+ mnt = vfs_kern_mount(&btrfs_fs_type, flags, device_name, newargs);
+ if (PTR_ERR_OR_ZERO(mnt) == -EBUSY) {
if (flags & MS_RDONLY) {
- mnt = vfs_kern_mount(&btrfs_fs_type, flags & ~MS_RDONLY, device_name,
- newargs);
+ mnt = vfs_kern_mount(&btrfs_fs_type, flags & ~MS_RDONLY,
+ device_name, newargs);
} else {
- int r;
- mnt = vfs_kern_mount(&btrfs_fs_type, flags | MS_RDONLY, device_name,
- newargs);
+ mnt = vfs_kern_mount(&btrfs_fs_type, flags | MS_RDONLY,
+ device_name, newargs);
if (IS_ERR(mnt)) {
- kfree(newargs);
- return ERR_CAST(mnt);
+ root = ERR_CAST(mnt);
+ mnt = NULL;
+ goto out;
}
down_write(&mnt->mnt_sb->s_umount);
- r = btrfs_remount(mnt->mnt_sb, &flags, NULL);
+ ret = btrfs_remount(mnt->mnt_sb, &flags, NULL);
up_write(&mnt->mnt_sb->s_umount);
- if (r < 0) {
- /* FIXME: release vfsmount mnt ??*/
- kfree(newargs);
- return ERR_PTR(r);
+ if (ret < 0) {
+ root = ERR_PTR(ret);
+ goto out;
}
}
}
-
- kfree(newargs);
-
- if (IS_ERR(mnt))
- return ERR_CAST(mnt);
+ if (IS_ERR(mnt)) {
+ root = ERR_CAST(mnt);
+ mnt = NULL;
+ goto out;
+ }
root = mount_subtree(mnt, subvol_name);
+ /* mount_subtree() drops our reference on the vfsmount. */
+ mnt = NULL;
if (!IS_ERR(root) && !is_subvolume_inode(root->d_inode)) {
struct super_block *s = root->d_sb;
dput(root);
root = ERR_PTR(-EINVAL);
deactivate_locked_super(s);
- printk(KERN_ERR "BTRFS: '%s' is not a valid subvolume\n",
- subvol_name);
+ pr_err("BTRFS: '%s' is not a valid subvolume\n", subvol_name);
}
+out:
+ mntput(mnt);
+ kfree(newargs);
+ kfree(subvol_name);
return root;
}
@@ -1301,9 +1307,8 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
}
if (subvol_name) {
- root = mount_subvol(subvol_name, flags, device_name, data);
- kfree(subvol_name);
- return root;
+ /* mount_subvol() will free subvol_name. */
+ return mount_subvol(subvol_name, flags, device_name, data);
}
security_init_mnt_opts(&new_sec_opts);
--
2.3.5
There's nothing to stop a user from passing both subvol= and subvolid=
to mount, but if they don't refer to the same subvolume, someone is
going to be surprised at some point. Error out on this case, but allow
users to pass in both if they do match (which they could, for example,
get out of /proc/mounts).
Signed-off-by: Omar Sandoval <[email protected]>
---
fs/btrfs/super.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index ab100e5..20b470d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1163,8 +1163,9 @@ static char *setup_root_args(char *args)
return buf;
}
-static struct dentry *mount_subvol(const char *subvol_name, int flags,
- const char *device_name, char *data)
+static struct dentry *mount_subvol(const char *subvol_name, u64 subvol_objectid,
+ int flags, const char *device_name,
+ char *data)
{
struct dentry *root;
struct vfsmount *mnt = NULL;
@@ -1210,12 +1211,26 @@ static struct dentry *mount_subvol(const char *subvol_name, int flags,
/* mount_subtree() drops our reference on the vfsmount. */
mnt = NULL;
- if (!IS_ERR(root) && !is_subvolume_inode(root->d_inode)) {
+ if (!IS_ERR(root)) {
struct super_block *s = root->d_sb;
- dput(root);
- root = ERR_PTR(-EINVAL);
- deactivate_locked_super(s);
- pr_err("BTRFS: '%s' is not a valid subvolume\n", subvol_name);
+ u64 root_objectid = BTRFS_I(root->d_inode)->root->root_key.objectid;
+
+ ret = 0;
+ if (!is_subvolume_inode(root->d_inode)) {
+ pr_err("BTRFS: '%s' is not a valid subvolume\n",
+ subvol_name);
+ ret = -EINVAL;
+ }
+ if (subvol_objectid && root_objectid != subvol_objectid) {
+ pr_err("BTRFS: subvol '%s' does not match subvolid %llu\n",
+ subvol_name, subvol_objectid);
+ ret = -EINVAL;
+ }
+ if (ret) {
+ dput(root);
+ root = ERR_PTR(ret);
+ deactivate_locked_super(s);
+ }
}
out:
@@ -1308,7 +1323,8 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
if (subvol_name) {
/* mount_subvol() will free subvol_name. */
- return mount_subvol(subvol_name, flags, device_name, data);
+ return mount_subvol(subvol_name, subvol_objectid, flags,
+ device_name, data);
}
security_init_mnt_opts(&new_sec_opts);
--
2.3.5
Currently, mounting a subvolume with subvolid= takes a different code
path than mounting with subvol=. This isn't really a big deal except for
the fact that mounts done with subvolid= or the default subvolume don't
have a dentry that's connected to the dentry tree like in the subvol=
case. To unify the code paths, when given subvolid= or using the default
subvolume ID, translate it into a subvolume name by walking
ROOT_BACKREFs in the root tree and INODE_REFs in the filesystem trees.
Signed-off-by: Omar Sandoval <[email protected]>
---
fs/btrfs/super.c | 229 +++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 171 insertions(+), 58 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 20b470d..80a8047 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -841,33 +841,153 @@ out:
return error;
}
-static struct dentry *get_default_root(struct super_block *sb,
- u64 subvol_objectid)
+static char *get_subvol_name_from_objectid(struct btrfs_fs_info *fs_info,
+ u64 subvol_objectid)
{
- struct btrfs_fs_info *fs_info = btrfs_sb(sb);
struct btrfs_root *root = fs_info->tree_root;
- struct btrfs_root *new_root;
- struct btrfs_dir_item *di;
- struct btrfs_path *path;
- struct btrfs_key location;
- struct inode *inode;
- u64 dir_id;
- int new = 0;
+ struct btrfs_root *fs_root;
+ struct btrfs_root_ref *root_ref;
+ struct btrfs_inode_ref *inode_ref;
+ struct btrfs_key key;
+ struct btrfs_path *path = NULL;
+ char *name = NULL, *ptr;
+ u64 dirid;
+ int len;
+ int ret;
+
+ path = btrfs_alloc_path();
+ if (!path) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ path->leave_spinning = 1;
+
+ name = kmalloc(PATH_MAX, GFP_NOFS);
+ if (!name) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ ptr = name + PATH_MAX - 1;
+ ptr[0] = '\0';
/*
- * We have a specific subvol we want to mount, just setup location and
- * go look up the root.
+ * Walk up the subvolume trees in the tree of tree roots by root
+ * backrefs until we hit the top-level subvolume.
*/
- if (subvol_objectid) {
- location.objectid = subvol_objectid;
- location.type = BTRFS_ROOT_ITEM_KEY;
- location.offset = (u64)-1;
- goto find_root;
+ while (subvol_objectid != BTRFS_FS_TREE_OBJECTID) {
+ key.objectid = subvol_objectid;
+ key.type = BTRFS_ROOT_BACKREF_KEY;
+ key.offset = (u64)-1;
+
+ ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+ if (ret < 0) {
+ goto err;
+ } else if (ret > 0) {
+ ret = btrfs_previous_item(root, path, subvol_objectid,
+ BTRFS_ROOT_BACKREF_KEY);
+ if (ret < 0) {
+ goto err;
+ } else if (ret > 0) {
+ ret = -ENOENT;
+ goto err;
+ }
+ }
+
+ btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+ subvol_objectid = key.offset;
+
+ root_ref = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_root_ref);
+ len = btrfs_root_ref_name_len(path->nodes[0], root_ref);
+ ptr -= len + 1;
+ if (ptr < name) {
+ ret = -ENAMETOOLONG;
+ goto err;
+ }
+ read_extent_buffer(path->nodes[0], ptr + 1,
+ (unsigned long)(root_ref + 1), len);
+ ptr[0] = '/';
+ dirid = btrfs_root_ref_dirid(path->nodes[0], root_ref);
+ btrfs_release_path(path);
+
+ key.objectid = subvol_objectid;
+ key.type = BTRFS_ROOT_ITEM_KEY;
+ key.offset = (u64)-1;
+ fs_root = btrfs_read_fs_root_no_name(fs_info, &key);
+ if (IS_ERR(fs_root)) {
+ ret = PTR_ERR(fs_root);
+ goto err;
+ }
+
+ /*
+ * Walk up the filesystem tree by inode refs until we hit the
+ * root directory.
+ */
+ while (dirid != BTRFS_FIRST_FREE_OBJECTID) {
+ key.objectid = dirid;
+ key.type = BTRFS_INODE_REF_KEY;
+ key.offset = (u64)-1;
+
+ ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
+ if (ret < 0) {
+ goto err;
+ } else if (ret > 0) {
+ ret = btrfs_previous_item(fs_root, path, dirid,
+ BTRFS_INODE_REF_KEY);
+ if (ret < 0) {
+ goto err;
+ } else if (ret > 0) {
+ ret = -ENOENT;
+ goto err;
+ }
+ }
+
+ btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+ dirid = key.offset;
+
+ inode_ref = btrfs_item_ptr(path->nodes[0],
+ path->slots[0],
+ struct btrfs_inode_ref);
+ len = btrfs_inode_ref_name_len(path->nodes[0],
+ inode_ref);
+ ptr -= len + 1;
+ if (ptr < name) {
+ ret = -ENAMETOOLONG;
+ goto err;
+ }
+ read_extent_buffer(path->nodes[0], ptr + 1,
+ (unsigned long)(inode_ref + 1), len);
+ ptr[0] = '/';
+ btrfs_release_path(path);
+ }
}
+ btrfs_free_path(path);
+ if (ptr == name + PATH_MAX - 1) {
+ name[0] = '/';
+ name[1] = '\0';
+ } else {
+ memmove(name, ptr, name + PATH_MAX - ptr);
+ }
+ return name;
+
+err:
+ btrfs_free_path(path);
+ kfree(name);
+ return ERR_PTR(ret);
+}
+
+static int get_default_subvol_objectid(struct btrfs_fs_info *fs_info, u64 *objectid)
+{
+ struct btrfs_root *root = fs_info->tree_root;
+ struct btrfs_dir_item *di;
+ struct btrfs_path *path;
+ struct btrfs_key location;
+ u64 dir_id;
+
path = btrfs_alloc_path();
if (!path)
- return ERR_PTR(-ENOMEM);
+ return -ENOMEM;
path->leave_spinning = 1;
/*
@@ -879,49 +999,23 @@ static struct dentry *get_default_root(struct super_block *sb,
di = btrfs_lookup_dir_item(NULL, root, path, dir_id, "default", 7, 0);
if (IS_ERR(di)) {
btrfs_free_path(path);
- return ERR_CAST(di);
+ return PTR_ERR(di);
}
if (!di) {
/*
* Ok the default dir item isn't there. This is weird since
* it's always been there, but don't freak out, just try and
- * mount to root most subvolume.
+ * mount the top-level subvolume.
*/
btrfs_free_path(path);
- dir_id = BTRFS_FIRST_FREE_OBJECTID;
- new_root = fs_info->fs_root;
- goto setup_root;
+ *objectid = BTRFS_FS_TREE_OBJECTID;
+ return 0;
}
btrfs_dir_item_key_to_cpu(path->nodes[0], di, &location);
btrfs_free_path(path);
-
-find_root:
- new_root = btrfs_read_fs_root_no_name(fs_info, &location);
- if (IS_ERR(new_root))
- return ERR_CAST(new_root);
-
- dir_id = btrfs_root_dirid(&new_root->root_item);
-setup_root:
- location.objectid = dir_id;
- location.type = BTRFS_INODE_ITEM_KEY;
- location.offset = 0;
-
- inode = btrfs_iget(sb, &location, new_root, &new);
- if (IS_ERR(inode))
- return ERR_CAST(inode);
-
- /*
- * If we're just mounting the root most subvol put the inode and return
- * a reference to the dentry. We will have already gotten a reference
- * to the inode in btrfs_fill_super so we're good to go.
- */
- if (!new && sb->s_root->d_inode == inode) {
- iput(inode);
- return dget(sb->s_root);
- }
-
- return d_obtain_root(inode);
+ *objectid = location.objectid;
+ return 0;
}
static int btrfs_fill_super(struct super_block *sb,
@@ -1207,6 +1301,25 @@ static struct dentry *mount_subvol(const char *subvol_name, u64 subvol_objectid,
goto out;
}
+ if (!subvol_name) {
+ if (!subvol_objectid) {
+ ret = get_default_subvol_objectid(btrfs_sb(mnt->mnt_sb),
+ &subvol_objectid);
+ if (ret) {
+ root = ERR_PTR(ret);
+ goto out;
+ }
+ }
+ subvol_name = get_subvol_name_from_objectid(btrfs_sb(mnt->mnt_sb),
+ subvol_objectid);
+ if (IS_ERR(subvol_name)) {
+ root = ERR_CAST(subvol_name);
+ subvol_name = NULL;
+ goto out;
+ }
+
+ }
+
root = mount_subtree(mnt, subvol_name);
/* mount_subtree() drops our reference on the vfsmount. */
mnt = NULL;
@@ -1222,6 +1335,11 @@ static struct dentry *mount_subvol(const char *subvol_name, u64 subvol_objectid,
ret = -EINVAL;
}
if (subvol_objectid && root_objectid != subvol_objectid) {
+ /*
+ * This will also catch a race condition where a
+ * subvolume which was passed by ID is renamed and
+ * another subvolume is renamed over the old location.
+ */
pr_err("BTRFS: subvol '%s' does not match subvolid %llu\n",
subvol_name, subvol_objectid);
ret = -EINVAL;
@@ -1301,7 +1419,6 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
{
struct block_device *bdev = NULL;
struct super_block *s;
- struct dentry *root;
struct btrfs_fs_devices *fs_devices = NULL;
struct btrfs_fs_info *fs_info = NULL;
struct security_mnt_opts new_sec_opts;
@@ -1321,7 +1438,7 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
return ERR_PTR(error);
}
- if (subvol_name) {
+ if (subvol_name || subvol_objectid != BTRFS_FS_TREE_OBJECTID) {
/* mount_subvol() will free subvol_name. */
return mount_subvol(subvol_name, subvol_objectid, flags,
device_name, data);
@@ -1390,23 +1507,19 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
error = btrfs_fill_super(s, fs_devices, data,
flags & MS_SILENT ? 1 : 0);
}
-
- root = !error ? get_default_root(s, subvol_objectid) : ERR_PTR(error);
- if (IS_ERR(root)) {
+ if (error) {
deactivate_locked_super(s);
- error = PTR_ERR(root);
goto error_sec_opts;
}
fs_info = btrfs_sb(s);
error = setup_security_options(fs_info, s, &new_sec_opts);
if (error) {
- dput(root);
deactivate_locked_super(s);
goto error_sec_opts;
}
- return root;
+ return dget(s->s_root);
error_close_devices:
btrfs_close_devices(fs_devices);
--
2.3.5
Now that we're guaranteed to have a meaningful root dentry, we can just
export seq_dentry() and use it in btrfs_show_options(). The subvolume ID
is easy to get and can also be useful, so put that in there, too.
Signed-off-by: Omar Sandoval <[email protected]>
---
fs/btrfs/super.c | 4 ++++
fs/seq_file.c | 1 +
2 files changed, 5 insertions(+)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 80a8047..f334cc4 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1193,6 +1193,10 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
seq_puts(seq, ",fatal_errors=panic");
if (info->commit_interval != BTRFS_DEFAULT_COMMIT_INTERVAL)
seq_printf(seq, ",commit=%d", info->commit_interval);
+ seq_printf(seq, ",subvolid=%llu",
+ BTRFS_I(d_inode(dentry))->root->root_key.objectid);
+ seq_puts(seq, ",subvol=");
+ seq_dentry(seq, dentry, " \t\n\\");
return 0;
}
diff --git a/fs/seq_file.c b/fs/seq_file.c
index 555f821..52b4927 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -538,6 +538,7 @@ int seq_dentry(struct seq_file *m, struct dentry *dentry, const char *esc)
return res;
}
+EXPORT_SYMBOL(seq_dentry);
static void *single_start(struct seq_file *p, loff_t *pos)
{
--
2.3.5
-------- Original Message --------
Subject: [PATCH v2 4/6] Btrfs: fail on mismatched subvol and subvolid
mount options
From: Omar Sandoval <[email protected]>
To: Chris Mason <[email protected]>, Josef Bacik <[email protected]>, David Sterba
<[email protected]>, <[email protected]>
Date: 2015年04月10日 05:34
> There's nothing to stop a user from passing both subvol= and subvolid=
> to mount, but if they don't refer to the same subvolume, someone is
> going to be surprised at some point. Error out on this case, but allow
> users to pass in both if they do match (which they could, for example,
> get out of /proc/mounts).
Not sure should we do this extra check, as later mount options override
previous mount option.
I previous tried to do such thing for mount option like inode/noinode,
but was rejected for that reason.
So not sure such error-out behavior is OK or not.
Maybe only taking the latest subvol/subvolid is a better choice?
Thanks,
Qu
>
> Signed-off-by: Omar Sandoval <[email protected]>
> ---
> fs/btrfs/super.c | 32 ++++++++++++++++++++++++--------
> 1 file changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index ab100e5..20b470d 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1163,8 +1163,9 @@ static char *setup_root_args(char *args)
> return buf;
> }
>
> -static struct dentry *mount_subvol(const char *subvol_name, int flags,
> - const char *device_name, char *data)
> +static struct dentry *mount_subvol(const char *subvol_name, u64 subvol_objectid,
> + int flags, const char *device_name,
> + char *data)
> {
> struct dentry *root;
> struct vfsmount *mnt = NULL;
> @@ -1210,12 +1211,26 @@ static struct dentry *mount_subvol(const char *subvol_name, int flags,
> /* mount_subtree() drops our reference on the vfsmount. */
> mnt = NULL;
>
> - if (!IS_ERR(root) && !is_subvolume_inode(root->d_inode)) {
> + if (!IS_ERR(root)) {
> struct super_block *s = root->d_sb;
> - dput(root);
> - root = ERR_PTR(-EINVAL);
> - deactivate_locked_super(s);
> - pr_err("BTRFS: '%s' is not a valid subvolume\n", subvol_name);
> + u64 root_objectid = BTRFS_I(root->d_inode)->root->root_key.objectid;
> +
> + ret = 0;
> + if (!is_subvolume_inode(root->d_inode)) {
> + pr_err("BTRFS: '%s' is not a valid subvolume\n",
> + subvol_name);
> + ret = -EINVAL;
> + }
> + if (subvol_objectid && root_objectid != subvol_objectid) {
> + pr_err("BTRFS: subvol '%s' does not match subvolid %llu\n",
> + subvol_name, subvol_objectid);
> + ret = -EINVAL;
> + }
> + if (ret) {
> + dput(root);
> + root = ERR_PTR(ret);
> + deactivate_locked_super(s);
> + }
> }
>
> out:
> @@ -1308,7 +1323,8 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
>
> if (subvol_name) {
> /* mount_subvol() will free subvol_name. */
> - return mount_subvol(subvol_name, flags, device_name, data);
> + return mount_subvol(subvol_name, subvol_objectid, flags,
> + device_name, data);
> }
>
> security_init_mnt_opts(&new_sec_opts);
>
On Fri, Apr 10, 2015 at 08:39:53AM +0800, Qu Wenruo wrote:
> > There's nothing to stop a user from passing both subvol= and subvolid=
> > to mount, but if they don't refer to the same subvolume, someone is
> > going to be surprised at some point. Error out on this case, but allow
> > users to pass in both if they do match (which they could, for example,
> > get out of /proc/mounts).
> Not sure should we do this extra check, as later mount options override
> previous mount option.
>
> I previous tried to do such thing for mount option like inode/noinode,
> but was rejected for that reason.
Do you have a link to the discussion?
> So not sure such error-out behavior is OK or not.
> Maybe only taking the latest subvol/subvolid is a better choice?
If not sure, follow the principle of least surprise. If both subvolid
and subvol are passed and match then it's IMHO ok, no matter if the
options match "by accident" or intentionally. Eg. copy&paste from
/proc/mounts should work.
If the options do not match we can't decide which one is the right one.
The surprise would come if the user wants one (eg. subvolid) but the
other one would be applied in the end (subvol).
On Thu, Apr 09, 2015 at 02:34:50PM -0700, Omar Sandoval wrote:
> Here's version 2 of providing the subvolume name and ID in /proc/mounts.
>
> It turns out that getting the name of a subvolume reliably is a bit
> trickier than it would seem because of how mounting subvolumes by ID is
> implemented. In particular, in that case, the dentry we get for the root
> of the mount is not necessarily attached to the dentry tree, which means
> that the obvious solution of just dumping the dentry does not work. The
> solution I put together makes the tradeoff of churning a bit more code
> in order to avoid implementing this with weird hacks.
>
> Changes from v1 (https://lkml.org/lkml/2015/4/8/16):
>
> - Put subvol= last in show_options
> - Change commit log to remove comment about userspace having no way to
> know which subvolume is mounted, as David pointed out you can use
> btrfs inspect-internal rootid <mountpoint>
> - Split up patch 2
> - Minor coding style fixes
>
> This still applies to v4.0-rc7. Tested manually and with the script
> below (updated from v1).
>
> Thanks!
>
> Omar Sandoval (6):
> Btrfs: lock superblock before remounting for rw subvol
> Btrfs: remove all subvol options before mounting top-level
> Btrfs: clean up error handling in mount_subvol()
> Btrfs: fail on mismatched subvol and subvolid mount options
> Btrfs: unify subvol= and subvolid= mounting
> Btrfs: show subvol= and subvolid= in /proc/mounts
>
> fs/btrfs/super.c | 376 ++++++++++++++++++++++++++++++++++++-------------------
> fs/seq_file.c | 1 +
> 2 files changed, 251 insertions(+), 126 deletions(-)
>
Hi, everyone,
Just wanted to revive this so we can hopefully come up with a solution
we agree on in time for 4.2.
Just to recap, my approach (and also Qu Wenruo's original approach) is
to convert subvolid= mounts to subvol= mounts at mount time, which makes
showing the subvolume in /proc/mounts easy. The benefit of this approach
is that looking at mount information, which is supposed to be a
lightweight operation, is simple and always works. Additionally, we'll
have the info in a convenient format in /proc/mounts in addition to
/proc/$PID/mountinfo. The only caveat is that a mount by subvolid can
fail if the mount races with a rename of the subvolume.
Qu Wenruo's second approach was to instead convert the subvolid to a
subvolume path when reading /proc/$PID/mountinfo. The benefit of this
approach is that mounts by subvolid will always succeed in the face of
concurrent renames. However, instead, getting the subvolume path in
mountinfo can now fail, and it makes what should probably be a
lightweight operation somewhat complex.
In terms of the code, I think the original approach is cleaner: the
heavy lifting is done when mounting instead of when reading a proc file.
Additionally, I don't think that the concurrent rename race will be much
of a problem in practice. I can't imagine that too many people are
actively renaming subvolumes at the same time as they are mounting them,
and even if they are, I don't think it's so surprising that it would
fail. On the other hand, reading mount info while renaming subvolumes
might be marginally more common, and personally, if that failed, I'd be
unpleasantly surprised.
Orthogonal to that decision is the precedence of subvolid= and subvol=.
Although it's true that mount options usually have last-one-wins
behavior, I think David's argument regarding the principle of least
surprise is solid. Namely, someone's going to be unhappy with a
seemingly arbitrary decision when they don't match.
Sorry for the long-winded email! Thoughts, David, Qu?
Thanks,
--
Omar
On Mon, May 11, 2015 at 02:42:58AM -0700, Omar Sandoval wrote:
> Sorry for the long-winded email! Thoughts, David, Qu?
I'm ok with the approach and the patchset as a whole. There was one
minor issue with the aliasing of the toplevel subvolume (id 0 or 5).
I'll post a comment to the patch.
On Thu, Apr 09, 2015 at 02:34:55PM -0700, Omar Sandoval wrote:
> @@ -1321,7 +1438,7 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
> return ERR_PTR(error);
> }
>
> - if (subvol_name) {
> + if (subvol_name || subvol_objectid != BTRFS_FS_TREE_OBJECTID) {
> /* mount_subvol() will free subvol_name. */
> return mount_subvol(subvol_name, subvol_objectid, flags,
> device_name, data);
The story is that I've used this patchset as a base for my per-subvolume
mount flags because it made the mount path cleaner, so it's possible that
the issue I saw was related to my changes.
The modified 'if' above did not catch subvol_objectid == 0. I'm not sure
now how it got there, btrfs_parse_early_options recognizes 0 and
switches it to 5.
My testing script is quite simple and does only
mount -o compress-force=lzo,subvol=/subv1 $dev mnt
mount -o compress-force=zlib,subvol=/subv2 $dev mnt2
after mkfs. The first pass would resolve the path to the subvol root and
replace the options with subvolid=0, calls mount_subvol, vfs_kern_mount
than in turn goes back to btrfs_mount. Unless I'm missing something,
this should also work, because parse_early_options is called again and
does subvolid=0 -> subvol_objectid=5 . Oh well, more debugging needed,
not a blocker for this patchset.
On Thu, Apr 09, 2015 at 02:34:51PM -0700, Omar Sandoval wrote:
> Since commit 0723a0473fb4 ("btrfs: allow mounting btrfs subvolumes with
> different ro/rw options"), when mounting a subvolume read/write when
> another subvolume has previously been mounted read-only, we first do a
> remount. However, this should be done with the superblock locked, as per
> sync_filesystem():
>
> /*
> * We need to be protected against the filesystem going from
> * r/o to r/w or vice versa.
> */
> WARN_ON(!rwsem_is_locked(&sb->s_umount));
>
> This WARN_ON can easily be hit with:
>
> mkfs.btrfs -f /dev/vdb
> mount /dev/vdb /mnt
> btrfs subvol create /mnt/vol1
> btrfs subvol create /mnt/vol2
> umount /mnt
> mount -oro,subvol=/vol1 /dev/vdb /mnt
> mount -orw,subvol=/vol2 /dev/vdb /mnt2
>
> Fixes: 0723a0473fb4 ("btrfs: allow mounting btrfs subvolumes with different ro/rw options")
> Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: David Sterba <[email protected]>
On Thu, Apr 09, 2015 at 02:34:52PM -0700, Omar Sandoval wrote:
> Currently, setup_root_args() substitutes 's/subvol=[^,]*/subvolid=0/'.
> But, this means that if the user passes both a subvol and subvolid for
> some reason, we won't actually mount the top-level when we recursively
> mount. For example, consider:
>
> mkfs.btrfs -f /dev/sdb
> mount /dev/sdb /mnt
> btrfs subvol create /mnt/subvol1 # subvolid=257
> btrfs subvol create /mnt/subvol2 # subvolid=258
> umount /mnt
> mount -osubvol=/subvol1,subvolid=258 /dev/sdb /mnt
>
> In the final mount, subvol=/subvol1,subvolid=258 becomes
> subvolid=0,subvolid=258, and the last option takes precedence, so we
> mount subvol2 and try to look up subvol1 inside of it, which fails.
>
> So, instead, do a thorough scan through the argument list and remove any
> subvol= and subvolid= options, then append subvolid=0 to the end. This
> implicitly makes subvol= take precedence over subvolid=, but we're about
> to add a stricter check for that. This also makes setup_root_args() more
> generic, which we'll need soon.
>
> Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: David Sterba <[email protected]>
On Thu, Apr 09, 2015 at 02:34:53PM -0700, Omar Sandoval wrote:
> In preparation for new functionality in mount_subvol(), give it
> ownership of subvol_name and tidy up the error paths.
>
> Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: David Sterba <[email protected]>
On Thu, Apr 09, 2015 at 02:34:54PM -0700, Omar Sandoval wrote:
> There's nothing to stop a user from passing both subvol= and subvolid=
> to mount, but if they don't refer to the same subvolume, someone is
> going to be surprised at some point. Error out on this case, but allow
> users to pass in both if they do match (which they could, for example,
> get out of /proc/mounts).
>
> Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: David Sterba <[email protected]>
On Thu, Apr 09, 2015 at 02:34:55PM -0700, Omar Sandoval wrote:
> Currently, mounting a subvolume with subvolid= takes a different code
> path than mounting with subvol=. This isn't really a big deal except for
> the fact that mounts done with subvolid= or the default subvolume don't
> have a dentry that's connected to the dentry tree like in the subvol=
> case. To unify the code paths, when given subvolid= or using the default
> subvolume ID, translate it into a subvolume name by walking
> ROOT_BACKREFs in the root tree and INODE_REFs in the filesystem trees.
>
> Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: David Sterba <[email protected]>
On Thu, Apr 09, 2015 at 02:34:56PM -0700, Omar Sandoval wrote:
> Now that we're guaranteed to have a meaningful root dentry, we can just
> export seq_dentry() and use it in btrfs_show_options(). The subvolume ID
> is easy to get and can also be useful, so put that in there, too.
>
> Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: David Sterba <[email protected]>
On Mon, May 11, 2015 at 05:37:24PM +0200, David Sterba wrote:
> On Thu, Apr 09, 2015 at 02:34:55PM -0700, Omar Sandoval wrote:
> > @@ -1321,7 +1438,7 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
> > return ERR_PTR(error);
> > }
> >
> > - if (subvol_name) {
> > + if (subvol_name || subvol_objectid != BTRFS_FS_TREE_OBJECTID) {
> > /* mount_subvol() will free subvol_name. */
> > return mount_subvol(subvol_name, subvol_objectid, flags,
> > device_name, data);
>
> The story is that I've used this patchset as a base for my per-subvolume
> mount flags because it made the mount path cleaner, so it's possible that
> the issue I saw was related to my changes.
>
> The modified 'if' above did not catch subvol_objectid == 0. I'm not sure
> now how it got there, btrfs_parse_early_options recognizes 0 and
> switches it to 5.
>
> My testing script is quite simple and does only
>
> mount -o compress-force=lzo,subvol=/subv1 $dev mnt
> mount -o compress-force=zlib,subvol=/subv2 $dev mnt2
>
> after mkfs. The first pass would resolve the path to the subvol root and
> replace the options with subvolid=0, calls mount_subvol, vfs_kern_mount
> than in turn goes back to btrfs_mount. Unless I'm missing something,
> this should also work, because parse_early_options is called again and
> does subvolid=0 -> subvol_objectid=5 . Oh well, more debugging needed,
> not a blocker for this patchset.
Hm, yeah, I don't see how that would happen. Do you have that patchset
up anywhere?
--
Omar
On Mon, May 11, 2015 at 04:51:37PM +0200, David Sterba wrote:
> On Mon, May 11, 2015 at 02:42:58AM -0700, Omar Sandoval wrote:
> > Sorry for the long-winded email! Thoughts, David, Qu?
>
> I'm ok with the approach and the patchset as a whole. There was one
> minor issue with the aliasing of the toplevel subvolume (id 0 or 5).
> I'll post a comment to the patch.
Thanks a lot for the review, I appreciate it.
I had to resolve a couple of minor conflicts rebasing this on v4.1-rc3
(due to the dentry->d_inode to d_inode(dentry) conversion), so I can
resend this soon if there are aren't any other issues.
--
Omar
-------- Original Message --------
Subject: Re: [PATCH v2 0/6] Btrfs: show subvolume name and ID in
/proc/mounts
From: Omar Sandoval <[email protected]>
To: David Sterba <[email protected]>, Qu Wenruo <[email protected]>,
<[email protected]>
Date: 2015年05月11日 17:42
> On Thu, Apr 09, 2015 at 02:34:50PM -0700, Omar Sandoval wrote:
>> Here's version 2 of providing the subvolume name and ID in /proc/mounts.
>>
>> It turns out that getting the name of a subvolume reliably is a bit
>> trickier than it would seem because of how mounting subvolumes by ID is
>> implemented. In particular, in that case, the dentry we get for the root
>> of the mount is not necessarily attached to the dentry tree, which means
>> that the obvious solution of just dumping the dentry does not work. The
>> solution I put together makes the tradeoff of churning a bit more code
>> in order to avoid implementing this with weird hacks.
>>
>> Changes from v1 (https://lkml.org/lkml/2015/4/8/16):
>>
>> - Put subvol= last in show_options
>> - Change commit log to remove comment about userspace having no way to
>> know which subvolume is mounted, as David pointed out you can use
>> btrfs inspect-internal rootid <mountpoint>
>> - Split up patch 2
>> - Minor coding style fixes
>>
>> This still applies to v4.0-rc7. Tested manually and with the script
>> below (updated from v1).
>>
>> Thanks!
>>
>> Omar Sandoval (6):
>> Btrfs: lock superblock before remounting for rw subvol
>> Btrfs: remove all subvol options before mounting top-level
>> Btrfs: clean up error handling in mount_subvol()
>> Btrfs: fail on mismatched subvol and subvolid mount options
>> Btrfs: unify subvol= and subvolid= mounting
>> Btrfs: show subvol= and subvolid= in /proc/mounts
>>
>> fs/btrfs/super.c | 376 ++++++++++++++++++++++++++++++++++++-------------------
>> fs/seq_file.c | 1 +
>> 2 files changed, 251 insertions(+), 126 deletions(-)
>>
>
> Hi, everyone,
>
> Just wanted to revive this so we can hopefully come up with a solution
> we agree on in time for 4.2.
>
> Just to recap, my approach (and also Qu Wenruo's original approach) is
> to convert subvolid= mounts to subvol= mounts at mount time, which makes
> showing the subvolume in /proc/mounts easy. The benefit of this approach
> is that looking at mount information, which is supposed to be a
> lightweight operation, is simple and always works. Additionally, we'll
> have the info in a convenient format in /proc/mounts in addition to
> /proc/$PID/mountinfo. The only caveat is that a mount by subvolid can
> fail if the mount races with a rename of the subvolume.
>
> Qu Wenruo's second approach was to instead convert the subvolid to a
> subvolume path when reading /proc/$PID/mountinfo. The benefit of this
> approach is that mounts by subvolid will always succeed in the face of
> concurrent renames. However, instead, getting the subvolume path in
> mountinfo can now fail, and it makes what should probably be a
> lightweight operation somewhat complex.
>
> In terms of the code, I think the original approach is cleaner: the
> heavy lifting is done when mounting instead of when reading a proc file.
> Additionally, I don't think that the concurrent rename race will be much
> of a problem in practice. I can't imagine that too many people are
> actively renaming subvolumes at the same time as they are mounting them,
> and even if they are, I don't think it's so surprising that it would
> fail. On the other hand, reading mount info while renaming subvolumes
> might be marginally more common, and personally, if that failed, I'd be
> unpleasantly surprised.
>
> Orthogonal to that decision is the precedence of subvolid= and subvol=.
> Although it's true that mount options usually have last-one-wins
> behavior, I think David's argument regarding the principle of least
> surprise is solid. Namely, someone's going to be unhappy with a
> seemingly arbitrary decision when they don't match.
>
> Sorry for the long-winded email! Thoughts, David, Qu?
>
> Thanks,
>
I'm OK with your patchset, just as you mentioned, concurrently mount
with rename is not such a common thing.
And I'm also happy with the cleaner unified mount codes.
Thanks,
Qu