2014-11-06 01:44:19

by Richard Yao

Subject: [PATCH v4 0/1] vfs: Respect MS_RDONLY at bind mount creation

Dear Linus,

I am the ZFSOnLinux developer who ran into you on my way to the airport at
LinuxCon Europe last month. This is the VFS patch that you volunteered to help
me get properly reviewed. My plan was to send it the week after LinuxCon Europe
(last week) as you had asked. However, that was delayed until Monday last week.

Coincidentally, I was in Boston last week and I had the opportunity to chat
with Ted about filesystem stuff in person. He suggested to me that this patch
likely got lost in your inbox. He has also volunteered to review the patch and
asked that I resend.

This patch fixes a bug that has annoyed both myself and others for some time
now. I am presently super busy preparing for next week's Open ZFS developer
summit, but I would love to see this patch merged. I will make time to respond
to any emails regarding it.

Yours truly,
Richard Yao

Richard Yao (1):
vfs: Respect MS_RDONLY at bind mount creation

fs/namespace.c | 20 ++++++++++++++++----
fs/pnode.h | 17 +++++++++--------
2 files changed, 25 insertions(+), 12 deletions(-)

--
2.0.4


2014-11-06 01:44:30

by Richard Yao

Subject: [PATCH v4 1/1] vfs: Respect MS_RDONLY at bind mount creation

`mount -o bind,ro ...` suffers from a silent failure where the readonly
flag is ignored. The bind mount will be created rw whenever the target
is rw. Users typically work around this by remounting readonly, but that
does not work when you want to define readonly bind mounts in fstab.
This is a major annoyance when dealing with recursive bind mounts
because the userland mount command does not expose the option to
recursively remount a subtree as readonly.

Signed-off-by: Richard Yao <[email protected]>
---
fs/namespace.c | 20 ++++++++++++++++----
fs/pnode.h | 17 +++++++++--------
2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5b66b2b..6f07336 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -990,6 +990,14 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	if (flag & CL_MAKE_SHARED)
 		set_mnt_shared(mnt);

+	/*
+	 * We set the flag directly because the mount point is not yet visible.
+	 * This means there are no writers that require the checks in
+	 * mnt_make_readonly().
+	 */
+	if (flag & CL_MAKE_RDONLY)
+		mnt->mnt.mnt_flags |= MNT_READONLY;
+
 	/* stick the duplicate mount on the same expiry list
 	 * as the original if that was on one */
 	if (flag & CL_EXPIRE) {
@@ -1992,11 +2000,13 @@ static bool has_locked_children(struct mount *mnt, struct dentry *dentry)
  * do loopback mount.
  */
 static int do_loopback(struct path *path, const char *old_name,
-				int recurse)
+				unsigned long flags)
 {
 	struct path old_path;
 	struct mount *mnt = NULL, *old, *parent;
 	struct mountpoint *mp;
+	int recurse = flags & MS_REC;
+	int clflags = (flags & MS_RDONLY) ? CL_MAKE_RDONLY : 0;
 	int err;
 	if (!old_name || !*old_name)
 		return -EINVAL;
@@ -2027,9 +2037,10 @@ static int do_loopback(struct path *path, const char *old_name,
 		goto out2;

 	if (recurse)
-		mnt = copy_tree(old, old_path.dentry, CL_COPY_MNT_NS_FILE);
+		mnt = copy_tree(old, old_path.dentry, CL_COPY_MNT_NS_FILE |
+				clflags);
 	else
-		mnt = clone_mnt(old, old_path.dentry, 0);
+		mnt = clone_mnt(old, old_path.dentry, clflags);

 	if (IS_ERR(mnt)) {
 		err = PTR_ERR(mnt);
@@ -2625,7 +2636,8 @@ long do_mount(const char *dev_name, const char __user *dir_name,
 		retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
 				    data_page);
 	else if (flags & MS_BIND)
-		retval = do_loopback(&path, dev_name, flags & MS_REC);
+		retval = do_loopback(&path, dev_name, flags & (MS_REC |
+				     MS_RDONLY));
 	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
 		retval = do_change_type(&path, flags);
 	else if (flags & MS_MOVE)
diff --git a/fs/pnode.h b/fs/pnode.h
index 4a24635..326f5be 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -20,14 +20,15 @@
 #define SET_MNT_MARK(m) ((m)->mnt.mnt_flags |= MNT_MARKED)
 #define CLEAR_MNT_MARK(m) ((m)->mnt.mnt_flags &= ~MNT_MARKED)

-#define CL_EXPIRE		0x01
-#define CL_SLAVE		0x02
-#define CL_COPY_UNBINDABLE	0x04
-#define CL_MAKE_SHARED		0x08
-#define CL_PRIVATE		0x10
-#define CL_SHARED_TO_SLAVE	0x20
-#define CL_UNPRIVILEGED		0x40
-#define CL_COPY_MNT_NS_FILE	0x80
+#define CL_EXPIRE		0x001
+#define CL_SLAVE		0x002
+#define CL_COPY_UNBINDABLE	0x004
+#define CL_MAKE_SHARED		0x008
+#define CL_PRIVATE		0x010
+#define CL_SHARED_TO_SLAVE	0x020
+#define CL_UNPRIVILEGED		0x040
+#define CL_COPY_MNT_NS_FILE	0x080
+#define CL_MAKE_RDONLY		0x100

 #define CL_COPY_ALL (CL_COPY_UNBINDABLE | CL_COPY_MNT_NS_FILE)

--
2.0.4
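
The flag flow this patch introduces can be sketched as a small userspace
model. This is not kernel code: the struct, the model_* functions and the
MODEL_* names are hypothetical stand-ins, the MS_* and MNT_READONLY values are
assumed to match the kernel headers of that era, and only the two CL_* values
are taken from the patched fs/pnode.h. The model tracks a single mount, so it
does not reproduce how copy_tree() walks a subtree.

```c
/* Model constants. The MS_* and MNT_* values are assumptions about the
 * kernel headers; the CL_* values are copied from the patched fs/pnode.h. */
enum {
	MODEL_MS_RDONLY           = 1,     /* stands in for MS_RDONLY */
	MODEL_MS_BIND             = 4096,  /* stands in for MS_BIND */
	MODEL_MS_REC              = 16384, /* stands in for MS_REC */
	MODEL_MNT_READONLY        = 0x40,  /* stands in for MNT_READONLY */
	MODEL_CL_COPY_MNT_NS_FILE = 0x080,
	MODEL_CL_MAKE_RDONLY      = 0x100
};

/* Hypothetical stand-in for struct mount: only the flags word is modelled. */
struct model_mount {
	int mnt_flags;
};

/* Mirrors the hunk added to clone_mnt(): the clone inherits the old flags
 * and is additionally forced read-only when CL_MAKE_RDONLY is requested. */
static struct model_mount model_clone_mnt(const struct model_mount *old,
					  int flag)
{
	struct model_mount mnt = *old;

	if (flag & MODEL_CL_MAKE_RDONLY)
		mnt.mnt_flags |= MODEL_MNT_READONLY;
	return mnt;
}

/* Mirrors the translation added to do_loopback(): MS_RDONLY becomes
 * CL_MAKE_RDONLY on both the recursive and the non-recursive path. */
static struct model_mount model_do_loopback(const struct model_mount *old,
					    unsigned long flags)
{
	int recurse = (flags & MODEL_MS_REC) != 0;
	int clflags = (flags & MODEL_MS_RDONLY) ? MODEL_CL_MAKE_RDONLY : 0;

	if (recurse)
		clflags |= MODEL_CL_COPY_MNT_NS_FILE;
	return model_clone_mnt(old, clflags);
}

/* Mirrors the changed call in do_mount(): only MS_REC and MS_RDONLY are
 * forwarded to do_loopback(). */
struct model_mount model_bind(const struct model_mount *old,
			      unsigned long flags)
{
	return model_do_loopback(old,
				 flags & (MODEL_MS_REC | MODEL_MS_RDONLY));
}
```

In this model, binding a writable mount with MODEL_MS_BIND | MODEL_MS_RDONLY
yields a clone with MODEL_MNT_READONLY set, while a plain MODEL_MS_BIND leaves
the clone writable. A source that is already read-only stays read-only either
way, because the clone copies the old flags.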

2015-02-26 01:47:45

by Mateusz Guzik

Subject: Re: [PATCH v4 1/1] vfs: Respect MS_RDONLY at bind mount creation

On Wed, Nov 05, 2014 at 08:44:03PM -0500, Richard Yao wrote:
> `mount -o bind,ro ...` suffers from a silent failure where the readonly
> flag is ignored. The bind mount will be created rw whenever the target
> is rw. Users typically work around this by remounting readonly, but that
> does not work when you want to define readonly bind mounts in fstab.
> This is a major annoyance when dealing with recursive bind mounts
> because the userland mount command does not expose the option to
> recursively remount a subtree as readonly.

This is still a problem in 4.0 kernels.

Can we get this fixed please?

Is there anything wrong with the patch posted here? AFAICT the patch is
still fine (apart from some whitespace issues).

Thanks,
>
> Signed-off-by: Richard Yao <[email protected]>
> ---
> fs/namespace.c | 20 ++++++++++++++++----
> fs/pnode.h | 17 +++++++++--------
> 2 files changed, 25 insertions(+), 12 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 5b66b2b..6f07336 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -990,6 +990,14 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
>  	if (flag & CL_MAKE_SHARED)
>  		set_mnt_shared(mnt);
>
> +	/*
> +	 * We set the flag directly because the mount point is not yet visible.
> +	 * This means there are no writers that require the checks in
> +	 * mnt_make_readonly().
> +	 */
> +	if (flag & CL_MAKE_RDONLY)
> +		mnt->mnt.mnt_flags |= MNT_READONLY;
> +
>  	/* stick the duplicate mount on the same expiry list
>  	 * as the original if that was on one */
>  	if (flag & CL_EXPIRE) {
> @@ -1992,11 +2000,13 @@ static bool has_locked_children(struct mount *mnt, struct dentry *dentry)
>   * do loopback mount.
>   */
>  static int do_loopback(struct path *path, const char *old_name,
> -				int recurse)
> +				unsigned long flags)
>  {
>  	struct path old_path;
>  	struct mount *mnt = NULL, *old, *parent;
>  	struct mountpoint *mp;
> +	int recurse = flags & MS_REC;
> +	int clflags = (flags & MS_RDONLY) ? CL_MAKE_RDONLY : 0;
>  	int err;
>  	if (!old_name || !*old_name)
>  		return -EINVAL;
> @@ -2027,9 +2037,10 @@ static int do_loopback(struct path *path, const char *old_name,
>  		goto out2;
>
>  	if (recurse)
> -		mnt = copy_tree(old, old_path.dentry, CL_COPY_MNT_NS_FILE);
> +		mnt = copy_tree(old, old_path.dentry, CL_COPY_MNT_NS_FILE |
> +				clflags);
>  	else
> -		mnt = clone_mnt(old, old_path.dentry, 0);
> +		mnt = clone_mnt(old, old_path.dentry, clflags);
>
>  	if (IS_ERR(mnt)) {
>  		err = PTR_ERR(mnt);
> @@ -2625,7 +2636,8 @@ long do_mount(const char *dev_name, const char __user *dir_name,
>  		retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
>  				    data_page);
>  	else if (flags & MS_BIND)
> -		retval = do_loopback(&path, dev_name, flags & MS_REC);
> +		retval = do_loopback(&path, dev_name, flags & (MS_REC |
> +				     MS_RDONLY));
>  	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
>  		retval = do_change_type(&path, flags);
>  	else if (flags & MS_MOVE)
> diff --git a/fs/pnode.h b/fs/pnode.h
> index 4a24635..326f5be 100644
> --- a/fs/pnode.h
> +++ b/fs/pnode.h
> @@ -20,14 +20,15 @@
>  #define SET_MNT_MARK(m) ((m)->mnt.mnt_flags |= MNT_MARKED)
>  #define CLEAR_MNT_MARK(m) ((m)->mnt.mnt_flags &= ~MNT_MARKED)
>
> -#define CL_EXPIRE		0x01
> -#define CL_SLAVE		0x02
> -#define CL_COPY_UNBINDABLE	0x04
> -#define CL_MAKE_SHARED		0x08
> -#define CL_PRIVATE		0x10
> -#define CL_SHARED_TO_SLAVE	0x20
> -#define CL_UNPRIVILEGED		0x40
> -#define CL_COPY_MNT_NS_FILE	0x80
> +#define CL_EXPIRE		0x001
> +#define CL_SLAVE		0x002
> +#define CL_COPY_UNBINDABLE	0x004
> +#define CL_MAKE_SHARED		0x008
> +#define CL_PRIVATE		0x010
> +#define CL_SHARED_TO_SLAVE	0x020
> +#define CL_UNPRIVILEGED		0x040
> +#define CL_COPY_MNT_NS_FILE	0x080
> +#define CL_MAKE_RDONLY		0x100
>
>  #define CL_COPY_ALL (CL_COPY_UNBINDABLE | CL_COPY_MNT_NS_FILE)
>
> --
> 2.0.4
>

--
Mateusz Guzik
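
The workaround described in the quoted commit message (bind first, then
remount the bind read-only) and a userspace check of the result can be
sketched as follows. This is a minimal sketch assuming Linux with glibc; the
function names are hypothetical and not part of any library. The mount calls
need CAP_SYS_ADMIN, and the remount step changes only the mount at dst, not
any submounts beneath it, which is the limitation the commit message raises
for recursive bind mounts.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper: report whether the mount containing path is
 * read-only. Returns 1 if read-only, 0 if writable, -1 on error with
 * errno set by statvfs().
 */
int mount_is_readonly(const char *path)
{
	struct statvfs sv;

	if (statvfs(path, &sv) != 0)
		return -1;
	return (sv.f_flag & ST_RDONLY) ? 1 : 0;
}

/*
 * Hypothetical helper: two-step workaround for a read-only bind mount.
 * Step 1 creates the bind mount; step 2 remounts that bind read-only.
 * Returns 0 on success, -1 on error with errno set.
 */
int bind_mount_readonly(const char *src, const char *dst)
{
	if (mount(src, dst, NULL, MS_BIND, NULL) != 0)
		return -1;

	if (mount(NULL, dst, NULL, MS_REMOUNT | MS_BIND | MS_RDONLY,
		  NULL) != 0) {
		int saved = errno;

		/* Do not leave a writable bind mount behind on failure. */
		umount2(dst, MNT_DETACH);
		errno = saved;
		return -1;
	}
	return 0;
}
```

On a kernel without the patch, calling mount() once with MS_BIND | MS_RDONLY
and then mount_is_readonly(dst) would be one way to observe the behaviour
reported above; with the two-step helper the check is expected to return 1.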