2018-05-02 15:48:44

by Christian Brauner

[permalink] [raw]
Subject: [PATCH 0/6 resend] statfs: handle mount propagation

Hey,

This is the second resend of this patchset. I'm not sure whether it has
simply been overlooked but the number of people get_maintainer.pl was
rather small and seemed a little random so I added Linus and Christoph,
two people I know that do look at VFS stuff at least from time to time,
although they weren't listed by get_maintainer.pl. I hope that's ok.

This little series
- unifies the definition of constants in statfs.h and fs.h
*Note, that Andreas has expressed doubts whether this unification is
useful. Please see https://lkml.org/lkml/2018/4/13/571 . I still think
it is but I'm happy to drop these two patches if others agree.*
- extends statfs to handle mount propagation. This will let userspace
easily query a given mountpoint for MS_UNBINDABLE, MS_SHARED,
MS_PRIVATE and MS_SLAVE without always having to do costly parsing of
/proc/<pid>/mountinfo.
To this end the flags:
- ST_UNBINDABLE
- ST_SHARED
- ST_PRIVATE
- ST_SLAVE
are added. They have the same value as their MS_* counterparts.

Thanks!
Christian

Christian Brauner (6):
fs: use << for MS_* flags
statfs: use << to align with fs header
statfs: add ST_UNBINDABLE
statfs: add ST_SHARED
statfs: add ST_SLAVE
statfs: add ST_PRIVATE

fs/statfs.c | 16 +++++++++++++++-
include/linux/statfs.h | 30 +++++++++++++++++-------------
include/uapi/linux/fs.h | 33 +++++++++++++++++----------------
3 files changed, 49 insertions(+), 30 deletions(-)

--
2.17.0



2018-05-02 15:47:00

by Christian Brauner

[permalink] [raw]
Subject: [PATCH 3/6 resend] statfs: add ST_UNBINDABLE

Signed-off-by: Christian Brauner <[email protected]>
---
fs/statfs.c | 2 ++
include/linux/statfs.h | 1 +
2 files changed, 3 insertions(+)

diff --git a/fs/statfs.c b/fs/statfs.c
index 5b2a24f0f263..61b3063d3921 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -29,6 +29,8 @@ static int flags_by_mnt(int mnt_flags)
flags |= ST_NODIRATIME;
if (mnt_flags & MNT_RELATIME)
flags |= ST_RELATIME;
+ if (mnt_flags & MNT_UNBINDABLE)
+ flags |= ST_UNBINDABLE;
return flags;
}

diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index b336c04e793c..e1b84d0388c1 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -40,5 +40,6 @@ struct kstatfs {
#define ST_NOATIME (1<<10) /* do not update access times */
#define ST_NODIRATIME (1<<11) /* do not update directory access times */
#define ST_RELATIME (1<<12) /* update atime relative to mtime/ctime */
+#define ST_UNBINDABLE (1<<17) /* change to unbindable */

#endif
--
2.17.0


2018-05-02 15:47:54

by Christian Brauner

[permalink] [raw]
Subject: [PATCH 2/6 resend] statfs: use << to align with fs header

Signed-off-by: Christian Brauner <[email protected]>
---
include/linux/statfs.h | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index 3142e98546ac..b336c04e793c 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -27,18 +27,18 @@ struct kstatfs {
* ABI. The exception is ST_VALID which has the same value as MS_REMOUNT
* which doesn't make any sense for statfs.
*/
-#define ST_RDONLY 0x0001 /* mount read-only */
-#define ST_NOSUID 0x0002 /* ignore suid and sgid bits */
-#define ST_NODEV 0x0004 /* disallow access to device special files */
-#define ST_NOEXEC 0x0008 /* disallow program execution */
-#define ST_SYNCHRONOUS 0x0010 /* writes are synced at once */
-#define ST_VALID 0x0020 /* f_flags support is implemented */
-#define ST_MANDLOCK 0x0040 /* allow mandatory locks on an FS */
-/* 0x0080 used for ST_WRITE in glibc */
-/* 0x0100 used for ST_APPEND in glibc */
-/* 0x0200 used for ST_IMMUTABLE in glibc */
-#define ST_NOATIME 0x0400 /* do not update access times */
-#define ST_NODIRATIME 0x0800 /* do not update directory access times */
-#define ST_RELATIME 0x1000 /* update atime relative to mtime/ctime */
+#define ST_RDONLY (1<<0) /* mount read-only */
+#define ST_NOSUID (1<<1) /* ignore suid and sgid bits */
+#define ST_NODEV (1<<2) /* disallow access to device special files */
+#define ST_NOEXEC (1<<3) /* disallow program execution */
+#define ST_SYNCHRONOUS (1<<4) /* writes are synced at once */
+#define ST_VALID (1<<5) /* f_flags support is implemented */
+#define ST_MANDLOCK (1<<6) /* allow mandatory locks on an FS */
+/* (1<<7) used for ST_WRITE in glibc */
+/* (1<<8) used for ST_APPEND in glibc */
+/* (1<<9) used for ST_IMMUTABLE in glibc */
+#define ST_NOATIME (1<<10) /* do not update access times */
+#define ST_NODIRATIME (1<<11) /* do not update directory access times */
+#define ST_RELATIME (1<<12) /* update atime relative to mtime/ctime */

#endif
--
2.17.0


2018-05-02 15:47:57

by Christian Brauner

[permalink] [raw]
Subject: [PATCH 6/6 resend] statfs: add ST_PRIVATE

Signed-off-by: Christian Brauner <[email protected]>
---
fs/statfs.c | 2 ++
include/linux/statfs.h | 1 +
2 files changed, 3 insertions(+)

diff --git a/fs/statfs.c b/fs/statfs.c
index 35ad0402c9a3..899e899ee84c 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -58,6 +58,8 @@ static int calculate_f_flags(struct vfsmount *mnt)

if (IS_MNT_SLAVE(real_mount(mnt)))
flags |= ST_SLAVE;
+ else if (!(flags & ST_SHARED))
+ flags |= ST_PRIVATE;

return flags;
}
diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index 048127effaad..663fa5498a7d 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -41,6 +41,7 @@ struct kstatfs {
#define ST_NODIRATIME (1<<11) /* do not update directory access times */
#define ST_RELATIME (1<<12) /* update atime relative to mtime/ctime */
#define ST_UNBINDABLE (1<<17) /* change to unbindable */
+#define ST_PRIVATE (1<<18) /* change to private */
#define ST_SLAVE (1<<19) /* change to slave */
#define ST_SHARED (1<<20) /* change to shared */

--
2.17.0


2018-05-02 15:48:15

by Christian Brauner

[permalink] [raw]
Subject: [PATCH 4/6 resend] statfs: add ST_SHARED

Signed-off-by: Christian Brauner <[email protected]>
---
fs/statfs.c | 2 ++
include/linux/statfs.h | 1 +
2 files changed, 3 insertions(+)

diff --git a/fs/statfs.c b/fs/statfs.c
index 61b3063d3921..2fc6f9c3793c 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -31,6 +31,8 @@ static int flags_by_mnt(int mnt_flags)
flags |= ST_RELATIME;
if (mnt_flags & MNT_UNBINDABLE)
flags |= ST_UNBINDABLE;
+ if (mnt_flags & MNT_SHARED)
+ flags |= ST_SHARED;
return flags;
}

diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index e1b84d0388c1..5416b2936dd9 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -41,5 +41,6 @@ struct kstatfs {
#define ST_NODIRATIME (1<<11) /* do not update directory access times */
#define ST_RELATIME (1<<12) /* update atime relative to mtime/ctime */
#define ST_UNBINDABLE (1<<17) /* change to unbindable */
+#define ST_SHARED (1<<20) /* change to shared */

#endif
--
2.17.0


2018-05-02 15:48:25

by Christian Brauner

[permalink] [raw]
Subject: [PATCH 5/6 resend] statfs: add ST_SLAVE

Signed-off-by: Christian Brauner <[email protected]>
---
fs/statfs.c | 10 +++++++++-
include/linux/statfs.h | 1 +
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/statfs.c b/fs/statfs.c
index 2fc6f9c3793c..35ad0402c9a3 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -10,6 +10,7 @@
#include <linux/uaccess.h>
#include <linux/compat.h>
#include "internal.h"
+#include "pnode.h"

static int flags_by_mnt(int mnt_flags)
{
@@ -50,8 +51,15 @@ static int flags_by_sb(int s_flags)

static int calculate_f_flags(struct vfsmount *mnt)
{
- return ST_VALID | flags_by_mnt(mnt->mnt_flags) |
+ int flags = 0;
+
+ flags = ST_VALID | flags_by_mnt(mnt->mnt_flags) |
flags_by_sb(mnt->mnt_sb->s_flags);
+
+ if (IS_MNT_SLAVE(real_mount(mnt)))
+ flags |= ST_SLAVE;
+
+ return flags;
}

static int statfs_by_dentry(struct dentry *dentry, struct kstatfs *buf)
diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index 5416b2936dd9..048127effaad 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -41,6 +41,7 @@ struct kstatfs {
#define ST_NODIRATIME (1<<11) /* do not update directory access times */
#define ST_RELATIME (1<<12) /* update atime relative to mtime/ctime */
#define ST_UNBINDABLE (1<<17) /* change to unbindable */
+#define ST_SLAVE (1<<19) /* change to slave */
#define ST_SHARED (1<<20) /* change to shared */

#endif
--
2.17.0


2018-05-02 15:48:43

by Christian Brauner

[permalink] [raw]
Subject: [PATCH 1/6 resend] fs: use << for MS_* flags

Signed-off-by: Christian Brauner <[email protected]>
---
include/uapi/linux/fs.h | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index d2a8313fabd7..9662790a657c 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -105,22 +105,23 @@ struct inodes_stat_t {
/*
* These are the fs-independent mount-flags: up to 32 flags are supported
*/
-#define MS_RDONLY 1 /* Mount read-only */
-#define MS_NOSUID 2 /* Ignore suid and sgid bits */
-#define MS_NODEV 4 /* Disallow access to device special files */
-#define MS_NOEXEC 8 /* Disallow program execution */
-#define MS_SYNCHRONOUS 16 /* Writes are synced at once */
-#define MS_REMOUNT 32 /* Alter flags of a mounted FS */
-#define MS_MANDLOCK 64 /* Allow mandatory locks on an FS */
-#define MS_DIRSYNC 128 /* Directory modifications are synchronous */
-#define MS_NOATIME 1024 /* Do not update access times. */
-#define MS_NODIRATIME 2048 /* Do not update directory access times */
-#define MS_BIND 4096
-#define MS_MOVE 8192
-#define MS_REC 16384
-#define MS_VERBOSE 32768 /* War is peace. Verbosity is silence.
- MS_VERBOSE is deprecated. */
-#define MS_SILENT 32768
+#define MS_RDONLY (1<<0) /* Mount read-only */
+#define MS_NOSUID (1<<1) /* Ignore suid and sgid bits */
+#define MS_NODEV (1<<2) /* Disallow access to device special files */
+#define MS_NOEXEC (1<<3) /* Disallow program execution */
+#define MS_SYNCHRONOUS (1<<4) /* Writes are synced at once */
+#define MS_REMOUNT (1<<5) /* Alter flags of a mounted FS */
+#define MS_MANDLOCK (1<<6) /* Allow mandatory locks on an FS */
+#define MS_DIRSYNC (1<<7) /* Directory modifications are synchronous */
+#define MS_NOATIME (1<<10) /* Do not update access times. */
+#define MS_NODIRATIME (1<<11) /* Do not update directory access times */
+#define MS_BIND (1<<12)
+#define MS_MOVE (1<<13)
+#define MS_REC (1<<14)
+#define MS_VERBOSE (1<<15) /* War is peace. Verbosity is silence.
+ * MS_VERBOSE is deprecated.
+ */
+#define MS_SILENT (1<<15)
#define MS_POSIXACL (1<<16) /* VFS does not apply the umask */
#define MS_UNBINDABLE (1<<17) /* change to unbindable */
#define MS_PRIVATE (1<<18) /* change to private */
--
2.17.0


2018-05-02 16:10:10

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 0/6 resend] statfs: handle mount propagation

On Wed, May 02, 2018 at 05:42:33PM +0200, Christian Brauner wrote:
> Hey,
>
> This is the second resend of this patchset. I'm not sure whether it has
> simply been overlooked but the number of people get_maintainer.pl was
> rather small and seemed a little random so I added Linus and Christoph,
> two people I know that do look at VFS stuff at least from time to time,
> although they weren't listed by get_maintainer.pl. I hope that's ok.
>
> This little series
> - unifies the definition of constants in statfs.h and fs.h
> *Note, that Andreas has expressed doubts whether this unification is
> useful. Please see https://lkml.org/lkml/2018/4/13/571 . I still think
> it is but I'm happy to drop these two patches if others agree.*
> - extends statfs to handle mount propagation. This will let userspace
> easily query a given mountpoint for MS_UNBINDABLE, MS_SHARED,
> MS_PRIVATE and MS_SLAVE without always having to do costly parsing of
> /proc/<pid>/mountinfo.
> To this end the flags:
> - ST_UNBINDABLE
> - ST_SHARED
> - ST_PRIVATE
> - ST_SLAVE
> are added. They have the same value as their MS_* counterparts.

How about some rationale for that in the first place? statfs() looks like
a bad match for that - not to mention anything else, there's no way to
get anything beyond "it is a peer of something", not even "do these two
get propagation between them". What would be using that, what would the
userland side of users look like, etc...

And in any case linux-api should've been Cc'd. I'm not saying that this
(or something similar) would be an inherently bad idea, but the question
"why this way?" deserves a bit more than "parsing is costly"...

2018-05-03 02:36:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 5/6 resend] statfs: add ST_SLAVE

No explanation, no nothing?

NAK.

Linus

2018-05-03 13:06:21

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH 0/6 resend] statfs: handle mount propagation

On Wed, May 02, 2018 at 05:09:39PM +0100, Al Viro wrote:
> On Wed, May 02, 2018 at 05:42:33PM +0200, Christian Brauner wrote:
> > Hey,
> >
> > This is the second resend of this patchset. I'm not sure whether it has
> > simply been overlooked but the number of people get_maintainer.pl was
> > rather small and seemed a little random so I added Linus and Christoph,
> > two people I know that do look at VFS stuff at least from time to time,
> > although they weren't listed by get_maintainer.pl. I hope that's ok.
> >
> > This little series
> > - unifies the definition of constants in statfs.h and fs.h
> > *Note, that Andreas has expressed doubts whether this unification is
> > useful. Please see https://lkml.org/lkml/2018/4/13/571 . I still think
> > it is but I'm happy to drop these two patches if others agree.*
> > - extends statfs to handle mount propagation. This will let userspace
> > easily query a given mountpoint for MS_UNBINDABLE, MS_SHARED,
> > MS_PRIVATE and MS_SLAVE without always having to do costly parsing of
> > /proc/<pid>/mountinfo.
> > To this end the flags:
> > - ST_UNBINDABLE
> > - ST_SHARED
> > - ST_PRIVATE
> > - ST_SLAVE
> > are added. They have the same value as their MS_* counterparts.
>
> How about some rationale for that in the first place? statfs() looks like
> a bad match for that - not to mention anything else, there's no way to
> get anything beyond "it is a peer of something", not even "do these two
> get propagation between them". What would be using that, what would the
> userland side of users look like, etc...

Ok, sorry if I wasn't detailed enough.

From a userspace perspective we often run into the case where we simply
want to know whether a given mountpoint is MS_SHARED or is MS_SLAVE.
If it is we remount it as MS_PRIVATE to prevent any propagation from
happening. We don't care about the peer relationship or how the
propagation is exactly setup. We only want to prevent any propagation
from happening.

The above case is what I see most often. A more specific use-case is to
differentiate between MS_SLAVE and MS_SHARED mountpoints.
Mountpoints that are MS_SLAVE are kept intact and mountpoints that are
MS_SHARED are made MS_PRIVATE.

For both cases the only way to do this right now is by parsing
/proc/<pid>/mountinfo. Yes, it is doable but still it is somewhat costly
and annoying as e.g. those mount propagation fields are optional.

Another problem are scenarios where propagation matters but /proc is not
mounted.

Here are three concrete use-cases I run into frequently:
- (*buzzword alarm*) container runtimes:
They do usually clone(CLONE_NEWNS) and inherit the mount table but
want to remount specific mountpoints to prevent propagation.
Again the specifics of propagation don't matter in this case usually.

- Sharing shared mountpoints:
Sometimes it is desirable to share shared mountpoints between
namespaces.
Say, the first process sets up the shared mountpoint on the host and
all the other ones want to know "Did someone already make this rshared
or do i need to set this up?". It would be very helpful if you could
just check a mount by using the statvfs() syscall.

- I want to know whether a mountpoint is unbindable to e.g. skip it if
it is or remount it.

About how this would work what I simply expect and tested with this
patch was e.g. this:

int ret;
char *s = "/some/path";
struct statvfs sb;

ret = statvfs(s, &sb);
if (ret < 0)
return false;

if (sb.f_flag & ST_SHARED) {
ret = mount("", s, NULL, MS_SLAVE | MS_REC, NULL);
if (ret < 0)
return -1;
}

>
> And in any case linux-api should've been Cc'd. I'm not saying that this

Ah, thanks! Sorry I missed this.

> (or something similar) would be an inherently bad idea, but the question
> "why this way?" deserves a bit more than "parsing is costly"...

I hope some of the above helped to clarify this.

Thanks!
Christian

2018-05-03 13:19:58

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH 5/6 resend] statfs: add ST_SLAVE

On Thu, May 03, 2018 at 02:36:09AM +0000, Linus Torvalds wrote:
> No explanation, no nothing?

I tried to explain in more detail why this could be helpful in a
response Al's comments. But I take it you're talking about the
commit messages.

When I'm redoing the series - in case this something that we decide is
worth having - I'm going to make sure that there are more detailed
commit messages. Sorry for the brevity.

Thanks!
Christian

>
> NAK.
>
> Linus

2018-05-03 13:48:01

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 0/6 resend] statfs: handle mount propagation

On Thu, May 03, 2018 at 03:04:36PM +0200, Christian Brauner wrote:

> >From a userspace perspective we often run into the case where we simply
> want to know whether a given mountpoint is MS_SHARED or is MS_SLAVE.
> If it is we remount it as MS_PRIVATE to prevent any propagation from
> happening. We don't care about the peer relationship or how the
> propagation is exactly setup. We only want to prevent any propagation
> from happening.

So what's to stop you from doing exactly that --
mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL)
without bothering with statfs() in the first place? It's not like the
damn thing had been costly - it's O(mounts in your namespace) with not
to high constant, and in any case cheaper than allocating all of them
back when you did clone(2). Confused...

> The above case is what I see most often. A more specific use-case is to
> differentiate between MS_SLAVE and MS_SHARED mountpoints.
> Mountpoints that are MS_SLAVE are kept intact and mountpoints that are
> MS_SHARED are made MS_PRIVATE.
>
> For both cases the only way to do this right now is by parsing
> /proc/<pid>/mountinfo. Yes, it is doable but still it is somewhat costly
> and annoying as e.g. those mount propagation fields are optional.

Umm... And how would you get the list of mountpoints to feed to statfs()
if not by parsing mountinfo? IDGI...

2018-05-03 13:58:58

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH 0/6 resend] statfs: handle mount propagation

On Thu, May 03, 2018 at 02:43:17PM +0100, Al Viro wrote:
> On Thu, May 03, 2018 at 03:04:36PM +0200, Christian Brauner wrote:
>
> > >From a userspace perspective we often run into the case where we simply
> > want to know whether a given mountpoint is MS_SHARED or is MS_SLAVE.
> > If it is we remount it as MS_PRIVATE to prevent any propagation from
> > happening. We don't care about the peer relationship or how the
> > propagation is exactly setup. We only want to prevent any propagation
> > from happening.
>
> So what's to stop you from doing exactly that --
> mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL)
> without bothering with statfs() in the first place? It's not like the
> damn thing had been costly - it's O(mounts in your namespace) with not
> to high constant, and in any case cheaper than allocating all of them
> back when you did clone(2). Confused...

That's the case for the whole rootfs. But there are cases where I want
to leave the rootfs itself alone and only remount some paths.

>
> > The above case is what I see most often. A more specific use-case is to
> > differentiate between MS_SLAVE and MS_SHARED mountpoints.
> > Mountpoints that are MS_SLAVE are kept intact and mountpoints that are
> > MS_SHARED are made MS_PRIVATE.
> >
> > For both cases the only way to do this right now is by parsing
> > /proc/<pid>/mountinfo. Yes, it is doable but still it is somewhat costly
> > and annoying as e.g. those mount propagation fields are optional.
>
> Umm... And how would you get the list of mountpoints to feed to statfs()
> if not by parsing mountinfo? IDGI...

There are cases where you have list of known-mountpoints or paths to
check but you don't necessarily know whether they are MS_SHARED or
MS_SLAVE.

There's also a case where I send a list of fds via SCM_RIGHTS to an
isolated process that calls fstatvfs() on these fds to determine their
mount flags without changing them but cache this info.

Christian