2008-01-16 22:12:55

by Miklos Szeredi

[permalink] [raw]
Subject: [patch] VFS: extend /proc/mounts

The reason, why this patch was dug up, is that if the bdi-sysfs patch
is going to use device numbers to identify BDIs, then there should be
a way for the user to map the device number into mount(s).

But it's useful regardless of the bdi-sysfs patch.

Can this be added to -mm?

In theory it could break userspace, but I think it's very unlikely to
do so, because stuff is added only at the end of the lines, and
because most programs probably parse it through the libc interface
which is not broken by this change. Despite this, it should be tested
on as many systems as possible.

The alternative (and completely safe) solution is to add another file
to proc. Me no likey.

Miklos

----
From: Ram Pai <[email protected]>

/proc/mounts in its current state fail to disambiguate bind mounts, especially
when the bind mount is subrooted. Also it does not capture propagation state of
the mounts(shared-subtree). The following patch addresses the problem.

The following additional fields to /proc/mounts are added.

propagation-type in the form of <propagation_flag>[:<mntid>][,...]
note: 'shared' flag is followed by the mntid of its peer mount
'slave' flag is followed by the mntid of its master mount
'private' flag stands by itself
'unbindable' flag stands by itself

mntid -- is a unique identifier of the mount
major:minor -- is the major minor number of the device hosting the filesystem
dir -- the subdir in the filesystem which forms the root of this mount
parent -- the id of the parent mount


Here is a sample cat /proc/mounts after execution the following commands:

mount --bind /mnt /mnt
mount --make-shared /mnt
mount --bind /mnt/1 /var
mount --make-slave /var
mount --make-shared /var
mount --bind /var/abc /tmp
mount --make-unbindable /proc

rootfs / rootfs rw 0 0 private 2 0:1 / 2
/dev/root / ext2 rw 0 0 private 16 98:0 / 2
/proc /proc proc rw 0 0 unbindable 17 0:3 / 16
devpts /dev/pts devpts rw 0 0 private 18 0:10 / 16
/dev/root /mnt ext2 rw 0 0 shared:19 19 98:0 /mnt 16
/dev/root /var ext2 rw 0 0 shared:21,slave:19 20 98:0 /mnt/1 16
/dev/root /tmp ext2 rw 0 0 shared:20,slave:19 21 98:0 /mnt/1/abc 16

For example, the last line indicates that :

1) The mount is a shared mount.
2) Its peer mount of mount with id 20
3) It is also a slave mount of the master-mount with the id 19
4) The filesystem on device with major/minor number 98:0 and subdirectory
mnt/1/abc makes the root directory of this mount.
5) And finally the mount with id 16 is its parent.


Testing: symlinked /etc/mtab to /proc/mounts and did some mount and df
commands. They worked normally.

[[email protected]]:

- for mount ID's use IRA instead of a 32bit counter, which could overflow
- print canonical ID's (smallest one within the peer group) for peers
and master, this is more useful, than a random ID within the same namespace
- fix a couple of small bugs
- style fixes

Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
---

Index: linux/fs/dcache.c
===================================================================
--- linux.orig/fs/dcache.c 2008-01-09 21:16:28.000000000 +0100
+++ linux/fs/dcache.c 2008-01-16 20:26:56.000000000 +0100
@@ -1890,6 +1890,60 @@ char *dynamic_dname(struct dentry *dentr
return memcpy(buffer, temp, sz);
}

+static inline int prepend(char **buffer, int *buflen, const char *str,
+ int namelen)
+{
+ *buflen -= namelen;
+ if (*buflen < 0)
+ return 1;
+ *buffer -= namelen;
+ memcpy(*buffer, str, namelen);
+ return 0;
+}
+
+/*
+ * Write full pathname from the root of the filesystem into the buffer.
+ */
+char *dentry_path(struct dentry *dentry, char *buf, int buflen)
+{
+ char *end = buf + buflen;
+ char *retval;
+
+ spin_lock(&dcache_lock);
+ prepend(&end, &buflen, "\0", 1);
+ if (!IS_ROOT(dentry) && d_unhashed(dentry)) {
+ if (prepend(&end, &buflen, "//deleted", 9))
+ goto Elong;
+ }
+ if (buflen < 1)
+ goto Elong;
+ /* Get '/' right */
+ retval = end-1;
+ *retval = '/';
+
+ for (;;) {
+ struct dentry *parent;
+ if (IS_ROOT(dentry))
+ break;
+
+ parent = dentry->d_parent;
+ prefetch(parent);
+
+ if (prepend(&end, &buflen, dentry->d_name.name,
+ dentry->d_name.len) ||
+ prepend(&end, &buflen, "/", 1))
+ goto Elong;
+
+ retval = end;
+ dentry = parent;
+ }
+ spin_unlock(&dcache_lock);
+ return retval;
+Elong:
+ spin_unlock(&dcache_lock);
+ return ERR_PTR(-ENAMETOOLONG);
+}
+
/*
* NOTE! The user-level library version returns a
* character pointer. The kernel system call just
Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c 2008-01-16 18:42:01.000000000 +0100
+++ linux/fs/namespace.c 2008-01-16 22:29:19.000000000 +0100
@@ -27,6 +27,7 @@
#include <linux/mount.h>
#include <linux/ramfs.h>
#include <linux/log2.h>
+#include <linux/idr.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
#include "pnode.h"
@@ -39,6 +40,7 @@
__cacheline_aligned_in_smp DEFINE_SPINLOCK(vfsmount_lock);

static int event;
+static DEFINE_IDA(mnt_id_ida);

static struct list_head *mount_hashtable __read_mostly;
static struct kmem_cache *mnt_cache __read_mostly;
@@ -58,10 +60,41 @@ static inline unsigned long hash(struct

#define MNT_WRITER_UNDERFLOW_LIMIT -(1<<16)

+static int mnt_alloc_id(struct vfsmount *mnt)
+{
+ int res;
+
+ retry:
+ spin_lock(&vfsmount_lock);
+ res = ida_get_new(&mnt_id_ida, &mnt->mnt_id);
+ spin_unlock(&vfsmount_lock);
+ if (res == -EAGAIN) {
+ if (ida_pre_get(&mnt_id_ida, GFP_KERNEL))
+ goto retry;
+ return -ENOMEM;
+ }
+ return res;
+}
+
+static void mnt_free_id(struct vfsmount *mnt)
+{
+ spin_lock(&vfsmount_lock);
+ ida_remove(&mnt_id_ida, mnt->mnt_id);
+ spin_unlock(&vfsmount_lock);
+}
+
struct vfsmount *alloc_vfsmnt(const char *name)
{
struct vfsmount *mnt = kmem_cache_zalloc(mnt_cache, GFP_KERNEL);
if (mnt) {
+ int err;
+
+ err = mnt_alloc_id(mnt);
+ if (err) {
+ kmem_cache_free(mnt_cache, mnt);
+ return NULL;
+ }
+
atomic_set(&mnt->mnt_count, 1);
INIT_LIST_HEAD(&mnt->mnt_hash);
INIT_LIST_HEAD(&mnt->mnt_child);
@@ -338,6 +371,7 @@ EXPORT_SYMBOL(simple_set_mnt);
void free_vfsmnt(struct vfsmount *mnt)
{
kfree(mnt->mnt_devname);
+ mnt_free_id(mnt);
kmem_cache_free(mnt_cache, mnt);
}

@@ -646,7 +680,23 @@ static int show_vfsmnt(struct seq_file *
}
if (mnt->mnt_sb->s_op->show_options)
err = mnt->mnt_sb->s_op->show_options(m, mnt);
- seq_puts(m, " 0 0\n");
+ seq_puts(m, " 0 0 ");
+ if (IS_MNT_SHARED(mnt)) {
+ seq_printf(m, "shared:%i", get_peer_group_id(mnt));
+ if (IS_MNT_SLAVE(mnt))
+ seq_printf(m, ",slave:%i", get_master_id(mnt));
+ } else if (IS_MNT_SLAVE(mnt)) {
+ seq_printf(m, "slave:%i", get_master_id(mnt));
+ } else if (IS_MNT_UNBINDABLE(mnt)) {
+ seq_printf(m, "unbindable");
+ } else {
+ seq_printf(m, "private");
+ }
+ seq_printf(m, " %i %u:%u ", mnt->mnt_id,
+ MAJOR(mnt->mnt_sb->s_dev),
+ MINOR(mnt->mnt_sb->s_dev));
+ seq_dentry(m, mnt->mnt_root, " \t\n\\");
+ seq_printf(m, " %i\n", mnt->mnt_parent->mnt_id);
return err;
}

Index: linux/fs/seq_file.c
===================================================================
--- linux.orig/fs/seq_file.c 2008-01-09 21:16:28.000000000 +0100
+++ linux/fs/seq_file.c 2008-01-16 21:59:15.000000000 +0100
@@ -349,28 +349,40 @@ int seq_printf(struct seq_file *m, const
}
EXPORT_SYMBOL(seq_printf);

+static inline char *mangle_path(char *s, char *p, char *esc)
+{
+ while (s <= p) {
+ char c = *p++;
+ if (!c) {
+ return s;
+ } else if (!strchr(esc, c)) {
+ *s++ = c;
+ } else if (s + 4 > p) {
+ break;
+ } else {
+ *s++ = '\\';
+ *s++ = '0' + ((c & 0300) >> 6);
+ *s++ = '0' + ((c & 070) >> 3);
+ *s++ = '0' + (c & 07);
+ }
+ }
+ return NULL;
+}
+
+/*
+ * return the absolute path of 'dentry' residing in mount 'mnt'.
+ */
int seq_path(struct seq_file *m, struct path *path, char *esc)
{
if (m->count < m->size) {
char *s = m->buf + m->count;
char *p = d_path(path, s, m->size - m->count);
if (!IS_ERR(p)) {
- while (s <= p) {
- char c = *p++;
- if (!c) {
- p = m->buf + m->count;
- m->count = s - m->buf;
- return s - p;
- } else if (!strchr(esc, c)) {
- *s++ = c;
- } else if (s + 4 > p) {
- break;
- } else {
- *s++ = '\\';
- *s++ = '0' + ((c & 0300) >> 6);
- *s++ = '0' + ((c & 070) >> 3);
- *s++ = '0' + (c & 07);
- }
+ s = mangle_path(s, p, esc);
+ if (s) {
+ p = m->buf + m->count;
+ m->count = s - m->buf;
+ return s - p;
}
}
}
@@ -379,6 +391,28 @@ int seq_path(struct seq_file *m, struct
}
EXPORT_SYMBOL(seq_path);

+/*
+ * returns the path of the 'dentry' from the root of its filesystem.
+ */
+int seq_dentry(struct seq_file *m, struct dentry *dentry, char *esc)
+{
+ if (m->count < m->size) {
+ char *s = m->buf + m->count;
+ char *p = dentry_path(dentry, s, m->size - m->count);
+ if (!IS_ERR(p)) {
+ s = mangle_path(s, p, esc);
+ if (s) {
+ p = m->buf + m->count;
+ m->count = s - m->buf;
+ return s - p;
+ }
+ }
+ }
+ m->count = m->size;
+ return -1;
+}
+EXPORT_SYMBOL(seq_dentry);
+
static void *single_start(struct seq_file *p, loff_t *pos)
{
return NULL + (*pos == 0);
Index: linux/include/linux/dcache.h
===================================================================
--- linux.orig/include/linux/dcache.h 2008-01-09 21:16:29.000000000 +0100
+++ linux/include/linux/dcache.h 2008-01-16 19:00:38.000000000 +0100
@@ -302,6 +302,7 @@ extern int d_validate(struct dentry *, s
extern char *dynamic_dname(struct dentry *, char *, int, const char *, ...);

extern char *d_path(struct path *, char *, int);
+extern char *dentry_path(struct dentry *, char *, int);

/* Allocation counts.. */

Index: linux/include/linux/seq_file.h
===================================================================
--- linux.orig/include/linux/seq_file.h 2008-01-09 21:16:29.000000000 +0100
+++ linux/include/linux/seq_file.h 2008-01-16 19:03:36.000000000 +0100
@@ -10,6 +10,7 @@ struct seq_operations;
struct file;
struct path;
struct inode;
+struct dentry;

struct seq_file {
char *buf;
@@ -43,6 +44,7 @@ int seq_printf(struct seq_file *, const
__attribute__ ((format (printf,2,3)));

int seq_path(struct seq_file *, struct path *, char *);
+int seq_dentry(struct seq_file *, struct dentry *, char *);

int single_open(struct file *, int (*)(struct seq_file *, void *), void *);
int single_release(struct inode *, struct file *);
Index: linux/fs/pnode.c
===================================================================
--- linux.orig/fs/pnode.c 2008-01-16 13:25:17.000000000 +0100
+++ linux/fs/pnode.c 2008-01-16 22:27:33.000000000 +0100
@@ -27,6 +27,41 @@ static inline struct vfsmount *next_slav
return list_entry(p->mnt_slave.next, struct vfsmount, mnt_slave);
}

+static int __peer_group_id(struct vfsmount *mnt)
+{
+ struct vfsmount *m;
+ int id = mnt->mnt_id;
+
+ for (m = next_peer(mnt); m != mnt; m = next_peer(m))
+ id = min(id, m->mnt_id);
+
+ return id;
+}
+
+/* return the smallest ID within the peer group */
+int get_peer_group_id(struct vfsmount *mnt)
+{
+ int id;
+
+ spin_lock(&vfsmount_lock);
+ id = __peer_group_id(mnt);
+ spin_unlock(&vfsmount_lock);
+
+ return id;
+}
+
+/* return the smallest ID within the master's peer group */
+int get_master_id(struct vfsmount *mnt)
+{
+ int id;
+
+ spin_lock(&vfsmount_lock);
+ id = __peer_group_id(mnt->mnt_master);
+ spin_unlock(&vfsmount_lock);
+
+ return id;
+}
+
static int do_make_slave(struct vfsmount *mnt)
{
struct vfsmount *peer_mnt = mnt, *master = mnt->mnt_master;
Index: linux/fs/pnode.h
===================================================================
--- linux.orig/fs/pnode.h 2008-01-16 18:42:01.000000000 +0100
+++ linux/fs/pnode.h 2008-01-16 22:28:35.000000000 +0100
@@ -35,4 +35,6 @@ int propagate_mnt(struct vfsmount *, str
struct list_head *);
int propagate_umount(struct list_head *);
int propagate_mount_busy(struct vfsmount *, int);
+int get_peer_group_id(struct vfsmount *);
+int get_master_id(struct vfsmount *);
#endif /* _LINUX_PNODE_H */
Index: linux/include/linux/mount.h
===================================================================
--- linux.orig/include/linux/mount.h 2008-01-16 18:42:01.000000000 +0100
+++ linux/include/linux/mount.h 2008-01-16 21:20:04.000000000 +0100
@@ -56,6 +56,7 @@ struct vfsmount {
struct list_head mnt_slave; /* slave list entry */
struct vfsmount *mnt_master; /* slave is on master->mnt_slave_list */
struct mnt_namespace *mnt_ns; /* containing namespace */
+ int mnt_id; /* mount identifier */
/*
* We put mnt_count & mnt_expiry_mark at the end of struct vfsmount
* to let these frequently modified fields in a separate cache line


2008-01-16 22:35:29

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts

On Wed, 16 Jan 2008 23:12:31 +0100
Miklos Szeredi <[email protected]> wrote:

> The reason, why this patch was dug up, is that if the bdi-sysfs patch
> is going to use device numbers to identify BDIs, then there should be
> a way for the user to map the device number into mount(s).
>
> But it's useful regardless of the bdi-sysfs patch.

Don't know what that is.

> Can this be added to -mm?
>
> In theory it could break userspace, but I think it's very unlikely to
> do so, because stuff is added only at the end of the lines, and
> because most programs probably parse it through the libc interface
> which is not broken by this change. Despite this, it should be tested
> on as many systems as possible.

Seems like a plain bad idea to me. There will be any number of home-made
/proc/mounts parsers and we don't know what they do.

> - for mount ID's use IRA instead of a 32bit counter, which could overflow

don't know what an IRA is.

> - print canonical ID's (smallest one within the peer group) for peers
> and master, this is more useful, than a random ID within the same namespace
> - fix a couple of small bugs
> - style fixes
>
> Signed-off-by: Ram Pai <[email protected]>
> Signed-off-by: Miklos Szeredi <[email protected]>

Both the newly-added inlines in this patch are wrong. They will result in
a larger and slower kernel. This should be very well known by now.

2008-01-16 23:20:13

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts

> > The reason, why this patch was dug up, is that if the bdi-sysfs patch
> > is going to use device numbers to identify BDIs, then there should be
> > a way for the user to map the device number into mount(s).
> >
> > But it's useful regardless of the bdi-sysfs patch.
>
> Don't know what that is.

Subject: mm: sysfs: expose the BDI object in sysfs

Provide a place in sysfs for the backing_dev_info object.
This allows us to see and set the various BDI specific variables.

In particular this properly exposes the read-ahead window for all
relevant users and /sys/block/<block>/queue/read_ahead_kb should be
deprecated.

> > Can this be added to -mm?
> >
> > In theory it could break userspace, but I think it's very unlikely to
> > do so, because stuff is added only at the end of the lines, and
> > because most programs probably parse it through the libc interface
> > which is not broken by this change. Despite this, it should be tested
> > on as many systems as possible.
>
> Seems like a plain bad idea to me. There will be any number of home-made
> /proc/mounts parsers and we don't know what they do.

Dunno. I feel, this is quite safe, because even the home-grown
parsers will likely care about any junk at the end of the line. But
of course this cannot be proved.

> > - for mount ID's use IRA instead of a 32bit counter, which could overflow
>
> don't know what an IRA is.

That was meant to be IDA (from the IDR library).

> > - print canonical ID's (smallest one within the peer group) for peers
> > and master, this is more useful, than a random ID within the same namespace
> > - fix a couple of small bugs
> > - style fixes
> >
> > Signed-off-by: Ram Pai <[email protected]>
> > Signed-off-by: Miklos Szeredi <[email protected]>
>
> Both the newly-added inlines in this patch are wrong. They will result in
> a larger and slower kernel. This should be very well known by now.

I'll get rid of them.

Miklos

2008-01-16 23:51:17

by Karel Zak

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts

On Wed, Jan 16, 2008 at 02:30:51PM -0800, Andrew Morton wrote:
> On Wed, 16 Jan 2008 23:12:31 +0100
> Miklos Szeredi <[email protected]> wrote:
> >
> > In theory it could break userspace, but I think it's very unlikely to
> > do so, because stuff is added only at the end of the lines, and
> > because most programs probably parse it through the libc interface
> > which is not broken by this change. Despite this, it should be tested
> > on as many systems as possible.
>
> Seems like a plain bad idea to me. There will be any number of home-made
> /proc/mounts parsers and we don't know what they do.

So, let's use /proc/mounts_v2 ;-)

Karel

--
Karel Zak <[email protected]>

2008-01-16 23:58:21

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts


On Jan 17 2008 00:43, Karel Zak wrote:
>>
>> Seems like a plain bad idea to me. There will be any number of home-made
>> /proc/mounts parsers and we don't know what they do.
>
> So, let's use /proc/mounts_v2 ;-)

Was not it like "don't use /proc for new things"?

2008-01-17 00:11:11

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts

On Thu, 17 Jan 2008 00:58:06 +0100 (CET) Jan Engelhardt <[email protected]> wrote:

>
> On Jan 17 2008 00:43, Karel Zak wrote:
> >>
> >> Seems like a plain bad idea to me. There will be any number of home-made
> >> /proc/mounts parsers and we don't know what they do.
> >
> > So, let's use /proc/mounts_v2 ;-)
>
> Was not it like "don't use /proc for new things"?

Well yeah. If we're going to do a brand new mechanism to expose
per-mount data then we should hunker down and get it right.

2008-01-17 00:33:42

by NeilBrown

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts

On Thursday January 17, [email protected] wrote:
>
> On Jan 17 2008 00:43, Karel Zak wrote:
> >>
> >> Seems like a plain bad idea to me. There will be any number of home-made
> >> /proc/mounts parsers and we don't know what they do.
> >
> > So, let's use /proc/mounts_v2 ;-)
>
> Was not it like "don't use /proc for new things"?

I thought it was "don't use /proc for new things that aren't process
related".

And as the mount table is per process......

A host has a bunch of mounted filesystems (struct super_block), and
each process has some subset of these stitched together into a mount
tree (struct vfsmount / struct namespace).

There needs to be something in /proc that exposes the vfsmount tree.

Arguably there should be something else - maybe in sysfs - that
provides access to the "struct superblock" object.

And there needs to be a clear way to relate information from one with
information from the other.

In the tradition of stat, statm, status, maybe the former should be
/proc/$PID/mountm
:-)

Hey, I just found /proc/X/mountstats. How does this fit in to the big
picture?

NeilBrown

2008-01-17 01:17:41

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts


On Jan 17 2008 11:33, Neil Brown wrote:
>On Thursday January 17, [email protected] wrote:
>>
>> On Jan 17 2008 00:43, Karel Zak wrote:
>> >>
>> >> Seems like a plain bad idea to me. There will be any number of home-made
>> >> /proc/mounts parsers and we don't know what they do.
>> >
>> > So, let's use /proc/mounts_v2 ;-)
>>
>> Was not it like "don't use /proc for new things"?
>
>I thought it was "don't use /proc for new things that aren't process
>related".
>
>And as the mount table is per process......

You are right. I'm still in the world where CLONE_NEWNS is not used all
that much in the daily routine, either by the distro or by me.

>In the tradition of stat, statm, status, maybe the former should be
> /proc/$PID/mountm

What next - /proc/pid/mountus? :)

2008-01-17 03:04:21

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts

Andrew Morton wrote:
>
> Seems like a plain bad idea to me. There will be any number of home-made
> /proc/mounts parsers and we don't know what they do.
>

There is a lot of precedent for adding fields at the end. Since the
last fields in current /proc/*/mounts are dummy fields anyway, it
doesn't matter if the homegrown parsers concatenate the additional
information to those.

-hpa

2008-01-17 03:33:25

by Al Viro

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts

On Wed, Jan 16, 2008 at 11:12:31PM +0100, Miklos Szeredi wrote:
> The alternative (and completely safe) solution is to add another file
> to proc. Me no likey.

Since we need saner layout, I would strongly suggest exactly that.

> major:minor -- is the major minor number of the device hosting the filesystem

Bad description. "Value of st_dev for files on that filesystem", please -
there might be no such thing as "the device hosting the filesystem" _and_
the value here may bloody well be unrelated to device actually holding
all data (for things like ext2meta, etc.).

> 1) The mount is a shared mount.
> 2) Its peer mount of mount with id 20
> 3) It is also a slave mount of the master-mount with the id 19
> 4) The filesystem on device with major/minor number 98:0 and subdirectory
> mnt/1/abc makes the root directory of this mount.
> 5) And finally the mount with id 16 is its parent.

I'd suggest doing a new file that would *not* try to imitate /etc/mtab.
Another thing is, how much of propagation information do we want to
be exposed and what do we intend to do with it? Note that "entire
propagation tree" is out of question - it spans many namespaces and
contains potentially sensitive information. So we won't see all nodes.

What do we want to *do* with the information about propagation?

2008-01-17 03:36:38

by Al Viro

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts

On Wed, Jan 16, 2008 at 04:09:30PM -0800, Andrew Morton wrote:
> On Thu, 17 Jan 2008 00:58:06 +0100 (CET) Jan Engelhardt <[email protected]> wrote:
>
> >
> > On Jan 17 2008 00:43, Karel Zak wrote:
> > >>
> > >> Seems like a plain bad idea to me. There will be any number of home-made
> > >> /proc/mounts parsers and we don't know what they do.
> > >
> > > So, let's use /proc/mounts_v2 ;-)
> >
> > Was not it like "don't use /proc for new things"?
>
> Well yeah. If we're going to do a brand new mechanism to expose
> per-mount data then we should hunker down and get it right.

Which automatically means "no sysfs". We are NOT converting vfsmounts
to kobject-based lifetime rules.

2008-01-17 08:36:35

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts

> > The alternative (and completely safe) solution is to add another file
> > to proc. Me no likey.
>
> Since we need saner layout, I would strongly suggest exactly that.

I don't think there's all that much wrong with the current layout,
except the two dummy zeroes at the end. Or, something else needs
fixing in there?

> > major:minor -- is the major minor number of the device hosting the filesystem
>
> Bad description. "Value of st_dev for files on that filesystem", please -
> there might be no such thing as "the device hosting the filesystem" _and_
> the value here may bloody well be unrelated to device actually holding
> all data (for things like ext2meta, etc.).

Right.

> > 1) The mount is a shared mount.
> > 2) Its peer mount of mount with id 20
> > 3) It is also a slave mount of the master-mount with the id 19
> > 4) The filesystem on device with major/minor number 98:0 and subdirectory
> > mnt/1/abc makes the root directory of this mount.
> > 5) And finally the mount with id 16 is its parent.
>
> I'd suggest doing a new file that would *not* try to imitate /etc/mtab.
> Another thing is, how much of propagation information do we want to
> be exposed and what do we intend to do with it?

I think the scheme devised by Ram is basically right. It shows the
relationships (slave, peer) and the ID of a master/peer mount.

What I changed, is to always show a canonical peer, because I think
that is more useful in establishing relationships between mounts. Is
this info sensitive? I can't see why it would be.

> Note that "entire
> propagation tree" is out of question - it spans many namespaces and
> contains potentially sensitive information. So we won't see all nodes.

With multiple namespaces, of course you are only allowed to see a part
of the tree, but you could have xterms for all of them, and can put
together the big picture from the pieces.

> What do we want to *do* with the information about propagation?

Just feedback about the state of the thing. It's very annoying, that
after setting up propagation, it's impossible to check the result.

Miklos

2008-01-17 08:55:53

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts

> Hey, I just found /proc/X/mountstats. How does this fit in to the big
> picture?

It seems to show some counters for NFS mounts, no other filesystem
uses it. Format looks rather less nice, than /proc/X/mounts (why do
we need long english sentences under /proc?).

Miklos

2008-01-17 10:36:38

by Karel Zak

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts

On Thu, Jan 17, 2008 at 09:36:11AM +0100, Miklos Szeredi wrote:
> > I'd suggest doing a new file that would *not* try to imitate /etc/mtab.
> > Another thing is, how much of propagation information do we want to
> > be exposed and what do we intend to do with it?
>
> I think the scheme devised by Ram is basically right. It shows the
> relationships (slave, peer) and the ID of a master/peer mount.

Yes. It also shows the full relationship between source and
destination for bind mounts. Now the /proc/mounts is useless:

# mount --bind /mnt/test /mnt/test2

# cat /proc/mounts | grep test
/dev/root /mnt/test2 ext3 rw,noatime,data=ordered 0 0


> > What do we want to *do* with the information about propagation?
>
> Just feedback about the state of the thing. It's very annoying, that
> after setting up propagation, it's impossible to check the result.

Exactly.

Karel

--
Karel Zak <[email protected]>

2008-01-17 15:38:38

by Chuck Lever III

[permalink] [raw]
Subject: Re: [patch] VFS: extend /proc/mounts

On Jan 17, 2008, at 3:55 AM, Miklos Szeredi wrote:
>> Hey, I just found /proc/X/mountstats. How does this fit in to the
>> big
>> picture?
>
> It seems to show some counters for NFS mounts, no other filesystem
> uses it. Format looks rather less nice, than /proc/X/mounts (why do
> we need long english sentences under /proc?).


I introduced /proc/self/mountstats because we need a way for non-
block-device-based file systems to report I/O statistics. Everything
else I tried was rejected, and apparently what we ended up with was
reviewed by only a handful of people, so no one else likes it or uses
it.

It can go away for all I care, as long as we retain some flexible
mechanism for non-block-based file systems to report I/O stats. As
far as I am aware, there are only two user utilities that understand
and parse this data, and I maintain both.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com