2007-06-20 20:57:46

by H. Peter Anvin

Subject: Adding subroot information to /proc/mounts, or obtaining that through other means

Right now it is actually impossible to conclusively determine a
filesystem-relative path in the presence of bind (and possibly move)
mounts. This is highly desirable to be able to do in contexts that
involve non-Linux (or not-the-current-instance-of-Linux) accesses to the
filesystem, e.g. other filesystems or bootloaders.

Example:

Let's assume /dev/md6 is mounted on /export. Then /export/users/foo and
/export/users/bar are bind-mounted to /home/foo and /home/bar respectively.

/proc/mounts will show:

/dev/md6 /export ext3 rw,data=ordered 0 0
/dev/md6 /home/foo ext3 rw,data=ordered 0 0
/dev/md6 /home/bar ext3 rw,data=ordered 0 0

... with no indication that anything is amiss. The latter two fields
are confusing, at best.

We could add a field to /proc/mounts to add this information:

/dev/md6 /export ext3 rw,data=ordered 0 0 /
/dev/md6 /home/foo ext3 rw,data=ordered 0 0 /users/foo
/dev/md6 /home/bar ext3 rw,data=ordered 0 0 /users/bar

... or, alternatively, add a subfield to the first field (which would
entail escaping whatever separator we choose):

/dev/md6 /export ext3 rw,data=ordered 0 0
/dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
/dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0

One could also consider providing a system call (or ioctl, ...) to get
this information, effectively as an augmentation to stat(). If that's
the case, it would probably be a good thing if this "stat-plus" system
call could in the future be expanded to contain additional information
without having to change a structure every time, perhaps using a method
similar to sendmsg/recvmsg, as ugly as those are.

I'm personally leaning toward the second option (/dev/md6:/users/foo).
Although that might confuse current utilities, those utilities are
*already* liable to get confused by the fact that the line doesn't mean
what they think it means.
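
A minimal sketch of how a userspace tool might consume the first proposal (an extra trailing field), assuming the subroot is simply appended as a seventh whitespace-separated column. The names `MountEntry`, `parse_mount_line` and `fs_relative_path` are made up for illustration; nothing here is an existing interface, and octal escapes in the fields are not decoded.

```python
import posixpath
from typing import List, NamedTuple, Optional


class MountEntry(NamedTuple):
    devname: str
    mountpoint: str
    fstype: str
    options: List[str]
    subroot: Optional[str]  # hypothetical seventh field; None in today's format


def parse_mount_line(line: str) -> MountEntry:
    """Split one /proc/mounts-style line, tolerating the proposed extra field."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least six fields: %r" % line)
    subroot = fields[6] if len(fields) > 6 else None
    return MountEntry(fields[0], fields[1], fields[2],
                      fields[3].split(","), subroot)


def fs_relative_path(entry: MountEntry, path: str) -> str:
    """Map a path under the mount point to a path relative to the fs root.

    When the subroot field is absent this falls back to "/", which is
    exactly the guess that is unsafe with the current format.
    """
    rel = posixpath.relpath(path, entry.mountpoint)
    if rel == ".." or rel.startswith("../"):
        raise ValueError("%r is not under %r" % (path, entry.mountpoint))
    return posixpath.normpath(posixpath.join(entry.subroot or "/", rel))
```

With the example above, a file at /home/foo/bar.txt would map to /users/foo/bar.txt on /dev/md6.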

-hpa


2007-06-20 21:03:53

by Al Viro

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

On Wed, Jun 20, 2007 at 01:57:33PM -0700, H. Peter Anvin wrote:
> ... or, alternatively, add a subfield to the first field (which would
> entail escaping whatever separator we choose):
>
> /dev/md6 /export ext3 rw,data=ordered 0 0
> /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
> /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0

Hell, no. The first field is in principle impossible to parse unless
you know the fs type.

How about making a new file with sane format? From the very
beginning. E.g. mountpoint + ID + relative path + type + options,
where ID uniquely identifies superblock (e.g. numeric st_dev)
and backing device (if any) is sitting among the options...

2007-06-20 21:24:44

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Al Viro wrote:
> On Wed, Jun 20, 2007 at 01:57:33PM -0700, H. Peter Anvin wrote:
>> ... or, alternatively, add a subfield to the first field (which would
>> entail escaping whatever separator we choose):
>>
>> /dev/md6 /export ext3 rw,data=ordered 0 0
>> /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
>> /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0
>
> Hell, no. The first field is in principle impossible to parse unless
> you know the fs type.
>
> How about making a new file with sane format? From the very
> beginning. E.g. mountpoint + ID + relative path + type + options,
> where ID uniquely identifies superblock (e.g. numeric st_dev)
> and backing device (if any) is sitting among the options...

Okay, I see there has been some discussion on this earlier, based on a
proposal by Ram Pai, so it pretty much comes down to redesigning this
right. I see some issues with his proposal (device numbers exported to
userspace in text form should be separated into major:minor form, for
one thing.) I know the util-linux-ng people have also had issues with
/proc/mounts that they would like resolved in order to finally nuke
/etc/mtab.
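
To illustrate the major:minor point, here is a small sketch of rendering a device number in split form from userspace. The helper name `format_dev` is hypothetical; `os.major`, `os.minor` and `os.makedev` are the standard library calls available on Unix.

```python
import os


def format_dev(dev: int) -> str:
    """Render a device number as "major:minor" rather than one opaque integer."""
    return "%d:%d" % (os.major(dev), os.minor(dev))


# Example use: format_dev(os.stat("/").st_dev) gives the device of the
# filesystem holding the root directory, in major:minor form.
```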

Is Ram still working on this? I'd like to help make this happen so we
can be done with it.

-hpa



2007-06-20 22:06:18

by Karel Zak

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

On Wed, Jun 20, 2007 at 01:57:33PM -0700, H. Peter Anvin wrote:
>
> We could add a field to /proc/mounts to add this information:
>
> /dev/md6 /export ext3 rw,data=ordered 0 0 /
> /dev/md6 /home/foo ext3 rw,data=ordered 0 0 /users/foo
> /dev/md6 /home/bar ext3 rw,data=ordered 0 0 /users/bar

I prefer this format. It's compatible with mount(8), which
ignores extra columns.

> ... or, alternatively, add a subfield to the first field (which would
> entail escaping whatever separator we choose):
>
> /dev/md6 /export ext3 rw,data=ordered 0 0
> /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
> /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0

We don't need a new separator (':'); there already is one (' ').

> I'm personally leaning toward the second option (/dev/md6:/users/foo).
> Although that might confuse current utilities, those utilities are
> *already* liable to get confused by the fact that the line doesn't mean
> what they think it means.

Many people use "ln -s /proc/mounts /etc/mtab".

Karel

--
Karel Zak <[email protected]>

2007-06-20 22:08:19

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Karel Zak wrote:
> On Wed, Jun 20, 2007 at 01:57:33PM -0700, H. Peter Anvin wrote:
>> We could add a field to /proc/mounts to add this information:
>>
>> /dev/md6 /export ext3 rw,data=ordered 0 0 /
>> /dev/md6 /home/foo ext3 rw,data=ordered 0 0 /users/foo
>> /dev/md6 /home/bar ext3 rw,data=ordered 0 0 /users/bar
>
> I prefer this format. It's compatible with mount(8), which
> ignores extra columns.
>
>> ... or, alternatively, add a subfield to the first field (which would
>> entail escaping whatever separator we choose):
>>
>> /dev/md6 /export ext3 rw,data=ordered 0 0
>> /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
>> /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0
>
> We don't need a new separator (':'); there already is one (' ').
>
>> I'm personally leaning toward the second option (/dev/md6:/users/foo).
>> Although that might confuse current utilities, those utilities are
>> *already* liable to get confused by the fact that the line doesn't mean
>> what they think it means.
>
> Many people use "ln -s /proc/mounts /etc/mtab".
>

Out of curiosity, and trying to better grok the problem, what would
mount(8) do differently with the second format versus the first?

-hpa

2007-06-20 22:25:20

by Karel Zak

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

On Wed, Jun 20, 2007 at 10:03:43PM +0100, Al Viro wrote:
> On Wed, Jun 20, 2007 at 01:57:33PM -0700, H. Peter Anvin wrote:
> > ... or, alternatively, add a subfield to the first field (which would
> > entail escaping whatever separator we choose):
> >
> > /dev/md6 /export ext3 rw,data=ordered 0 0
> > /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
> > /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0
>
> Hell, no. The first field is in principle impossible to parse unless
> you know the fs type.
>
> How about making a new file with sane format? From the very
> beginning. E.g. mountpoint + ID + relative path + type + options,
> where ID uniquely identifies superblock (e.g. numeric st_dev)
> and backing device (if any) is sitting among the options...

Yeah. How about including propagation trees in this file?

mountpoint + ID + relative path + type + options + propagation-flag +
{peer,master}-mount-id

/ 0xa917800 / ext3 rw PRIVATE
/mnt 0xa917100 / ext3 rw SHARED peer:0xa917100
/tmp 0xa917f00 /1 ext3 rw SLAVE master:0xa917100
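
A rough sketch of parsing records in the layout above, assuming whitespace-separated fields, hexadecimal IDs, and an optional trailing `peer:` or `master:` item. The function name `parse_record` and the dictionary keys are illustrative only.

```python
def parse_record(line):
    """Parse 'mountpoint ID relpath type options flag [peer:ID|master:ID]'."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least six fields: %r" % line)
    mountpoint, mount_id, relpath, fstype, options, flag = fields[:6]
    record = {
        "mountpoint": mountpoint,
        "id": int(mount_id, 16),       # accepts the 0x prefix
        "relative_path": relpath,
        "type": fstype,
        "options": options.split(","),
        "propagation": flag,
        "peer": None,
        "master": None,
    }
    # Optional trailing items name the related mount by ID.
    for extra in fields[6:]:
        key, _, value = extra.partition(":")
        if key in ("peer", "master"):
            record[key] = int(value, 16)
    return record
```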



Karel


--
Karel Zak <[email protected]>

2007-06-20 22:34:55

by Chuck Lever III

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Al Viro wrote:
> On Wed, Jun 20, 2007 at 01:57:33PM -0700, H. Peter Anvin wrote:
>> ... or, alternatively, add a subfield to the first field (which would
>> entail escaping whatever separator we choose):
>>
>> /dev/md6 /export ext3 rw,data=ordered 0 0
>> /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
>> /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0
>
> Hell, no. The first field is in principle impossible to parse unless
> you know the fs type.
>
> How about making a new file with sane format? From the very
> beginning. E.g. mountpoint + ID + relative path + type + options,
> where ID uniquely identifies superblock (e.g. numeric st_dev)
> and backing device (if any) is sitting among the options...

To support NFS client performance statistics, I recently added
/proc/self/mountstats. That might be a place to add details about
--move and --bind mounts without changing the format of /proc/mounts.



2007-06-20 22:44:18

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Karel Zak wrote:
>
> Yeah. How about including propagation trees in this file?
>
> mountpoint + ID + relative path + type + options + propagation-flag +
> {peer,master}-mount-id
>
> / 0xa917800 / ext3 rw PRIVATE
> /mnt 0xa917100 / ext3 rw SHARED peer:0xa917100
> /tmp 0xa917f00 /1 ext3 rw SLAVE master:0xa917100
>

I think we're talking about a different meaning of "id" here... you seem
to be talking about the vfsmount pointer, whereas it was originally
proposed as mnt_sb->sb_dev. Both are useful, for different reasons of
course.

We should include mnt_devname as well.

People are a bit nervous about exposing kernel pointers in userspace, I
have noticed; would it be better to add a "mnt_id" field to struct
vfsmount? This can simply be a counter assigned when the structure is
allocated and then never changed (it might have to be a 64-bit counter,
but I don't think that adding 8 bytes to struct vfsmount should be a
huge deal.)

Does that serve everyone's needs?

-hpa

2007-06-20 22:45:15

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Chuck Lever wrote:
> To support NFS client performance statistics, I recently added
> /proc/self/mountstats. That might be a place to add details about
> --move and --bind mounts without changing the format of /proc/mounts.

I just looked at /proc/self/mountstats; it seems to have no more
information than /proc/self/mounts, but in an even more annoying format.
Either I'm missing something, or this file doesn't add anything at all.

-hpa

2007-06-20 22:47:18

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Dr. David Alan Gilbert wrote:
>
> What happens with the (sick) case of spaces in directory names?
> Also is it really nicely defined that there is no way to put a space
> in an option in any of the filesystems? I suppose someone
> particularly sick could have a device node in a directory with a space
> in it. It would be nice if new formats for this are being defined
> to make it cover everything.
>

That's already handled just fine:

bash-3.1$ mkdir /tmp/'Jag ?r: \
en liten mask'
bash-3.1$ sudo mount -t tmpfs none '/tmp/Jag ?r: \
en liten mask'/
bash-3.1$ tail -1 /proc/mounts
none /tmp/Jag\040?r:\040\134\012en\040liten\040mask tmpfs rw 0 0
bash-3.1$
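
For reference, a sketch of how a reader of /proc/mounts might undo the octal escaping shown in the transcript above. The helper name `unescape_mount_field` is made up; it assumes every escape is a backslash followed by exactly three octal digits, as in the output above.

```python
import re

_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def unescape_mount_field(field: str) -> str:
    """Replace each \\ooo octal escape with the character it encodes.

    The substitution is a single left-to-right pass, so the backslash
    produced by \\134 is never re-read as the start of another escape.
    """
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)
```

Splitting the line on whitespace first and then unescaping each field keeps embedded spaces and newlines from being mistaken for separators.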

-hpa

2007-06-20 22:48:36

by Chuck Lever III

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

H. Peter Anvin wrote:
> Chuck Lever wrote:
>> To support NFS client performance statistics, I recently added
>> /proc/self/mountstats. That might be a place to add details about
>> --move and --bind mounts without changing the format of /proc/mounts.
>
> I just looked at /proc/self/mountstats; it seems to have no more
> information than /proc/self/mounts, but in an even more annoying format.
> Either I'm missing something, or this file doesn't add anything at all.

The advantage is that it doesn't have strong user space dependencies on
its format like /proc/mounts does.

If you have NFS mount points, you will see that it includes a great deal
of additional information about each mount.



2007-06-20 22:55:31

by Nix

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

On 20 Jun 2007, H. Peter Anvin verbalised:

> Right now it is actually impossible to conclusively determine a
> filesystem-relative path in the presence of bind (and possibly move)
> mounts. This is highly desirable to be able to do in contexts that
> involve non-Linux (or not-the-current-instance-of-Linux) accesses to the
> filesystem, e.g. other filesystems or bootloaders.

It's also highly desirable if you want to be able to run a backup :) one
would desire to back up the filesystem as a whole, not some bind mount
of one directory out of it (and backing up both is needless
duplication).

So I applaud this and would be an immediate user, no matter what format
is chosen, as long as we can tell what is mounted where.

(As an aside, it would be nice if mount(8) could supply (a limited
amount of) extra (arbitrary?) textual options to the kernel,
specifically so that mount options which are only interpreted by
userspace programs, like `user' and the quota options, could appear in
/proc/mounts. That way we could finally ditch bloody /etc/mtab for good.

(Any other approach requires mount(8) to keep track of these options in
a separate file, which brings back exactly the same synchronization
horrors that we're all so nauseatingly familiar with from /etc/mtab.)

> I'm personally leaning toward the second option (/dev/md6:/users/foo).
> Although that might confuse current utilities, those utilities are
> *already* liable to get confused by the fact that the line doesn't mean
> what they think it means.

Quite so. The output from df(8) in the presence of large numbers of bind
mounts was ludicrous before it started explicitly ignoring filesystems
of type `none', and that was arguably the wrong place to fix it.

--
`... in the sense that dragons logically follow evolution so they would
be able to wield metal.' --- Kenneth Eng's colourless green ideas sleep
furiously

2007-06-20 22:57:23

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Chuck Lever wrote:
> The advantage is that it doesn't have strong user space dependencies on
> its format like /proc/mounts does.
>
> If you have NFS mount points, you will see that it includes a great deal
> of additional information about each mount.

OK, I see now:
device raidtest:/export mounted on /net/raidtest/export with fstype nfs
statvers=1.0
opts:
rw,vers=3,rsize=131072,wsize=131072,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys
age: 5
caps: caps=0x9,wtmult=4096,dtsize=4096,bsize=0,namelen=255
sec: flavor=1,pseudoflavor=1
events: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
bytes: 0 0 0 0 0 0 0 0
RPC iostats version: 1.0 p/v: 100003/3 (nfs)
xprt: tcp 686 0 2 0 5 8 8 0 8 0
per-op statistics
NULL: 0 0 0 0 0 0 0 0
GETATTR: 2 2 0 264 224 1 0 1
SETATTR: 0 0 0 0 0 0 0 0
LOOKUP: 0 0 0 0 0 0 0 0
ACCESS: 1 1 0 116 120 0 0 0
READLINK: 0 0 0 0 0 0 0 0
READ: 0 0 0 0 0 0 0 0
WRITE: 0 0 0 0 0 0 0 0
CREATE: 0 0 0 0 0 0 0 0
MKDIR: 0 0 0 0 0 0 0 0
SYMLINK: 0 0 0 0 0 0 0 0
MKNOD: 0 0 0 0 0 0 0 0
REMOVE: 0 0 0 0 0 0 0 0
RMDIR: 0 0 0 0 0 0 0 0
RENAME: 0 0 0 0 0 0 0 0
LINK: 0 0 0 0 0 0 0 0
READDIR: 0 0 0 0 0 0 0 0
READDIRPLUS: 0 0 0 0 0 0 0 0
FSSTAT: 1 1 0 132 84 0 1 1
FSINFO: 1 1 0 132 80 0 0 0
PATHCONF: 0 0 0 0 0 0 0 0
COMMIT: 0 0 0 0 0 0 0 0

This format is just awful for parsing. It's pretty clearly totally
ad-hoc. It's not even self-consistent (it uses different separators,
etc, in the same file!) It's reasonably compact for human consumption,
but it doesn't show what the arrays mean.

Heck, XML would have been better than this mess...

-hpa

2007-06-20 23:03:16

by Chuck Lever III

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

H. Peter Anvin wrote:
> Chuck Lever wrote:
>> The advantage is that it doesn't have strong user space dependencies on
>> its format like /proc/mounts does.
>>
>> If you have NFS mount points, you will see that it includes a great deal
>> of additional information about each mount.
>
> OK, I see now:
> device raidtest:/export mounted on /net/raidtest/export with fstype nfs
> statvers=1.0
> opts:
> rw,vers=3,rsize=131072,wsize=131072,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys
> age: 5
> caps: caps=0x9,wtmult=4096,dtsize=4096,bsize=0,namelen=255
> sec: flavor=1,pseudoflavor=1
> events: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> bytes: 0 0 0 0 0 0 0 0
> RPC iostats version: 1.0 p/v: 100003/3 (nfs)
> xprt: tcp 686 0 2 0 5 8 8 0 8 0
> per-op statistics
> NULL: 0 0 0 0 0 0 0 0
> GETATTR: 2 2 0 264 224 1 0 1
> SETATTR: 0 0 0 0 0 0 0 0
> LOOKUP: 0 0 0 0 0 0 0 0
> ACCESS: 1 1 0 116 120 0 0 0
> READLINK: 0 0 0 0 0 0 0 0
> READ: 0 0 0 0 0 0 0 0
> WRITE: 0 0 0 0 0 0 0 0
> CREATE: 0 0 0 0 0 0 0 0
> MKDIR: 0 0 0 0 0 0 0 0
> SYMLINK: 0 0 0 0 0 0 0 0
> MKNOD: 0 0 0 0 0 0 0 0
> REMOVE: 0 0 0 0 0 0 0 0
> RMDIR: 0 0 0 0 0 0 0 0
> RENAME: 0 0 0 0 0 0 0 0
> LINK: 0 0 0 0 0 0 0 0
> READDIR: 0 0 0 0 0 0 0 0
> READDIRPLUS: 0 0 0 0 0 0 0 0
> FSSTAT: 1 1 0 132 84 0 1 1
> FSINFO: 1 1 0 132 80 0 0 0
> PATHCONF: 0 0 0 0 0 0 0 0
> COMMIT: 0 0 0 0 0 0 0 0
>
> This format is just awful for parsing. It's pretty clearly totally
> ad-hoc. It's not even self-consistent (it uses different separators,
> etc, in the same file!) It's reasonably compact for human consumption,
> but it doesn't show what the arrays mean.
>
> Heck, XML would have been better than this mess...

Sigh. So where were you when I asked for review time and again?

I have a couple of simple Python scripts that can parse this without any
difficulty.
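
As an illustration only (this is not one of the scripts mentioned above), a short sketch of pulling the per-op counters out of a single mount's block as quoted. The function name `parse_per_op_stats` is hypothetical, and no meaning is assigned to the individual columns.

```python
def parse_per_op_stats(text):
    """Return {op_name: [counters...]} for one mount's mountstats block."""
    stats = {}
    in_per_op = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "per-op statistics":
            in_per_op = True
            continue
        if line.startswith("device "):
            # A new mount begins; stop collecting until its own header.
            in_per_op = False
            continue
        if not in_per_op or not line:
            continue
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        stats[name] = [int(value) for value in rest.split()]
    return stats
```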

I resent your tone. Quite a bit.



2007-06-20 23:09:57

by Dr. David Alan Gilbert

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

* Karel Zak ([email protected]) wrote:
> On Wed, Jun 20, 2007 at 01:57:33PM -0700, H. Peter Anvin wrote:
> >

<snip>

> > ... or, alternatively, add a subfield to the first field (which would
> > entail escaping whatever separator we choose):
> >
> > /dev/md6 /export ext3 rw,data=ordered 0 0
> > /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
> > /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0
>
> We don't need a new separator (':'); there already is one (' ').

What happens with the (sick) case of spaces in directory names?
Also is it really nicely defined that there is no way to put a space
in an option in any of the filesystems? I suppose someone
particularly sick could have a device node in a directory with a space
in it. It would be nice if new formats for this are being defined
to make it cover everything.

Dave
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/

2007-06-21 10:45:44

by Miklos Szeredi

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

> > Right now it is actually impossible to conclusively determine a
> > filesystem-relative path in the presence of bind (and possibly move)
> > mounts. This is highly desirable to be able to do in contexts that
> > involve non-Linux (or not-the-current-instance-of-Linux) accesses to the
> > filesystem, e.g. other filesystems or bootloaders.
>
> It's also highly desirable if you want to be able to run a backup :) one
> would desire to back up the filesystem as a whole, not some bind mount
> of one directory out of it (and backing up both is needless
> duplication).
>
> So I applaud this and would be an immediate user, no matter what format
> is chosen, as long as we can tell what is mounted where.
>
> (As an aside, it would be nice if mount(8) could supply (a limited
> amount of) extra (arbitrary?) textual options to the kernel,
> specifically so that mount options which are only interpreted by
> userspace programs, like `user' and the quota options, could appear in
> /proc/mounts. That way we could finally ditch bloody /etc/mtab for good.

I'm working on this actually. See this (and related patches) in -mm:

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc4/2.6.22-rc4-mm2/broken-out/unprivileged-mounts-add-user-mounts-to-the-kernel.patch

This solves the "user=" thing, but is not a generic solution for other
options. And I'm wondering if there is really a need for that.

Which quota options are you thinking about? Some quota options
(e.g. for ext*) seem to be already present in /proc/mounts.

There's also "loop=", but that's not really a per-mount option, but a
per-loop-device option, so it could be stored separately under /var.

Do you know any other options which are only in /etc/mtab, and need to
be stored along with each mount?

Miklos

2007-06-21 16:17:30

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Al Viro wrote:
> On Wed, Jun 20, 2007 at 01:57:33PM -0700, H. Peter Anvin wrote:
>> ... or, alternatively, add a subfield to the first field (which would
>> entail escaping whatever separator we choose):
>>
>> /dev/md6 /export ext3 rw,data=ordered 0 0
>> /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
>> /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0
>
> Hell, no. The first field is in principle impossible to parse unless
> you know the fs type.
>
> How about making a new file with sane format? From the very
> beginning. E.g. mountpoint + ID + relative path + type + options,
> where ID uniquely identifies superblock (e.g. numeric st_dev)
> and backing device (if any) is sitting among the options...

The more I'm thinking about this, I think it's simplest to just add
fields to the right of the existing /proc/*/mounts. Yes, the format is
ugly, and it will end up being uglier still, but it's also ugly to have
a bunch of different chunks of information formatted in different ways.

So, the existing fields are:

mnt_devname mnt_path filesystem_type options 0 0

... and we'd want to add ...

mnt_id propagation_info sb_dev path_to_fs_root

As previously stated, in order to avoid having to expose kernel
addresses to userspace, I suggest we simply add a counter field to
struct vfsmount and use that for mnt_id.

I'm not all that up on what is needed for propagation_info. I presume
we want to be able to deduce the full mount lattice. One particularly
important thing in my mind is to be able to distinguish overmounted
filesystems (which I think is possible in the current setup only by
ordering -- the filesystem on top I believe will end up last in
/proc/mounts, but I don't know if there actually is anything that
enforces that.)

-hpa

2007-06-21 16:23:48

by Ram Pai

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

On Wed, 2007-06-20 at 14:20 -0700, H. Peter Anvin wrote:
> Al Viro wrote:
> > On Wed, Jun 20, 2007 at 01:57:33PM -0700, H. Peter Anvin wrote:
> >> ... or, alternatively, add a subfield to the first field (which would
> >> entail escaping whatever separator we choose):
> >>
> >> /dev/md6 /export ext3 rw,data=ordered 0 0
> >> /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
> >> /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0
> >
> > Hell, no. The first field is in principle impossible to parse unless
> > you know the fs type.
> >
> > How about making a new file with sane format? From the very
> > beginning. E.g. mountpoint + ID + relative path + type + options,
> > where ID uniquely identifies superblock (e.g. numeric st_dev)
> > and backing device (if any) is sitting among the options...
>
> Okay, I see there has been some discussion on this earlier, based on a
> proposal by Ram Pai, so it pretty much comes down to redesigning this
> right. I see some issues with his proposal (device numbers exported to
> userspace in text form should be separated into major:minor form, for
> one thing.) I know the util-linux-ng people have also had issues with
> /proc/mounts that they would like resolved in order to finally nuke
> /etc/mtab.
>
> Is Ram still working on this? I'd like to help make this happen so we
> can be done with it.

Peter, I am not working on it currently. But I am interested in getting
it done. I have the seed set of patches which had Al Viro's ideas
incorporated. In fact those patches were sent on lkml 2 months back.
Shall we start with those patches?

RP


>
> -hpa
>
>

2007-06-21 16:33:25

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Ram Pai wrote:
>
> Peter, I am not working on it currently. But I am interested in getting
> it done. I have the seed set of patches which had Al Viro's ideas
> incorporated. In fact those patches were sent on lkml 2 months back.
> Shall we start with those patches?
>

Are these the "unprivileged mount syscall" patches?

Otherwise I don't see any patches in my personal LKML cache (apparently
my subscription to fsdevel was dropped at some point, so I don't have a
stash of it.)

-hpa

2007-06-21 16:49:20

by Serge E. Hallyn

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Quoting H. Peter Anvin ([email protected]):
> Al Viro wrote:
> > On Wed, Jun 20, 2007 at 01:57:33PM -0700, H. Peter Anvin wrote:
> >> ... or, alternatively, add a subfield to the first field (which would
> >> entail escaping whatever separator we choose):
> >>
> >> /dev/md6 /export ext3 rw,data=ordered 0 0
> >> /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
> >> /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0
> >
> > Hell, no. The first field is in principle impossible to parse unless
> > you know the fs type.
> >
> > How about making a new file with sane format? From the very
> > beginning. E.g. mountpoint + ID + relative path + type + options,
> > where ID uniquely identifies superblock (e.g. numeric st_dev)
> > and backing device (if any) is sitting among the options...
>
> The more I'm thinking about this, I think it's simplest to just add
> fields to the right of the existing /proc/*/mounts. Yes, the format is
> ugly, and it will end up being uglier still, but it's also ugly to have
> a bunch of different chunks of information formatted in different ways.

Since we're defining the order "arbitrarily" in any case, I really don't
think it's all that ugly.

Are there any existing tools which would not be able to handle the extra
fields?

(suppose it's easiest to just add the fields, try a few distros, and see
which balk)

> So, the existing fields are:
>
> mnt_devname mnt_path filesystem_type options 0 0
>
> ... and we'd want to add ...
>
> mnt_id propagation_info sb_dev path_to_fs_root
>
> As previously stated, in order to avoid having to expose kernel
> addresses to userspace, I suggest we simply add a counter field to
> struct vfsmount and use that for mnt_id.

Agreed - even if it weren't frowned upon to expose the kernel addresses,
it would just be much nicer to have easier to remember ids. Somehow
with the kernel address, even with just a set of 5 of them printed in
front of me it takes me 2 minutes to figure out which ones are the
same...

> I'm not all that up on what is needed for propagation_info. I presume
> we want to be able to deduce the full mount lattice. One particularly

I think Ram's existing patches just provided "PEER (next-peer-id)" or
"SLAVE (master-id)".

> important thing in my mind is to be able to distinguish overmounted
> filesystems (which I think is possible in the current setup only by

What exactly do you mean here? Do you mean information about stackable
filesystems - i.e. ecryptfs, unionfs, etc?

If so, maybe a last column which the fs itself can fill in with such
information is the best way to go then? Ecryptfs would have just one
pathname to fill in (the location of the encrypted dir), unionfs might
have several (the full stack of unioned directories).

> ordering -- the filesystem on top I believe will end up last in
> /proc/mounts, but I don't know if there actually is anything that
> enforces that.)

Hmm, or do you actually mean that if I'd done

mount --bind /tmp/a /tmp
mount --bind /tmp/b /tmp
mount --bind /tmp/c /tmp

that you would want to see information about the first two mounts?

-serge

2007-06-21 16:55:51

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Serge E. Hallyn wrote:
>
> Hmm, or do you actually mean that if I'd done
>
> mount --bind /tmp/a /tmp
> mount --bind /tmp/b /tmp
> mount --bind /tmp/c /tmp
>
> that you would want to see information about the first two mounts?
>

Yes. Right now, you see all three without any reliable way of knowing
which one is on top.

-hpa

2007-06-21 17:23:58

by Ram Pai

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

On Thu, 2007-06-21 at 09:29 -0700, H. Peter Anvin wrote:
> Ram Pai wrote:
> >
> > Peter, I am not working on it currently. But I am interested in getting
> > it done. I have the seed set of patches which had Al Viro's ideas
> > incorporated. In fact those patches were sent on lkml 2 months back.
> > Shall we start with those patches?
> >
>
> Are these the "unprivileged mount syscall" patches?

No, but those patches were sent in the same thread. Karel had provided
suggestions which I am yet to incorporate.

Give me today. I will send out the patches incorporating the comment
later in the evening.

ok?
RP

>
> Otherwise I don't see any patches in my personal LKML cache (apparently
> my subscription to fsdevel was dropped at some point, so I don't have a
> stash of it.)


>
> -hpa

2007-06-21 17:36:21

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Ram Pai wrote:
>
> Peter, I am not working on it currently. But I am interested in getting
> it done. I have the seed set of patches which had Al Viro's ideas
> incorporated. In fact those patches were sent on lkml 2 months back.
> Shall we start with those patches?
>

Okay, so what I see in your patches are:

> > path-from-root: mount point of the mount from /
> > path-from-root-of-its-sb: path from its own root dentry.
> > propagation-flag: SHARED, SLAVE, UNBINDABLE, PRIVATE
> > peer-mount-id: the mount-id of its peer mount (if this mount is shared)
> > master-mount-id: the mount-id of its master mount (if this mount is slave)

Other than cosmetic, I don't see anything terribly wrong with this,
although getting a flag when the directory is overmounted would be nice.

I guess I suggest a single comma-separated field with flags and optional
":argument":

private
shared:<peer>
slave:<master>
unbindable
overmounted

So we could end up with something like:

rootfs / rootfs rw 0 0 0:1 / 1 private,overmounted

... where 1 is the mnt_id (sequence number).
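
A sketch of how the example line above might be parsed, assuming exactly ten whitespace-separated fields in the order shown (the six existing ones, then sb_dev as major:minor, the path to the fs root, mnt_id, and the flag list) and assuming flag arguments are decimal mount IDs. The function name and dictionary keys are illustrative, not part of any patch.

```python
def parse_extended_mount(line):
    """Parse a /proc/mounts line carrying the four proposed extra fields."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError("expected ten fields: %r" % line)
    major, _, minor = fields[6].partition(":")
    flags = {}
    for item in fields[9].split(","):
        # Each flag is "name" or "name:argument" (argument assumed numeric).
        name, _, arg = item.partition(":")
        flags[name] = int(arg) if arg else None
    return {
        "devname": fields[0],
        "mountpoint": fields[1],
        "fstype": fields[2],
        "options": fields[3].split(","),
        "sb_dev": (int(major), int(minor)),
        "fs_root": fields[7],
        "mnt_id": int(fields[8]),
        "propagation": flags,
    }
```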

[Please see my other comments in this thread... basically I believe we
should just add fields to /proc/mounts.]

-hpa

2007-06-21 17:47:04

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

H. Peter Anvin wrote:
>
> I guess I suggest a single comma-separated field with flags and optional
> ":argument":
>
> private
> shared:<peer>
> slave:<master>
> unbindable
> overmounted
>

Just realized: overmounted should presumably have a mount ID associated
with it, too.

-hpa

2007-06-21 19:14:49

by Hans-Peter Jansen

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Am Donnerstag, 21. Juni 2007 00:46 schrieb H. Peter Anvin:
> Dr. David Alan Gilbert wrote:
> > What happens with the (sick) case of spaces in directory names?
> > Also is it really nicely defined that there is no way to put a space
> > in an option in any of the filesystems? I suppose someone
> > particularly sick could have a device node in a directory with a space
> > in it. It would be nice if new formats for this are being defined
> > to make it cover everything.
>
> That's already handled just fine:
>
> bash-3.1$ mkdir /tmp/'Jag är: \
> en liten mask'
> bash-3.1$ sudo mount -t tmpfs none '/tmp/Jag är: \
> en liten mask'/
> bash-3.1$ tail -1 /proc/mounts
> none /tmp/Jag\040är:\040\134\012en\040liten\040mask tmpfs rw 0 0
> bash-3.1$

Hmm, and what about the even sicker case: /tmp/\040, parse as /tmp/\\040?
Does userspace cope with this?

Happy parsingly y'rs,
Pete

2007-06-21 19:19:47

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Hans-Peter Jansen wrote:
>> That's already handled just fine:
>>
>> bash-3.1$ mkdir /tmp/'Jag är: \
>> en liten mask'
>> bash-3.1$ sudo mount -t tmpfs none '/tmp/Jag är: \
>> en liten mask'/
>> bash-3.1$ tail -1 /proc/mounts
>> none /tmp/Jag\040är:\040\134\012en\040liten\040mask tmpfs rw 0 0
>> bash-3.1$
>
> Hmm, and what about the even sicker case: /tmp/\040, parse as /tmp/\\040?
> Does userspace cope with this?

Look at the example above, it contains a backslash already. It's
escaped as \134.
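
To make the point concrete, here is a small Python sketch of the
decoding side. It assumes the escapes are always a backslash followed by
exactly three octal digits, as in the example above; the function name
is made up for illustration.

```python
import re

# A backslash followed by exactly three octal digits, e.g. \040 or \134.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def unescape_mount_field(field: str) -> str:
    """Undo the octal escaping used in /proc/mounts fields."""
    # re.sub never rescans the text it has just substituted, so a decoded
    # backslash cannot combine with the digits that follow it.
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


# A directory literally named "\040" appears as "\134040" in the file
# and decodes back to a backslash followed by the characters 0, 4, 0.
decoded = unescape_mount_field("/tmp/\\134040")
```

Because the backslash itself is escaped, a literal "\040" in a name and
an escaped space stay distinguishable after decoding.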

"Does userspace cope with this" is of course an impossible question to
answer, since userspace is in theory unbounded. However, if it doesn't,
it is broken and needs to be fixed.

-hpa

2007-06-21 19:42:59

by Nix

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

On 21 Jun 2007, Miklos Szeredi said:
> I'm working on this actually. See this (and related patches) in -mm:
>
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc4/2.6.22-rc4-mm2/broken-out/unprivileged-mounts-add-user-mounts-to-the-kernel.patch
>
> This solves the "user=" thing, but is not a generic solution for other
> options. And I'm wondering if there is really a need for that.

I'm not sure there is, actually...

> Which quota options are you thinking about? Some quota options
> (e.g. for ext*) seem to be already present in /proc/mounts.

... last I checked, usrquota and grpquota weren't being propagated in,
but that's changed sometime in the last, um, wow, has it been two
years already? Perhaps I should have checked again before babbling
nonsense to an audience of thousands, sorry!

> Do you know any other options which are only in /etc/mtab, and need to
> be stored along with each mount?

None presently, but in general it seems strange to have to modify the
kernel in order to be able to reliably associate some new key/value pair
with a mount point, let alone to have to do it on a filesystem-by-
filesystem basis...

--
`... in the sense that dragons logically follow evolution so they would
be able to wield metal.' --- Kenneth Eng's colourless green ideas sleep
furiously

2007-06-22 06:47:20

by Ram Pai

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

On Thu, 2007-06-21 at 10:31 -0700, H. Peter Anvin wrote:
> Ram Pai wrote:
> >
> > Peter, I am not working on it currently. But I am interested in getting
> > it done. I have the seed set of patches which had Al Viro's ideas
> > incorporated. In fact those patches were sent on lkml 2 months back.
> > Shall we start with those patches?
> >
>
> Okay, so what I see in your patches are:
>
> > > path-from-root: mount point of the mount from /
> > > path-from-root-of-its-sb: path from its own root dentry.
> > > propagation-flag: SHARED, SLAVE, UNBINDABLE, PRIVATE
> > > peer-mount-id: the mount-id of its peer mount (if this mount is shared)
> > > master-mount-id: the mount-id of its master mount (if this mount is
> > > slave)
>
> Other than cosmetic, I don't see anything terribly wrong with this,
> although getting a flag when the directory is overmounted would be nice.
>
> I guess I suggest a single comma-separated field with flags and optional
> ":argument":
>
> private
> shared:<peer>
> slave:<master>
> unbindable
> overmounted
>
> So we could end up with something like:
>
> rootfs / rootfs rw 0 0 0:1 / 1 private,overmounted
>
> ... where 1 is the mnt_id (sequence number).
>
> [Please see my other comments in this thread... basically I believe we
> should just add fields to /proc/mounts.]

I had two patches. The first patch added a new interface
called /proc/mounts_new and had the following format.

FSID mntpt root-dentry fstype fs-options

where:
  FSID        is a unique filesystem id
  mntpt       is the path to the mountpoint
  root-dentry is the path to the dentry with respect to the root dentry
              of the same filesystem
  fstype      is the filesystem type
  fs-options  are the mount options used


The second patch made a /proc/propagation interface which had almost the
same fields, but also added fields to show the propagation type of the
mount as well as pointers to its peers and master depending on the type
of the mount.

I think the consensus seems to be to have a new interface /proc/make-a-name
which extends the interface provided by /proc/mounts but also provides the
propagation state of the mounts and disambiguates bind mounts.
Which makes sense.

Why not have something like this?

mnt-id FSID backing-dev mntpt root-dentry fstype
comma-separated-fs-options

and one of the fields in the comma-separated-fs-options indicates the
propagation type of the mount.
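
As a rough sketch of what a reader of such a line could look like in
Python: this assumes seven whitespace-separated fields in the order
given above, with the propagation type appearing as a bare word among
the options. The MountRecord name, the helper and the sample line are
invented for illustration and do not come from the patches.

```python
from typing import List, NamedTuple, Optional


class MountRecord(NamedTuple):
    mnt_id: str
    fsid: str
    backing_dev: str
    mntpt: str
    root_dentry: str
    fstype: str
    options: List[str]


PROPAGATION_TYPES = ("private", "shared", "slave", "unbindable")


def parse_record(line: str) -> MountRecord:
    """Split one line of the proposed format into its seven fields."""
    parts = line.split()
    if len(parts) < 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    return MountRecord(*parts[:6], parts[6].split(","))


def propagation_type(record: MountRecord) -> Optional[str]:
    """Return the first propagation keyword found in the options, if any."""
    for opt in record.options:
        if opt in PROPAGATION_TYPES:
            return opt
    return None


# Hypothetical sample line; the FSID spelling is an assumption.
rec = parse_record("12 fs42 /dev/md6 /home/foo /users/foo ext3 rw,data=ordered,shared")
```

Here rec.root_dentry is "/users/foo", which is what tells this bind
mount apart from the mount of the same device on /export.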


BTW: what is the need for the overmounted flag? Do you mean two vfsmounts
mounted on the same dentry on the ***same vfsmount*** ?


RP

>
> -hpa

2007-06-22 07:10:47

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Ram Pai wrote:
>
> The second patch made a /proc/propagation interface which had almost the
> same fields, but also added fields to show the propagation type of the
> mount as well as pointers to its peers and master depending on the type
> of the mount.
>
> I think the consensus seems to be to have a new interface /proc/make-a-name
> which extends the interface provided by /proc/mounts but also provides the
> propagation state of the mounts and disambiguates bind mounts.
> Which makes sense.
>

Why? It seems a lot cleaner to have all the information in the same
place. It is highly unfriendly to userspace to have to gather
information in a lot of places, plus it adds race conditions.

It would be another matter if the format that we have now couldn't be
extended, but we need those fields (well, except the two zeros, but who
cares) *anyway*, so we might as well stick to the existing file, and
reduce the total amount of code and clutter.

>
> BTW: what is the need for the overmounted flag? Do you mean two vfsmounts
> mounted on the same dentry on the ***same vfsmount*** ?
>

Maybe I'm not following the uses of your flags well enough to figure out
if that information can already be deduced.

-hpa

2007-06-22 07:37:55

by Ram Pai

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

On Fri, 2007-06-22 at 00:06 -0700, H. Peter Anvin wrote:
> Ram Pai wrote:
> >
> > The second patch made a /proc/propagation interface which had almost the
> > same fields, but also added fields to show the propagation type of the
> > mount as well as pointers to its peers and master depending on the type
> > of the mount.
> >
> > I think the consensus seems to be to have a new interface /proc/make-a-name
> > which extends the interface provided by /proc/mounts but also provides the
> > propagation state of the mounts and disambiguates bind mounts.
> > Which makes sense.
> >
>
> Why? It seems a lot cleaner to have all the information in the same
> place. It is highly unfriendly to userspace to have to gather
> information in a lot of places, plus it adds race conditions.
>
> It would be another matter if the format that we have now couldn't be
> extended, but we need those fields (well, except the two zeros, but who
> cares) *anyway*, so we might as well stick to the existing file, and
> reduce the total amount of code and clutter.

OK, so you think /proc/mounts can be extended easily without breaking
any userspace commands?

Well, let's see:
1. To disambiguate bind mounts, we have to add a field that displays the
path to the mount's root dentry from the filesystem's root
dentry. Agree?

2. For filesystems that do not have a backing store, it becomes hard
to disambiguate bind mounts in (1). So we need to add a
filesystem-id field.

3. If we need to add the propagation status of the mount, we need a
propagation flag added in the output.

4. To be able to construct the propagation tree, we need a way to refer
to the other mounts, since some mounts are peers and some other
mounts are masters. Which means we need a mount-id field.
Agree?

If you agree to the above 4 new fields, it becomes challenging to
extend /proc/mounts to incorporate these new fields without
breaking any existing applications.
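
To show what fields (1) and (2) buy userspace, here is a hedged Python
sketch that turns a path into a (filesystem-id, filesystem-relative
path) pair. The tuple layout, the function name and the use of the
device name as the filesystem id are assumptions for illustration, and
the sketch ignores overmounts other than letting later entries win.

```python
import posixpath
from typing import Iterable, Tuple

# (filesystem id, mountpoint, path of the mount's root within the filesystem)
Mount = Tuple[str, str, str]


def fs_relative_path(path: str, mounts: Iterable[Mount]) -> Tuple[str, str]:
    """Resolve path to (filesystem id, path relative to that filesystem)."""
    best = None
    for fs_id, mntpt, root in mounts:
        prefix = mntpt.rstrip("/") + "/"
        if path == mntpt or path.startswith(prefix):
            # Longest mountpoint wins; on a tie the later entry wins.
            if best is None or len(mntpt) >= len(best[1]):
                best = (fs_id, mntpt, root)
    if best is None:
        raise ValueError(f"no mount covers {path!r}")
    fs_id, mntpt, root = best
    rel = posixpath.relpath(path, mntpt)
    return fs_id, posixpath.normpath(posixpath.join(root, rel))


# The bind mounts from the start of this thread.
MOUNTS = [
    ("/dev/md6", "/export", "/"),
    ("/dev/md6", "/home/foo", "/users/foo"),
    ("/dev/md6", "/home/bar", "/users/bar"),
]
via_bind = fs_relative_path("/home/foo/notes.txt", MOUNTS)
via_export = fs_relative_path("/export/users/foo/notes.txt", MOUNTS)
```

Both lookups yield the same pair, which is the information a bootloader
or another filesystem implementation would need.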


> >
> > BTW: what is the need for the overmounted flag? Do you mean two vfsmounts
> > mounted on the same dentry on the ***same vfsmount*** ?
> >
>
> Maybe I'm not following the uses of your flags well enough to figure out
> if that information can already be deduced.

With the addition of the four fields mentioned above, I think one should
easily be able to work out which mnt-id is mounted on which mnt-id, no?
Maybe not. Well, then we will have to extend the mountpoint field to
indicate the mnt-id in which the mountpoint resides.

RP

>
> -hpa

2007-06-22 07:56:33

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Ram Pai wrote:
>
> OK, so you think /proc/mounts can be extended easily without breaking
> any userspace commands?
>
> Well, let's see:
> 1. To disambiguate bind mounts, we have to add a field that displays the
> path to the mount's root dentry from the filesystem's root
> dentry. Agree?
>
> 2. For filesystems that do not have a backing store, it becomes hard
> to disambiguate bind mounts in (1). So we need to add a
> filesystem-id field.
>
> 3. If we need to add the propagation status of the mount, we need a
> propagation flag added in the output.
>
> 4. To be able to construct the propagation tree, we need a way to refer
> to the other mounts, since some mounts are peers and some other
> mounts are masters. Which means we need a mount-id field.
> Agree?
>
> If you agree to the above 4 new fields, it becomes challenging to
> extend /proc/mounts to incorporate these new fields without
> breaking any existing applications.
>

No, I don't think so. I suspect, in fact, that as long as we add the
new fields to the right (obviously) we should be fine. There aren't all
that many users of /proc/mounts, and even fewer that don't use the
library functions (getmntent et al.)
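
As a sketch of the kind of parser that would be unaffected: the Python
below reads only the first six fields of a line and ignores anything to
the right. It is not glibc's getmntent(); the field names merely mirror
struct mntent, and the extended sample line is hypothetical.

```python
from typing import NamedTuple


class MntEnt(NamedTuple):
    mnt_fsname: str
    mnt_dir: str
    mnt_type: str
    mnt_opts: str
    mnt_freq: int
    mnt_passno: int


def parse_mounts_line(line: str) -> MntEnt:
    """Parse the six classic fields; extra trailing fields are ignored."""
    f = line.split()
    if len(f) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(f)}")
    return MntEnt(f[0], f[1], f[2], f[3], int(f[4]), int(f[5]))


old_line = "/dev/md6 /home/foo ext3 rw,data=ordered 0 0"
new_line = old_line + " /users/foo"  # hypothetical extra field on the right
same = parse_mounts_line(old_line) == parse_mounts_line(new_line)
```

A parser that instead insists on exactly six fields, or splits from the
right, would break; those are the cases that would need auditing.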

-hpa

2007-06-28 14:53:31

by Pavel Machek

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Hi!

> > ... or, alternatively, add a subfield to the first field (which would
> > entail escaping whatever separator we choose):
> >
> > /dev/md6 /export ext3 rw,data=ordered 0 0
> > /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
> > /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0
>
> Hell, no. The first field is in principle impossible to parse unless
> you know the fs type.
>
> How about making a new file with sane format? From the very

Well, what about /sysfs, with its one value per file rule?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-06-28 15:41:12

by H. Peter Anvin

Subject: Re: Adding subroot information to /proc/mounts, or obtaining that through other means

Pavel Machek wrote:
> Hi!
>
>>> ... or, alternatively, add a subfield to the first field (which would
>>> entail escaping whatever separator we choose):
>>>
>>> /dev/md6 /export ext3 rw,data=ordered 0 0
>>> /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
>>> /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0
>> Hell, no. The first field is in principle impossible to parse unless
>> you know the fs type.
>>
>> How about making a new file with sane format? From the very
>
> Well, what about /sysfs, with its one value per file rule?
>

There are two reasons not to do it that way:

- atomicity
- backwards compatibility

Of these, I would argue the former is the more important.

Additionally, I don't think sysfs has the ability to present different
structures on a per-process basis; keep in mind this isn't really
/proc/mounts, but really /proc/<pid>/mounts.

-hpa