LinuxLists.cc - [RFC] proc: don't show kernel-internal mnt

2016-11-07 16:10:14

Subject: [RFC] proc: don't show kernel-internal mnt_id in fdinfo (if possible)

Some kernel special fs could be mounted by userspace, let's show
userspace mnt_id in those cases.

Otherwise, I got:
[~]# cat /proc/11299/fdinfo/3
pos: 0
flags: 02000002
mnt_id: 14
[~]# cat /proc/11299/mountinfo | grep '^14'
[~]# ls -l /proc/11299/fd/3
lrwx------. 1 root root 64 Nov 7 18:30 /proc/11299/fd/3 -> /test-queue
[~]# ls /dev/mqueue/
test-queue
[~]# cat /proc/11299/mountinfo | grep mqueue
32 18 0:14 / /dev/mqueue rw,relatime shared:17 - mqueue mqueue rw,seclabel

This happens because mqueue fs is mounted twice:
- the first is kernel-internal mnt on init:
kernel_init=>do_one_initcall=>init_mqueue_fs=>mq_init_ns=>vfs_kern_mount
- the second time it's systemd's mount-unit:
entry_SYSCALL_64_fastpath=>SyS_mount=>do_mount=>vfs_kern_mount

For the purpose of userspace parsing, having in-kernel mnt_id is less
useful then mnt_id of mount point: afterwards I'm able to see fs type,
path, etc:
[~]# cat /proc/11152/mountinfo | grep mqueue
32 18 0:14 / /dev/mqueue rw,relatime shared:18 - mqueue mqueue rw,seclabel
[~]# cat /proc/11152/fdinfo/3
pos: 0
flags: 02000002
mnt_id: 32

On a bad side - if we've no userspace mount, we still can't tell a thing
about opened fd..

Cc: Al Viro <[email protected]>
Cc: Andrey Vagin <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Rob Landley <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
---
fs/proc/fd.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index d21dafef3102..bfa8699bcd8e 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -9,6 +9,7 @@
#include <linux/file.h>
#include <linux/seq_file.h>
#include <linux/fs.h>
+#include <linux/nsproxy.h>

#include <linux/proc_fs.h>

@@ -19,9 +20,10 @@
static int seq_show(struct seq_file *m, void *v)
{
struct files_struct *files = NULL;
- int f_flags = 0, ret = -ENOENT;
+ int f_flags = 0, ret = -ENOENT, mnt_id = 0;
struct file *file = NULL;
struct task_struct *task;
+ struct mount *mount;

task = get_proc_task(m->private);
if (!task)
@@ -52,9 +54,25 @@ static int seq_show(struct seq_file *m, void *v)
if (ret)
return ret;

+ mount = real_mount(file->f_path.mnt);
+ if (mount->mnt_ns == MNT_NS_INTERNAL) {
+ struct mount *mnt;
+
+ lock_mount_hash();
+ list_for_each_entry(mnt, &mount->mnt_instance, mnt_instance) {
+ if (current->nsproxy->mnt_ns == mnt->mnt_ns) {
+ mnt_id = mnt->mnt_id;
+ break;
+ }
+ }
+ unlock_mount_hash();
+ }
+
+ if (mnt_id == 0)
+ mnt_id = mount->mnt_id;
+
seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\n",
- (long long)file->f_pos, f_flags,
- real_mount(file->f_path.mnt)->mnt_id);
+ (long long)file->f_pos, f_flags, mnt_id);

show_fd_locks(m, file, files);
if (seq_has_overflowed(m))
--
2.10.2

2016-11-07 17:09:13

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [RFC] proc: don't show kernel-internal mnt_id in fdinfo (if possible)

Dmitry Safonov <[email protected]> writes:

> Some kernel special fs could be mounted by userspace, let's show
> userspace mnt_id in those cases.

You are asking the kernel to lie to userspace in the case when
the truth in inconvenient. That seems blantantly wrong.

In the case of an internal mount you may want to use the device id
of the device the filesystem is mounted on so you can tie all of
the mounts together. That would allow restore and other things
to do something useful.

That information is available with fstat on a file descriptor so I don't
think we need to export it. But if it is not available I can see
the point of a patch.

Outright ling to userpsace. No. That is just horrible.

Nacked-by: "Eric W. Biederman" <[email protected]>

>
> Otherwise, I got:
> [~]# cat /proc/11299/fdinfo/3
> pos: 0
> flags: 02000002
> mnt_id: 14
> [~]# cat /proc/11299/mountinfo | grep '^14'
> [~]# ls -l /proc/11299/fd/3
> lrwx------. 1 root root 64 Nov 7 18:30 /proc/11299/fd/3 -> /test-queue
> [~]# ls /dev/mqueue/
> test-queue
> [~]# cat /proc/11299/mountinfo | grep mqueue
> 32 18 0:14 / /dev/mqueue rw,relatime shared:17 - mqueue mqueue rw,seclabel
>
> This happens because mqueue fs is mounted twice:
> - the first is kernel-internal mnt on init:
> kernel_init=>do_one_initcall=>init_mqueue_fs=>mq_init_ns=>vfs_kern_mount
> - the second time it's systemd's mount-unit:
> entry_SYSCALL_64_fastpath=>SyS_mount=>do_mount=>vfs_kern_mount
>
> For the purpose of userspace parsing, having in-kernel mnt_id is less
> useful then mnt_id of mount point: afterwards I'm able to see fs type,
> path, etc:
> [~]# cat /proc/11152/mountinfo | grep mqueue
> 32 18 0:14 / /dev/mqueue rw,relatime shared:18 - mqueue mqueue rw,seclabel
> [~]# cat /proc/11152/fdinfo/3
> pos: 0
> flags: 02000002
> mnt_id: 32
>
> On a bad side - if we've no userspace mount, we still can't tell a thing
> about opened fd..
>
> Cc: Al Viro <[email protected]>
> Cc: Andrey Vagin <[email protected]>
> Cc: "Eric W. Biederman" <[email protected]>
> Cc: Rob Landley <[email protected]>
> Signed-off-by: Dmitry Safonov <[email protected]>
> ---
> fs/proc/fd.c | 24 +++++++++++++++++++++---
> 1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/fs/proc/fd.c b/fs/proc/fd.c
> index d21dafef3102..bfa8699bcd8e 100644
> --- a/fs/proc/fd.c
> +++ b/fs/proc/fd.c
> @@ -9,6 +9,7 @@
> #include <linux/file.h>
> #include <linux/seq_file.h>
> #include <linux/fs.h>
> +#include <linux/nsproxy.h>
>
> #include <linux/proc_fs.h>
>
> @@ -19,9 +20,10 @@
> static int seq_show(struct seq_file *m, void *v)
> {
> struct files_struct *files = NULL;
> - int f_flags = 0, ret = -ENOENT;
> + int f_flags = 0, ret = -ENOENT, mnt_id = 0;
> struct file *file = NULL;
> struct task_struct *task;
> + struct mount *mount;
>
> task = get_proc_task(m->private);
> if (!task)
> @@ -52,9 +54,25 @@ static int seq_show(struct seq_file *m, void *v)
> if (ret)
> return ret;
>
> + mount = real_mount(file->f_path.mnt);
> + if (mount->mnt_ns == MNT_NS_INTERNAL) {
> + struct mount *mnt;
> +
> + lock_mount_hash();
> + list_for_each_entry(mnt, &mount->mnt_instance, mnt_instance) {
> + if (current->nsproxy->mnt_ns == mnt->mnt_ns) {
> + mnt_id = mnt->mnt_id;
> + break;
> + }
> + }
> + unlock_mount_hash();
> + }
> +
> + if (mnt_id == 0)
> + mnt_id = mount->mnt_id;
> +
> seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\n",
> - (long long)file->f_pos, f_flags,
> - real_mount(file->f_path.mnt)->mnt_id);
> + (long long)file->f_pos, f_flags, mnt_id);
>
> show_fd_locks(m, file, files);
> if (seq_has_overflowed(m))

2016-11-07 19:07:18

by Dmitry Safonov

[permalink] [raw]

Subject: Re: [RFC] proc: don't show kernel-internal mnt_id in fdinfo (if possible)

On 11/07/2016 08:06 PM, Eric W. Biederman wrote:
> Dmitry Safonov <[email protected]> writes:
>
>> Some kernel special fs could be mounted by userspace, let's show
>> userspace mnt_id in those cases.
>
> You are asking the kernel to lie to userspace in the case when
> the truth in inconvenient. That seems blantantly wrong.
>
> In the case of an internal mount you may want to use the device id
> of the device the filesystem is mounted on so you can tie all of
> the mounts together. That would allow restore and other things
> to do something useful.
>
> That information is available with fstat on a file descriptor so I don't
> think we need to export it. But if it is not available I can see
> the point of a patch.
>
> Outright ling to userpsace. No. That is just horrible.
>
> Nacked-by: "Eric W. Biederman" <[email protected]>

Ah, ok right, I see your point - then let's just drop the patch.

Thanks,
Dmitry