2009-09-24 14:53:01

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH] vfs: new O_NODE open flag

Reseding this patch which I've forgotten about a bit, but David
Howells reminded me. Thanks, David.

It received positive responses from Linus and others at the last
submission.

Al, could you please take a look?

Thanks,
Miklos

---
From: Miklos Szeredi <[email protected]>

This patch adds a new open flag, O_NODE. This flag means: open just
the filesystem node instead of the object referenced by the node.

Opening a file will not call the driver's ->open() method and will not
have any side effect other than referencing the dentry/vfsmount from
the struct file pointer.

Currently O_NODE can only be used together with O_NOACCESS (value 3),
meaning that read(2) and write(2) will return EBADF and no permission
is necessary to open the file. This is logical for device nodes and
fifos. For directories we could allow O_RDONLY, and for regular files
any access mode, but this is not done yet.

O_NODE used together with O_NOFOLLOW will open a symlink node itself
instead of returning ELOOP. O_NOFOLLOW will not prevent following
mountpoints if used together with O_NODE. In theory we could allow
O_RDONLY for symlinks too and return the contents of the link on
read(2).

Filesystems which want to install special file operations for O_NODE
opens (e.g. ioctl) may set S_OPEN_NODE flag in the inode. This will
cause dentry_open() to call inode->i_fop->open, and the filesystem is
responsible for handling the O_NODE flag and installing the right
filesystem operations in file->f_op.

Signed-off-by: Miklos Szeredi <[email protected]>
Acked-by: David Howells <[email protected]>
---
fs/file_table.c | 6 ++--
fs/locks.c | 3 ++
fs/namei.c | 65 ++++++++++++++++++++++++++------------------
fs/open.c | 12 +++++++-
include/asm-generic/fcntl.h | 4 ++
include/linux/fs.h | 1
6 files changed, 62 insertions(+), 29 deletions(-)

Index: linux-2.6/fs/file_table.c
===================================================================
--- linux-2.6.orig/fs/file_table.c 2009-09-24 15:56:41.000000000 +0200
+++ linux-2.6/fs/file_table.c 2009-09-24 16:36:35.000000000 +0200
@@ -281,8 +281,10 @@ void __fput(struct file *file)
file->f_op->release(inode, file);
security_file_free(file);
ima_file_free(file);
- if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL))
- cdev_put(inode->i_cdev);
+ if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL)) {
+ if (!(file->f_flags & O_NODE))
+ cdev_put(inode->i_cdev);
+ }
fops_put(file->f_op);
put_pid(file->f_owner.pid);
file_kill(file);
Index: linux-2.6/fs/locks.c
===================================================================
--- linux-2.6.orig/fs/locks.c 2009-09-24 15:56:41.000000000 +0200
+++ linux-2.6/fs/locks.c 2009-09-24 16:36:35.000000000 +0200
@@ -1183,6 +1183,9 @@ int __break_lease(struct inode *inode, u
unsigned long break_time;
int i_have_this_lease = 0;

+ if (!(mode & (FMODE_READ | FMODE_WRITE)))
+ return 0;
+
new_fl = lease_alloc(NULL, mode & FMODE_WRITE ? F_WRLCK : F_RDLCK);

lock_kernel();
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c 2009-09-24 15:56:41.000000000 +0200
+++ linux-2.6/fs/namei.c 2009-09-24 16:36:36.000000000 +0200
@@ -1160,7 +1160,9 @@ static int path_lookup_open(int dfd, con
nd->intent.open.file = filp;
nd->intent.open.flags = open_flags;
nd->intent.open.create_mode = 0;
- err = do_path_lookup(dfd, name, lookup_flags|LOOKUP_OPEN, nd);
+ if (!(open_flags & O_NODE))
+ lookup_flags |= LOOKUP_OPEN;
+ err = do_path_lookup(dfd, name, lookup_flags, nd);
if (IS_ERR(nd->intent.open.file)) {
if (err == 0) {
err = PTR_ERR(nd->intent.open.file);
@@ -1511,22 +1513,27 @@ int may_open(struct path *path, int acc_
if (!inode)
return -ENOENT;

- switch (inode->i_mode & S_IFMT) {
- case S_IFLNK:
- return -ELOOP;
- case S_IFDIR:
- if (acc_mode & MAY_WRITE)
- return -EISDIR;
- break;
- case S_IFBLK:
- case S_IFCHR:
- if (path->mnt->mnt_flags & MNT_NODEV)
+ if ((flag & O_NODE)) {
+ if (acc_mode & (MAY_READ | MAY_WRITE))
return -EACCES;
- /*FALLTHRU*/
- case S_IFIFO:
- case S_IFSOCK:
- flag &= ~O_TRUNC;
- break;
+ } else {
+ switch (inode->i_mode & S_IFMT) {
+ case S_IFLNK:
+ return -ELOOP;
+ case S_IFDIR:
+ if (acc_mode & MAY_WRITE)
+ return -EISDIR;
+ break;
+ case S_IFBLK:
+ case S_IFCHR:
+ if (path->mnt->mnt_flags & MNT_NODEV)
+ return -EACCES;
+ /*FALLTHRU*/
+ case S_IFIFO:
+ case S_IFSOCK:
+ flag &= ~O_TRUNC;
+ break;
+ }
}

error = inode_permission(inode, acc_mode);
@@ -1639,14 +1646,16 @@ out_unlock:
* 11 - read-write
* for the internal routines (ie open_namei()/follow_link() etc)
* This is more logical, and also allows the 00 "no perm needed"
- * to be used for symlinks (where the permissions are checked
- * later).
+ * to be used for opening a symlink, pipe, socket or device with
+ * O_NODE.
*
*/
static inline int open_to_namei_flags(int flag)
{
if ((flag+1) & O_ACCMODE)
flag++;
+ else if (flag & O_NODE)
+ flag &= ~O_ACCMODE;
return flag;
}

@@ -1734,7 +1743,9 @@ struct file *do_filp_open(int dfd, const
nd.intent.open.create_mode = mode;
dir = nd.path.dentry;
nd.flags &= ~LOOKUP_PARENT;
- nd.flags |= LOOKUP_CREATE | LOOKUP_OPEN;
+ nd.flags |= LOOKUP_CREATE;
+ if (!(open_flag & O_NODE))
+ nd.flags |= LOOKUP_OPEN;
if (flag & O_EXCL)
nd.flags |= LOOKUP_EXCL;
mutex_lock(&dir->d_inode->i_mutex);
@@ -1793,19 +1804,24 @@ do_last:

if (__follow_mount(&path)) {
error = -ELOOP;
- if (flag & O_NOFOLLOW)
+ if ((flag & O_NOFOLLOW) & !(flag & O_NODE))
goto exit_dput;
}

error = -ENOENT;
if (!path.dentry->d_inode)
goto exit_dput;
- if (path.dentry->d_inode->i_op->follow_link)
- goto do_link;
+ if (path.dentry->d_inode->i_op->follow_link) {
+ if (!(flag & O_NOFOLLOW))
+ goto do_link;
+ error = -ELOOP;
+ if (!(flag & O_NODE))
+ goto exit_dput;
+ }

path_to_nameidata(&path, &nd);
error = -EISDIR;
- if (path.dentry->d_inode && S_ISDIR(path.dentry->d_inode->i_mode))
+ if (S_ISDIR(path.dentry->d_inode->i_mode))
goto exit;
ok:
/*
@@ -1859,9 +1875,6 @@ exit_parent:
return ERR_PTR(error);

do_link:
- error = -ELOOP;
- if (flag & O_NOFOLLOW)
- goto exit_dput;
/*
* This is subtle. Instead of calling do_follow_link() we do the
* thing by hands. The reason is that this way we have zero link_count
Index: linux-2.6/fs/open.c
===================================================================
--- linux-2.6.orig/fs/open.c 2009-09-24 15:56:41.000000000 +0200
+++ linux-2.6/fs/open.c 2009-09-24 16:36:36.000000000 +0200
@@ -804,12 +804,17 @@ static inline int __get_file_write_acces
return error;
}

+static const struct file_operations default_node_fops = {
+ .llseek = no_llseek,
+};
+
static struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
int flags, struct file *f,
int (*open)(struct inode *, struct file *),
const struct cred *cred)
{
struct inode *inode;
+ const struct file_operations *fops;
int error;

f->f_flags = flags;
@@ -828,7 +833,12 @@ static struct file *__dentry_open(struct
f->f_path.dentry = dentry;
f->f_path.mnt = mnt;
f->f_pos = 0;
- f->f_op = fops_get(inode->i_fop);
+
+ fops = inode->i_fop;
+ if ((flags & O_NODE) && !(inode->i_flags & S_OPEN_NODE))
+ fops = &default_node_fops;
+ f->f_op = fops_get(fops);
+
file_move(f, &inode->i_sb->s_files);

error = security_dentry_open(f, cred);
Index: linux-2.6/include/asm-generic/fcntl.h
===================================================================
--- linux-2.6.orig/include/asm-generic/fcntl.h 2009-09-24 15:56:41.000000000 +0200
+++ linux-2.6/include/asm-generic/fcntl.h 2009-09-24 16:36:36.000000000 +0200
@@ -9,6 +9,7 @@
#define O_RDONLY 00000000
#define O_WRONLY 00000001
#define O_RDWR 00000002
+#define O_NOACCESS 00000003
#ifndef O_CREAT
#define O_CREAT 00000100 /* not fcntl */
#endif
@@ -51,6 +52,9 @@
#ifndef O_CLOEXEC
#define O_CLOEXEC 02000000 /* set close_on_exec */
#endif
+#ifndef O_NODE
+#define O_NODE 04000000 /* open filesystem node, not device */
+#endif
#ifndef O_NDELAY
#define O_NDELAY O_NONBLOCK
#endif
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h 2009-09-24 15:56:41.000000000 +0200
+++ linux-2.6/include/linux/fs.h 2009-09-24 16:36:36.000000000 +0200
@@ -231,6 +231,7 @@ struct inodes_stat_t {
#define S_NOCMTIME 128 /* Do not update file c/mtime */
#define S_SWAPFILE 256 /* Do not truncate: swapon got its bmaps */
#define S_PRIVATE 512 /* Inode is fs-internal */
+#define S_OPEN_NODE 1024 /* Fs is prepared for O_NODE opens */

/*
* Note that nosuid etc flags are inode-specific: setting some file-system


2009-09-25 00:24:38

by Andreas Gruenbacher

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Thursday, 24 September 2009 16:51:58 Miklos Szeredi wrote:
> This patch adds a new open flag, O_NODE. This flag means: open just
> the filesystem node instead of the object referenced by the node.

What is the intended use for O_NODE?

Thanks,
Andreas

2009-09-25 05:52:26

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Fri, 25 Sep 2009, Andreas Gruenbacher wrote:
> On Thursday, 24 September 2009 16:51:58 Miklos Szeredi wrote:
> > This patch adds a new open flag, O_NODE. This flag means: open just
> > the filesystem node instead of the object referenced by the node.
>
> What is the intended use for O_NODE?

It lets userspace file descriptors reference a inode without actually
"dereferencing" it to get the underlying object. This allows for a
couple of new things:

- opening a special file (device/socket/fifo) without side effects

- opening a symlink

- opening any type of file without any permission is also possible
(of course the resuling file descriptor may not be read or written)

The above allows fstat(), fchmod(), ioctl(), etc to be used for files
previously not possible. AFS for example wanted a "pioctl()" syscall
for various things, but the same can be done without having to define
a new syscall:

fd = open(path, O_NODE);
ioctl(fd, ...);
close(fd);

Another important use would be opening a directory without read
permission (see O_SEARCH from the POSIX draft for what this is useful
for). O_NODE is not quite equivalent to O_SEARCH, but similar and
equally useful.

Thanks,
Miklos

2009-09-25 12:18:18

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Fri, 25 Sep 2009, Dr. David Alan Gilbert wrote:
> * Miklos Szeredi ([email protected]) wrote:
> > On Fri, 25 Sep 2009, Andreas Gruenbacher wrote:
> > > On Thursday, 24 September 2009 16:51:58 Miklos Szeredi wrote:
> > > > This patch adds a new open flag, O_NODE. This flag means: open just
> > > > the filesystem node instead of the object referenced by the node.
> > >
> > > What is the intended use for O_NODE?
> >
> > It lets userspace file descriptors reference a inode without actually
> > "dereferencing" it to get the underlying object. This allows for a
> > couple of new things:
> >
> > - opening a special file (device/socket/fifo) without side effects
> >
> > - opening a symlink
> >
> > - opening any type of file without any permission is also possible
> > (of course the resuling file descriptor may not be read or written)
> >
> > The above allows fstat(), fchmod(), ioctl(), etc to be used for files
> > previously not possible.
>
> Given an fd opened in this way is it possible to reopen it normally and
> be guarenteed to get the same object?

For directories it should be possible with:

fd = open(path, O_NOACCESS | O_NODE);
fd2 = openat(fd, ".", O_RDWR);

We could implement something similar for non-directories as well, by
treating a NULL path parameter to openat() to mean - reopen:

fd2 = openat(fd, NULL, O_RDWR);

Comments?

Thanks,
Miklos

2009-09-25 11:37:50

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

* Miklos Szeredi ([email protected]) wrote:
> On Fri, 25 Sep 2009, Andreas Gruenbacher wrote:
> > On Thursday, 24 September 2009 16:51:58 Miklos Szeredi wrote:
> > > This patch adds a new open flag, O_NODE. This flag means: open just
> > > the filesystem node instead of the object referenced by the node.
> >
> > What is the intended use for O_NODE?
>
> It lets userspace file descriptors reference a inode without actually
> "dereferencing" it to get the underlying object. This allows for a
> couple of new things:
>
> - opening a special file (device/socket/fifo) without side effects
>
> - opening a symlink
>
> - opening any type of file without any permission is also possible
> (of course the resuling file descriptor may not be read or written)
>
> The above allows fstat(), fchmod(), ioctl(), etc to be used for files
> previously not possible.

Given an fd opened in this way is it possible to reopen it normally and
be guarenteed to get the same object?

Dave
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/

2009-09-25 16:53:25

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Sep 25, 2009 13:37 +0100, Dr. David Alan Gilbert wrote:
> * Miklos Szeredi ([email protected]) wrote:
> > On Fri, 25 Sep 2009, Andreas Gruenbacher wrote:
> > > On Thursday, 24 September 2009 16:51:58 Miklos Szeredi wrote:
> > > > This patch adds a new open flag, O_NODE. This flag means: open just
> > > > the filesystem node instead of the object referenced by the node.
> > >
> > > What is the intended use for O_NODE?
> >
> > It lets userspace file descriptors reference a inode without actually
> > "dereferencing" it to get the underlying object. This allows for a
> > couple of new things:
> >
> > - opening a special file (device/socket/fifo) without side effects
> >
> > - opening a symlink
> >
> > - opening any type of file without any permission is also possible
> > (of course the resuling file descriptor may not be read or written)
> >
> > The above allows fstat(), fchmod(), ioctl(), etc to be used for files
> > previously not possible.
>
> Given an fd opened in this way is it possible to reopen it normally and
> be guarenteed to get the same object?

That was something I'd be interested in as well.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2009-09-25 17:08:42

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

Miklos Szeredi wrote:
> We could implement something similar for non-directories as well, by
> treating a NULL path parameter to openat() to mean - reopen:
>
> fd2 = openat(fd, NULL, O_RDWR);

It would be nice to make this behavior consistent across all the *at()
functions where it is acceptable.

-- Jamie

2009-09-25 17:21:07

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Fri, 25 Sep 2009 13:37:47 BST, "Dr. David Alan Gilbert" said:

> Given an fd opened in this way is it possible to reopen it normally and
> be guarenteed to get the same object?

It's not possible even without this flag. Consider:

fd1 = open("/tmp/foo",flags);
rc = rename("/tmp/foo","/tmp/bar");
fd2 = open("/tmp/foo",flags);

Or were you asking if *absent that sort of tomfoolery* if it would work?


Attachments:
(No filename) (227.00 B)

2009-09-25 17:35:27

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

* [email protected] ([email protected]) wrote:
> On Fri, 25 Sep 2009 13:37:47 BST, "Dr. David Alan Gilbert" said:
>
> > Given an fd opened in this way is it possible to reopen it normally and
> > be guarenteed to get the same object?
>
> It's not possible even without this flag. Consider:
>
> fd1 = open("/tmp/foo",flags);
> rc = rename("/tmp/foo","/tmp/bar");
> fd2 = open("/tmp/foo",flags);
>
> Or were you asking if *absent that sort of tomfoolery* if it would work?

I know it's not possible without this flag, my interest is whether
it would be possible WITH this flag to promote an fd opened with the
O_NODE to a normal fd, guaranteeing that it's still operating on the
same object. The case I'm (vaguely) thinking of is open with O_NODE,
fstat it, come to the conslusion it's not a device and then proceed
to open it and read from it.

Dave


--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/

2009-09-25 19:02:26

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Sep 25, 2009 13:20 -0400, [email protected] wrote:
> On Fri, 25 Sep 2009 13:37:47 BST, "Dr. David Alan Gilbert" said:
>
> > Given an fd opened in this way is it possible to reopen it normally and
> > be guarenteed to get the same object?
>
> It's not possible even without this flag. Consider:
>
> fd1 = open("/tmp/foo",flags);
> rc = rename("/tmp/foo","/tmp/bar");
> fd2 = open("/tmp/foo",flags);
>
> Or were you asking if *absent that sort of tomfoolery* if it would work?

No, the point is that we HAVE an fd that points to the original "/tmp/foo"
opened with O_NODE, and now (after an ioctl, stat, etc) we decide it is
safe to open the file read and/or write without releasing the existing
fd. The whole point is to AVOID this kind of tomfoolery.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2009-09-25 21:18:40

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Fri, 25 Sep 2009 19:35:23 BST, "Dr. David Alan Gilbert" said:

> I know it's not possible without this flag, my interest is whether
> it would be possible WITH this flag to promote an fd opened with the
> O_NODE to a normal fd, guaranteeing that it's still operating on the
> same object.

Well, maybe the question is if we should treat "promote" differently than
"re-open"?

(And now I'm wondering what happens if you dup() one of these beasts....)


Attachments:
(No filename) (227.00 B)

2009-09-28 10:25:31

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Fri, 25 Sep 2009, Valdis.Kletnieks wrote:
> On Fri, 25 Sep 2009 19:35:23 BST, "Dr. David Alan Gilbert" said:
>
> > I know it's not possible without this flag, my interest is whether
> > it would be possible WITH this flag to promote an fd opened with the
> > O_NODE to a normal fd, guaranteeing that it's still operating on the
> > same object.
>
> Well, maybe the question is if we should treat "promote" differently than
> "re-open"?
>
> (And now I'm wondering what happens if you dup() one of these beasts....)

dup() only duplicates _references_ to an open file, it does not create
a new file. The same applies to fork().

BTW I just checked, and it is possible to re-open or promote an fd
opened with O_NODE like this:

char tmp[64];

fd = open(filename, O_NODE | O_NOACCESS);
/* ... */
sprintf(tmp, "/proc/self/fd/%i", fd);
fd_rw = open(tmp, O_RDWR);

Now fd_rw is guaranteed to refer to the same inode as fd.

Thanks,
Miklos

2009-09-28 13:28:46

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

Miklos Szeredi wrote:
> BTW I just checked, and it is possible to re-open or promote an fd
> opened with O_NODE like this:
>
> char tmp[64];
>
> fd = open(filename, O_NODE | O_NOACCESS);
> /* ... */
> sprintf(tmp, "/proc/self/fd/%i", fd);
> fd_rw = open(tmp, O_RDWR);
>
> Now fd_rw is guaranteed to refer to the same inode as fd.

If someone passes you a file descriptor opened with O_RDONLY, you
shouldn't be able to upgrade it to O_RDWR unless you have access to
the file and could do a normal open() on the file.

I hope the above cannot convert O_NOACCESS to O_RDWR without checking
that you have access to the file.

Hmm. I have just tried, and you _can _use open("/proc/self/fd/%d",
O_RDWR) to re-open with more permissions when you can't access the
path which /proc/self/fd/%d pretends to link to. It looks a bit
dubious, as you might have been passed an O_RDONLY descriptor with the
intention that you can't write to it... Oh well!

-- Jamie

2009-09-28 14:07:58

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Mon, 28 Sep 2009, Jamie Lokier wrote:
> Miklos Szeredi wrote:
> > BTW I just checked, and it is possible to re-open or promote an fd
> > opened with O_NODE like this:
> >
> > char tmp[64];
> >
> > fd = open(filename, O_NODE | O_NOACCESS);
> > /* ... */
> > sprintf(tmp, "/proc/self/fd/%i", fd);
> > fd_rw = open(tmp, O_RDWR);
> >
> > Now fd_rw is guaranteed to refer to the same inode as fd.
>
> If someone passes you a file descriptor opened with O_RDONLY, you
> shouldn't be able to upgrade it to O_RDWR unless you have access to
> the file and could do a normal open() on the file.
>
> I hope the above cannot convert O_NOACCESS to O_RDWR without checking
> that you have access to the file.

The permissions are checked on the inode at open. It doesn't matter
how the inode came to be looked up, via a normal path or via a file
descriptor in proc, the results are the same.

> Hmm. I have just tried, and you _can _use open("/proc/self/fd/%d",
> O_RDWR) to re-open with more permissions when you can't access the
> path which /proc/self/fd/%d pretends to link to. It looks a bit
> dubious, as you might have been passed an O_RDONLY descriptor with the
> intention that you can't write to it... Oh well!

True, /proc gives you access to the underlying "path" of an open file
descriptor. If you don't want that, don't mount /proc in your limited
namespace.

Thanks,
Miklos

2009-09-28 14:10:25

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

Miklos Szeredi wrote:
> > Hmm. I have just tried, and you _can _use open("/proc/self/fd/%d",
> > O_RDWR) to re-open with more permissions when you can't access the
> > path which /proc/self/fd/%d pretends to link to. It looks a bit
> > dubious, as you might have been passed an O_RDONLY descriptor with the
> > intention that you can't write to it... Oh well!
>
> True, /proc gives you access to the underlying "path" of an open file
> descriptor. If you don't want that, don't mount /proc in your limited
> namespace.

I wasn't using a limited namespace.

Just a directory without permission to be searched, and as a regular
user - which is historically supposed to prevent you from opening
files in it.

-- Jamie

2009-09-28 15:21:57

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Sep 28, 2009 12:25 +0200, Miklos Szeredi wrote:
> BTW I just checked, and it is possible to re-open or promote an fd
> opened with O_NODE like this:
>
> char tmp[64];
>
> fd = open(filename, O_NODE | O_NOACCESS);
> /* ... */
> sprintf(tmp, "/proc/self/fd/%i", fd);
> fd_rw = open(tmp, O_RDWR);
>
> Now fd_rw is guaranteed to refer to the same inode as fd.

It seems very unpleasant to require applications using O_NODE to
reopen files using /proc.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2009-09-28 15:53:53

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

Andreas Dilger wrote:
> > It's not possible even without this flag. Consider:
> >
> > fd1 = open("/tmp/foo",flags);
> > rc = rename("/tmp/foo","/tmp/bar");
> > fd2 = open("/tmp/foo",flags);
> >
> > Or were you asking if *absent that sort of tomfoolery* if it would work?
>
> No, the point is that we HAVE an fd that points to the original "/tmp/foo"
> opened with O_NODE, and now (after an ioctl, stat, etc) we decide it is
> safe to open the file read and/or write without releasing the existing
> fd. The whole point is to AVOID this kind of tomfoolery.

Make sense, and openat() seems like a good way to accomplish it.

-- Jamie

2009-09-28 16:04:17

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Mon, 28 Sep 2009, Andreas Dilger wrote:
> On Sep 28, 2009 12:25 +0200, Miklos Szeredi wrote:
> > BTW I just checked, and it is possible to re-open or promote an fd
> > opened with O_NODE like this:
> >
> > char tmp[64];
> >
> > fd = open(filename, O_NODE | O_NOACCESS);
> > /* ... */
> > sprintf(tmp, "/proc/self/fd/%i", fd);
> > fd_rw = open(tmp, O_RDWR);
> >
> > Now fd_rw is guaranteed to refer to the same inode as fd.
>
> It seems very unpleasant to require applications using O_NODE to
> reopen files using /proc.

The point of the above example was that reopening a file descriptor
with upgraded (or downgraded) access mode is even now possible. Which
either means:

a) the current permission model under /proc/PID/fd has a security
hole (which Jamie is worried about)

b) we can safely implement this with by changing openat() semantics,
or even with a new reopen() syscall

I'm not too worried about the security aspect of this, but it's
something to keep in mind.

Thanks,
Miklos

2009-09-30 08:18:32

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

* Jamie Lokier:

> I hope the above cannot convert O_NOACCESS to O_RDWR without checking
> that you have access to the file.

It doesn't.

Here's what I did to reproduce:

$ mkdir /tmp/xyz
$ touch /tmp/xyz/123
$ tail -f /tmp/xyz/123

And in another terminal:

$ chmod 000 /tmp/xyz
$ echo foo > /tmp/xyz/123
bash: /tmp/xyz/123: Permission denied
$ $ echo foo > /proc/$pid_of_tail/fd/5

And the first terminal prints "foo". It fails if the file it self is
not writeable, only the access check on the path is bypassed. I still
think this is wrong.

FWIW, fcntl(F_SETFL) is documented to ignore O_RDWR etc. flags. For
/proc/PID/fd, it probably makes sense to check the current access
flags on the object, and the original open mode. Rechecking the path
seems impossible because it has unclear semantics.

The whole thing is a bit worrisome because it may turn file descriptor
information leaks into something worse.

--
Florian Weimer <[email protected]>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstra?e 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

2009-10-04 19:03:56

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Mon 2009-09-28 18:04:10, Miklos Szeredi wrote:
> On Mon, 28 Sep 2009, Andreas Dilger wrote:
> > On Sep 28, 2009 12:25 +0200, Miklos Szeredi wrote:
> > > BTW I just checked, and it is possible to re-open or promote an fd
> > > opened with O_NODE like this:
> > >
> > > char tmp[64];
> > >
> > > fd = open(filename, O_NODE | O_NOACCESS);
> > > /* ... */
> > > sprintf(tmp, "/proc/self/fd/%i", fd);
> > > fd_rw = open(tmp, O_RDWR);
> > >
> > > Now fd_rw is guaranteed to refer to the same inode as fd.
> >
> > It seems very unpleasant to require applications using O_NODE to
> > reopen files using /proc.
>
> The point of the above example was that reopening a file descriptor
> with upgraded (or downgraded) access mode is even now possible. Which
> either means:
>
> a) the current permission model under /proc/PID/fd has a security
> hole (which Jamie is worried about)

I believe its bugtraq time. Being able to reopen file with additional
permissions looks like a security problem...

Jamie, do you have some test script? And do you want your 15 minutes
of bugtraq fame? ;-).

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-10-04 22:59:33

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

Pavel Machek wrote:
> On Mon 2009-09-28 18:04:10, Miklos Szeredi wrote:
> > a) the current permission model under /proc/PID/fd has a security
> > hole (which Jamie is worried about)
>
> I believe its bugtraq time. Being able to reopen file with additional
> permissions looks like a security problem...
>
> Jamie, do you have some test script? And do you want your 15 minutes
> of bugtraq fame? ;-).

$ mkdir secret
$ exec 3>> secret/appendonly.txt
$ chmod 000 secret # This is not changed during do_stuff
jamie@amile:~/test$ echo START OF LOG 1>&3
$ do_stuff 1>&3
cat: secret/appendonly.txt: Permission denied # A good sign
$ chmod 755 secret
$ cat secret/appendonly.txt # Let's see our log
nothing to see here # What's that doing there??!

You can re-open a deleted file with increased permisssions. That's
probably more subversive:

$ exec >>appendonlydeleted.txt
$ exec 4<appendonlydeleted.txt # I'll read it later.
$ echo START OF LOG
$ ./do_stuff
$ cat 0<&4 >/dev/tty
nothing to see here # How did they do that?!

How'd it happen?

do_stuff:
#!/bin/sh
echo some text getting logged...
echo more text...
echo no wait, let\'s subvert the append flag!
echo nothing to see here >/proc/self/fd/1

If /proc/self/fd/1 were a _real_ symbolic link, that wouldn't work.

The reopen does check the inode permission, but it does not require
you have any reachable path to the file. Someone _might_ use that as
a traditional unix security mechanism, but if so it's probably quite rare.

If converting append-only to writable doesn't sound too bad, you can
convert read-only to writable and write-only to readable. Gaining
write access to a deleted file which you only received a read-only
descriptor for sounds dodgy to me:

$ echo secret5948043853048 >secret_readonly_password.txt
$ exec 3<secret_readonly_password.txt
$ rm secret_readonly_password.txt # Now I'm sure nobody can change it!
$ echo all your base ha ha >/proc/self/fd/3
$ cat 0<&3
all your base ha ha # Oh dear, my assumption was broken.

Did you really think you had to "chmod 444" before deleting the file?

-- Jamie

2009-10-14 13:17:34

by Andreas Gruenbacher

[permalink] [raw]
Subject: Re: [PATCH] vfs: new O_NODE open flag

On Monday 28 September 2009 18:04:10 Miklos Szeredi wrote:
> The point of the above example was that reopening a file descriptor
> with upgraded (or downgraded) access mode is even now possible. Which
> either means:
>
> a) the current permission model under /proc/PID/fd has a security
> hole (which Jamie is worried about)

No worries -- access to /proc/PID/fd/* requires ptrace access to PID, so we do
not have a security hole here. The ptrace checks were introduced here:

778c1144771f0064b6f51bee865cceb0d996f2f9
df26c40e567356caeefe2861311e19c54444d917

Cheers,
Andreas

2009-10-23 17:10:33

by Pavel Machek

[permalink] [raw]
Subject: /proc filesystem allows bypassing directory permissions on Linux (was Re: [PATCH] vfs: new O_NODE open flag)

Hi!

> > > a) the current permission model under /proc/PID/fd has a security
> > > hole (which Jamie is worried about)
> >
> > I believe its bugtraq time. Being able to reopen file with additional
> > permissions looks like a security problem...
> >
> > Jamie, do you have some test script? And do you want your 15 minutes
> > of bugtraq fame? ;-).

> The reopen does check the inode permission, but it does not require
> you have any reachable path to the file. Someone _might_ use that as
> a traditional unix security mechanism, but if so it's probably quite rare.

Ok, I got this, with two users. I guess it is real (but obscure)
security hole.

So, we have this scenario. pavel is not doing anything interesting in
the background.

pavel@toy:/tmp$ uname -a
Linux toy.ucw.cz 2.6.32-rc3 #21 Mon Oct 19 07:32:02 CEST 2009 armv5tel GNU/Linux
pavel@toy:/tmp mkdir my_priv; cd my_priv
pavel@toy:/tmp/my_priv$ echo this file should never be writable > unwritable_file
# lock down directory
pavel@toy:/tmp/my_priv$ chmod 700 .
# relax file permissions, directory is private, so this is safe
# check link count on unwritable_file. We would not want someone
# to have a hard link to work around our permissions, would we?
pavel@toy:/tmp/my_priv$ chmod 666 unwritable_file
pavel@toy:/tmp/my_priv$ cat unwritable_file
this file should never be writable
pavel@toy:/tmp/my_priv$ cat unwritable_file
got you
# Security problem here

[Please pause here for a while before reading how guest did it.]

Unexpected? Well, yes, to me anyway. Linux specific? Yes, I think so.

So what did happen? User guest was able to work around directory
permissions in the background, using /proc filesystem.

guest@toy:~$ bash 3< /tmp/my_priv/unwritable_file
# Running inside nested shell
guest@toy:~$ read A <&3
guest@toy:~$ echo $A
this file should never be writable

guest@toy:~$ cd /tmp/my_priv
guest@toy:/tmp/my_priv$ ls
unwritable_file

# pavel did chmod 000, chmod 666 here
guest@toy:/tmp/my_priv$ ls
ls: cannot open directory .: Permission denied

# Linux correctly prevents guest from writing to that file
guest@toy:/tmp/my_priv$ cat unwritable_file
cat: unwritable_file: Permission denied
guest@toy:/tmp/my_priv$ echo got you >&3
bash: echo: write error: Bad file descriptor

# ...until we take a way around it with /proc filesystem. Oops.
guest@toy:/tmp/my_priv$ echo got you > /proc/self/fd/3

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html