LinuxLists.cc - [PATCH v3] vfs: new O

2009-12-02 16:17:34

Subject: [PATCH v3] vfs: new O_NODE open flag

v2->v3 slightly updated patch description

Thanks to Alan for the feedback. The main points raised were I think:

1) There's a security hole with dynamicly allocated devices if
permissions on new device are difference than on old device.

The issue is valid, but also exists if hard links are created to
device nodes. udev already defends against this by setting
permissions on device to zero before unlinking it.

2) We should go through each filesystem and allow O_NODE on a case by
case basis along the way.

I don't see the reason, open(.., O_NODE) like chdir() doesn't have a
filesystem callback, because it doesn't do anything to the filesystem,
neither does it create any state in the filesystem itself. The only
state is a reference to the dentry/vfsmount for the file.

3) There's an alleged security hole (commonly referred to as "Pavel's
issue" :) with reopening for write (or truncating) a file desciptor
through /proc/P/fd for a file descriptor opened for read-only.

This patch doens't change any of that except the file opened without
any permission can also be re-opened with increased permissions, as
long as i_mode allows. I think this is an othogonal issue and so this
patch doesn't deal with it.

Comments? Any chance of this being accepted into -mm?

Thanks,
Miklos
---

Subject: [PATCH v2] vfs: new O_NODE open flag
From: Miklos Szeredi <[email protected]>

This patch adds a new open flag, O_NODE. This flag means: open just
the filesystem node instead of the object referenced by the node.

This is in some ways similar to creating a hard link, the file
descriptor allows alternateive access to the inode, except no actual
link is created in the filesystem (and so works on read-only mounts as
well).

This also allows for a couple of new things:

- opening a special file (device/socket/fifo) without side effects

- opening a symlink

- opening any type of file without any permission is also possible,
of course the resuling file descriptor may not be read or written

The above allows fstat(), fchmod(), ioctl(), etc to be used for files
previously not possible. AFS for example wanted a "pioctl()" syscall
for various things, but the same can be done without having to define
a new syscall:

fd = open(path, O_NODE);
ioctl(fd, ...);
close(fd);

Another important use would be opening a directory without read
permission (see O_SEARCH from the POSIX draft for what this is useful
for). O_NODE is not quite equivalent to O_SEARCH, but similar and
equally useful.

Opening a file will not call the driver's ->open() method and will not
have any side effect other than referencing the dentry/vfsmount from
the struct file pointer.

Currently O_NODE can only be used together with O_NOACCESS (value 3),
meaning that read(2) and write(2) will return EBADF and no permission
is necessary to open the file. This is logical for device nodes and
fifos. For directories we could allow O_RDONLY, and for regular files
any access mode, but this is not done yet.

O_NODE used together with O_NOFOLLOW will open a symlink node itself
instead of returning ELOOP. O_NOFOLLOW will not prevent following
mountpoints if used together with O_NODE. In theory we could allow
O_RDONLY for symlinks too and return the contents of the link on
read(2). Permissions on symlinks cannot be changed, so make fchmod()
fail with ELOOP on such a file descriptor.

Filesystems which want to install special file operations for O_NODE
opens (e.g. ioctl) may set S_OPEN_NODE flag in the inode. This will
cause dentry_open() to call inode->i_fop->open, and the filesystem is
responsible for handling the O_NODE flag and installing the right
filesystem operations in file->f_op.

Signed-off-by: Miklos Szeredi <[email protected]>
Acked-by: David Howells <[email protected]>
---
fs/file_table.c | 6 ++--
fs/locks.c | 3 ++
fs/namei.c | 65 ++++++++++++++++++++++++++------------------
fs/open.c | 17 ++++++++++-
include/asm-generic/fcntl.h | 4 ++
include/linux/fs.h | 1
6 files changed, 67 insertions(+), 29 deletions(-)

Index: linux-2.6/fs/file_table.c
===================================================================
--- linux-2.6.orig/fs/file_table.c 2009-12-02 16:37:15.000000000 +0100
+++ linux-2.6/fs/file_table.c 2009-12-02 16:37:22.000000000 +0100
@@ -281,8 +281,10 @@ void __fput(struct file *file)
file->f_op->release(inode, file);
security_file_free(file);
ima_file_free(file);
- if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL))
- cdev_put(inode->i_cdev);
+ if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL)) {
+ if (!(file->f_flags & O_NODE))
+ cdev_put(inode->i_cdev);
+ }
fops_put(file->f_op);
put_pid(file->f_owner.pid);
file_kill(file);
Index: linux-2.6/fs/locks.c
===================================================================
--- linux-2.6.orig/fs/locks.c 2009-12-02 16:37:15.000000000 +0100
+++ linux-2.6/fs/locks.c 2009-12-02 16:37:22.000000000 +0100
@@ -1183,6 +1183,9 @@ int __break_lease(struct inode *inode, u
unsigned long break_time;
int i_have_this_lease = 0;

+ if (!(mode & (FMODE_READ | FMODE_WRITE)))
+ return 0;
+
new_fl = lease_alloc(NULL, mode & FMODE_WRITE ? F_WRLCK : F_RDLCK);

lock_kernel();
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c 2009-12-02 16:37:15.000000000 +0100
+++ linux-2.6/fs/namei.c 2009-12-02 16:37:22.000000000 +0100
@@ -1160,7 +1160,9 @@ static int path_lookup_open(int dfd, con
nd->intent.open.file = filp;
nd->intent.open.flags = open_flags;
nd->intent.open.create_mode = 0;
- err = do_path_lookup(dfd, name, lookup_flags|LOOKUP_OPEN, nd);
+ if (!(open_flags & O_NODE))
+ lookup_flags |= LOOKUP_OPEN;
+ err = do_path_lookup(dfd, name, lookup_flags, nd);
if (IS_ERR(nd->intent.open.file)) {
if (err == 0) {
err = PTR_ERR(nd->intent.open.file);
@@ -1511,22 +1513,27 @@ int may_open(struct path *path, int acc_
if (!inode)
return -ENOENT;

- switch (inode->i_mode & S_IFMT) {
- case S_IFLNK:
- return -ELOOP;
- case S_IFDIR:
- if (acc_mode & MAY_WRITE)
- return -EISDIR;
- break;
- case S_IFBLK:
- case S_IFCHR:
- if (path->mnt->mnt_flags & MNT_NODEV)
+ if ((flag & O_NODE)) {
+ if (acc_mode & (MAY_READ | MAY_WRITE))
return -EACCES;
- /*FALLTHRU*/
- case S_IFIFO:
- case S_IFSOCK:
- flag &= ~O_TRUNC;
- break;
+ } else {
+ switch (inode->i_mode & S_IFMT) {
+ case S_IFLNK:
+ return -ELOOP;
+ case S_IFDIR:
+ if (acc_mode & MAY_WRITE)
+ return -EISDIR;
+ break;
+ case S_IFBLK:
+ case S_IFCHR:
+ if (path->mnt->mnt_flags & MNT_NODEV)
+ return -EACCES;
+ /*FALLTHRU*/
+ case S_IFIFO:
+ case S_IFSOCK:
+ flag &= ~O_TRUNC;
+ break;
+ }
}

error = inode_permission(inode, acc_mode);
@@ -1639,14 +1646,16 @@ out_unlock:
* 11 - read-write
* for the internal routines (ie open_namei()/follow_link() etc)
* This is more logical, and also allows the 00 "no perm needed"
- * to be used for symlinks (where the permissions are checked
- * later).
+ * to be used for opening a symlink, pipe, socket or device with
+ * O_NODE.
*
*/
static inline int open_to_namei_flags(int flag)
{
if ((flag+1) & O_ACCMODE)
flag++;
+ else if (flag & O_NODE)
+ flag &= ~O_ACCMODE;
return flag;
}

@@ -1734,7 +1743,9 @@ struct file *do_filp_open(int dfd, const
nd.intent.open.create_mode = mode;
dir = nd.path.dentry;
nd.flags &= ~LOOKUP_PARENT;
- nd.flags |= LOOKUP_CREATE | LOOKUP_OPEN;
+ nd.flags |= LOOKUP_CREATE;
+ if (!(open_flag & O_NODE))
+ nd.flags |= LOOKUP_OPEN;
if (flag & O_EXCL)
nd.flags |= LOOKUP_EXCL;
mutex_lock(&dir->d_inode->i_mutex);
@@ -1793,19 +1804,24 @@ do_last:

if (__follow_mount(&path)) {
error = -ELOOP;
- if (flag & O_NOFOLLOW)
+ if ((flag & O_NOFOLLOW) & !(flag & O_NODE))
goto exit_dput;
}

error = -ENOENT;
if (!path.dentry->d_inode)
goto exit_dput;
- if (path.dentry->d_inode->i_op->follow_link)
- goto do_link;
+ if (path.dentry->d_inode->i_op->follow_link) {
+ if (!(flag & O_NOFOLLOW))
+ goto do_link;
+ error = -ELOOP;
+ if (!(flag & O_NODE))
+ goto exit_dput;
+ }

path_to_nameidata(&path, &nd);
error = -EISDIR;
- if (path.dentry->d_inode && S_ISDIR(path.dentry->d_inode->i_mode))
+ if (S_ISDIR(path.dentry->d_inode->i_mode))
goto exit;
ok:
/*
@@ -1859,9 +1875,6 @@ exit_parent:
return ERR_PTR(error);

do_link:
- error = -ELOOP;
- if (flag & O_NOFOLLOW)
- goto exit_dput;
/*
* This is subtle. Instead of calling do_follow_link() we do the
* thing by hands. The reason is that this way we have zero link_count
Index: linux-2.6/fs/open.c
===================================================================
--- linux-2.6.orig/fs/open.c 2009-12-02 16:37:15.000000000 +0100
+++ linux-2.6/fs/open.c 2009-12-02 16:37:22.000000000 +0100
@@ -611,6 +611,11 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd
dentry = file->f_path.dentry;
inode = dentry->d_inode;

+ /* fchmod not allowed for symlinks opened with O_NODE */
+ err = -ELOOP;
+ if (S_ISLNK(inode->i_mode))
+ goto out_putf;
+
audit_inode(NULL, dentry);

err = mnt_want_write_file(file);
@@ -804,12 +809,17 @@ static inline int __get_file_write_acces
return error;
}

+static const struct file_operations default_node_fops = {
+ .llseek = no_llseek,
+};
+
static struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
int flags, struct file *f,
int (*open)(struct inode *, struct file *),
const struct cred *cred)
{
struct inode *inode;
+ const struct file_operations *fops;
int error;

f->f_flags = flags;
@@ -828,7 +838,12 @@ static struct file *__dentry_open(struct
f->f_path.dentry = dentry;
f->f_path.mnt = mnt;
f->f_pos = 0;
- f->f_op = fops_get(inode->i_fop);
+
+ fops = inode->i_fop;
+ if ((flags & O_NODE) && !(inode->i_flags & S_OPEN_NODE))
+ fops = &default_node_fops;
+ f->f_op = fops_get(fops);
+
file_move(f, &inode->i_sb->s_files);

error = security_dentry_open(f, cred);
Index: linux-2.6/include/asm-generic/fcntl.h
===================================================================
--- linux-2.6.orig/include/asm-generic/fcntl.h 2009-12-02 16:37:15.000000000 +0100
+++ linux-2.6/include/asm-generic/fcntl.h 2009-12-02 16:37:22.000000000 +0100
@@ -9,6 +9,7 @@
#define O_RDONLY 00000000
#define O_WRONLY 00000001
#define O_RDWR 00000002
+#define O_NOACCESS 00000003
#ifndef O_CREAT
#define O_CREAT 00000100 /* not fcntl */
#endif
@@ -51,6 +52,9 @@
#ifndef O_CLOEXEC
#define O_CLOEXEC 02000000 /* set close_on_exec */
#endif
+#ifndef O_NODE
+#define O_NODE 04000000 /* open filesystem node, not device */
+#endif
#ifndef O_NDELAY
#define O_NDELAY O_NONBLOCK
#endif
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h 2009-11-30 16:38:23.000000000 +0100
+++ linux-2.6/include/linux/fs.h 2009-12-02 16:37:22.000000000 +0100
@@ -231,6 +231,7 @@ struct inodes_stat_t {
#define S_NOCMTIME 128 /* Do not update file c/mtime */
#define S_SWAPFILE 256 /* Do not truncate: swapon got its bmaps */
#define S_PRIVATE 512 /* Inode is fs-internal */
+#define S_OPEN_NODE 1024 /* Fs is prepared for O_NODE opens */

/*
* Note that nosuid etc flags are inode-specific: setting some file-system

2009-12-02 19:14:29

by Alan

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

> 1) There's a security hole with dynamicly allocated devices if
> permissions on new device are difference than on old device.
>
> The issue is valid, but also exists if hard links are created to
> device nodes. udev already defends against this by setting
> permissions on device to zero before unlinking it.

udev defends against it with the specific knowledge that any existing
open means the device is open and cannot be unloaded. The combination is
required (and some other happenstance properties).

For O_NODE you must implement revoke() as well and get it into tools like
udev before you are safe. I appreciate "you need revoke" is a bit like
saying "there is one small problem, you just need to reimplement a major
subsystem while you are at it", but from a security perspective I don't
see any other way to make O_NODE safe in this situation.

Alan

2009-12-02 20:14:16

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Wed, 2 Dec 2009, Alan Cox wrote:
> > 1) There's a security hole with dynamicly allocated devices if
> > permissions on new device are difference than on old device.
> >
> > The issue is valid, but also exists if hard links are created to
> > device nodes. udev already defends against this by setting
> > permissions on device to zero before unlinking it.
>
> udev defends against it with the specific knowledge that any existing
> open means the device is open and cannot be unloaded. The combination is
> required (and some other happenstance properties).

You're still missing the point. O_NODE is like a hard link, except
the reference doesn't come from the filesystem but from a file
descriptor. From udev's perspective there's no difference.

Miklos

2009-12-02 20:47:02

by Alan

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

[Ridiculous cc list cleaned]

> > udev defends against it with the specific knowledge that any existing
> > open means the device is open and cannot be unloaded. The combination is
> > required (and some other happenstance properties).
>
> You're still missing the point. O_NODE is like a hard link, except
> the reference doesn't come from the filesystem but from a file
> descriptor. From udev's perspective there's no difference.

I don't think I am missing the point here. You have a reference to an
object in the fs but you don't have a reference to the driver underneath
s the driver can change on you *while* you have the O_NODE open and fd
live. That cannot happen with a hard link and open.

It isn't the same thing as far as I can see. You don't have the barrier
between the operations that occurs in the real open/close case because
they lock the driver.

2009-12-03 05:46:34

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Wed, 2 Dec 2009, Alan Cox wrote:
> > You're still missing the point. O_NODE is like a hard link, except
> > the reference doesn't come from the filesystem but from a file
> > descriptor. From udev's perspective there's no difference.
>
> I don't think I am missing the point here. You have a reference to an
> object in the fs but you don't have a reference to the driver underneath
> s the driver can change on you *while* you have the O_NODE open and fd
> live. That cannot happen with a hard link and open.
>
> It isn't the same thing as far as I can see. You don't have the barrier
> between the operations that occurs in the real open/close case because
> they lock the driver.

The file descriptor opened with O_NODE allows exaclactly the same
operations that a hard link to the device would, nothing more. It's
just a link to the *node*, except it doesn't increment the link count,
the driver is irrelevant.

Miklos

2009-12-05 14:50:41

by Andrew Lutomirski

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

Miklos Szeredi wrote:
> On Wed, 2 Dec 2009, Alan Cox wrote:
>>> You're still missing the point. O_NODE is like a hard link, except
>>> the reference doesn't come from the filesystem but from a file
>>> descriptor. From udev's perspective there's no difference.
>> I don't think I am missing the point here. You have a reference to an
>> object in the fs but you don't have a reference to the driver underneath
>> s the driver can change on you *while* you have the O_NODE open and fd
>> live. That cannot happen with a hard link and open.
>>
>> It isn't the same thing as far as I can see. You don't have the barrier
>> between the operations that occurs in the real open/close case because
>> they lock the driver.
>
> The file descriptor opened with O_NODE allows exaclactly the same
> operations that a hard link to the device would, nothing more. It's
> just a link to the *node*, except it doesn't increment the link count,
> the driver is irrelevant.
>

I don't know what that means. Do you mean that if:

root creates /dev/foo with 0666 perms
eviluser opens /dev/foo with O_NODE
root chmods /dev/foo to 0000
root unlinks /dev/foo

then eviluser can't open /proc/self/fd/whatever for O_RDRW

Because if eviluser could still open /proc/self/fd/whatever for O_RDRW
(or anything else for that matter if O_NODE isn't set) then you have a
security problem.

--Andy

2009-12-05 19:40:49

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Sat, 05 Dec 2009, Andy Lutomirski wrote:
> I don't know what that means. Do you mean that if:
>
> root creates /dev/foo with 0666 perms
> eviluser opens /dev/foo with O_NODE

More precisely, O_NODE | O_NOACCESS

> root chmods /dev/foo to 0000
> root unlinks /dev/foo
>
> then eviluser can't open /proc/self/fd/whatever for O_RDRW

Yes.

Maybe alan was worried about the O_NODE | O_RDWR, etc. case? That
simply doesn't make any sense for special files.

Current patch only allows O_NOACCESS for any file type, but other
access modes may make sense for regular files, directories, and maybe
even symlinks.

Thanks,
Miklos

2009-12-05 20:27:12

by Alan

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

> Maybe alan was worried about the O_NODE | O_RDWR, etc. case? That
> simply doesn't make any sense for special files.

I am concerned primarily about the lack of ability to get rid of such a
handle in a controlled fashion. The udev/device unload case is simply one
obvious way to exploit it.

I still don't believe O_NODE makes sense without revoke(), and even then
needs huge amounts of care on the device side of things.

2009-12-05 20:36:02

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Sat, 5 Dec 2009, Alan Cox wrote:
> I am concerned primarily about the lack of ability to get rid of such a
> handle in a controlled fashion. The udev/device unload case is simply one
> obvious way to exploit it.

I don't understand your concern. Can you please ellaborate on the way
to exploit O_NODE?

Thanks,
Miklos

2009-12-05 23:11:13

by Alan

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Sat, 05 Dec 2009 21:35:55 +0100
Miklos Szeredi <[email protected]> wrote:

> On Sat, 5 Dec 2009, Alan Cox wrote:
> > I am concerned primarily about the lack of ability to get rid of such a
> > handle in a controlled fashion. The udev/device unload case is simply one
> > obvious way to exploit it.
>
> I don't understand your concern. Can you please ellaborate on the way
> to exploit O_NODE?

You end up with a handle to an object which then changes meaning if a
device is unloaded and something else loaded (or consider a pty
recreation)

In the normal udev course of things this is ok because even without
revoke udev can just about get away with it for the sole reason it knows
that the handle cannot be open in any form during the driver unload
(because of the device refcounting). You seem to break that.

2009-12-06 20:25:55

by Pavel Machek

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Wed 2009-12-02 17:16:57, Miklos Szeredi wrote:
> v2->v3 slightly updated patch description
>
> Thanks to Alan for the feedback. The main points raised were I think:
>
> 1) There's a security hole with dynamicly allocated devices if
> permissions on new device are difference than on old device.
>
> The issue is valid, but also exists if hard links are created to
> device nodes. udev already defends against this by setting
> permissions on device to zero before unlinking it.

Perhaps machine has /dev on separate filesystem, not writeable to
users?

Adding new security holes is bad...

> 3) There's an alleged security hole (commonly referred to as "Pavel's
> issue" :) with reopening for write (or truncating) a file desciptor
> through /proc/P/fd for a file descriptor opened for read-only.
>
> This patch doens't change any of that except the file opened without
> any permission can also be re-opened with increased permissions, as
> long as i_mode allows. I think this is an othogonal issue and so this
> patch doesn't deal with it.

You just made the hole way more common and easier to exploit.

> Comments? Any chance of this being accepted into -mm?

With adding 2 new security problems?
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-12-07 06:08:44

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Sat, 5 Dec 2009, Alan Cox wrote:
> On Sat, 05 Dec 2009 21:35:55 +0100
> Miklos Szeredi <[email protected]> wrote:
>
> > On Sat, 5 Dec 2009, Alan Cox wrote:
> > > I am concerned primarily about the lack of ability to get rid of such a
> > > handle in a controlled fashion. The udev/device unload case is simply one
> > > obvious way to exploit it.
> >
> > I don't understand your concern. Can you please ellaborate on the way
> > to exploit O_NODE?
>
> You end up with a handle to an object which then changes meaning if a
> device is unloaded and something else loaded (or consider a pty
> recreation)

OK.

> In the normal udev course of things this is ok because even without
> revoke udev can just about get away with it for the sole reason it knows
> that the handle cannot be open in any form during the driver unload
> (because of the device refcounting). You seem to break that.

No. Udev is ok, because it already does revoke access to the device
on unloading:

:/* Reset permissions on the device node, before unlinking it to make sure,
: * that permissions of possible hard links will be removed too.
: */
:int util_unlink_secure(struct udev *udev, const char *filename)
:{
: int err;
:
: chmod(filename, 0000);
...

So I think we agree, that some sort of revoke is needed. But just
resetting the permissions is fine, there's no need to actually revoke
access for the file descriptor opened with O_NODE.

Do you agree?

Thanks,
Miklos

2009-12-07 12:21:53

by Alan

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

The standard udev unload is a true open barrier so has an implicit
revoke() caused by the fact you cannot keep a handle to the filename open
during the udev sequence (or the old driver would be pinned by a refcount
and not unload).

This isn't about hard links, its about object and handle lifetimes.
Ownership is also involved in the case of things like a tty device (so if
you can fchmod down the handle you can break the security model).

It only works because you have a true revoke (by virtue of refcounting in
the kernel driver modules)

Alan

2009-12-07 12:41:20

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Mon, 7 Dec 2009, Alan Cox wrote:
> The standard udev unload is a true open barrier so has an implicit
> revoke() caused by the fact you cannot keep a handle to the filename open
> during the udev sequence (or the old driver would be pinned by a refcount
> and not unload).

True, udev unload is an open barrier (modulo races), but O_NODE opens
simply don't matter in this respect, because they don't have anything
to do with the driver.

ln /dev/foo /dev/shm/my_secret_device_link
(foo is removed)
open("/dev/shm/my_secret_device_link", O_RDWR)

How is this different than keeping the device open with O_NODE?

Thanks,
Miklos

2009-12-07 12:47:58

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Mon, 07 Dec 2009, Miklos Szeredi wrote:
> ln /dev/foo /dev/shm/my_secret_device_link
> (foo is removed)
> open("/dev/shm/my_secret_device_link", O_RDWR)
>
> How is this different than keeping the device open with O_NODE?

In other words, revoking file handles is not enough, we really need to
revoke the _inode_. And if we do that then O_NODE handles are
perfectly harmless.

Miklos

2009-12-07 13:01:39

by Alan

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

> In other words, revoking file handles is not enough, we really need to
> revoke the _inode_. And if we do that then O_NODE handles are
> perfectly harmless.

If you have revoke() you are half way there, you also the need to make
sure any user cases are updated and well established before you change
anything under them. It's not good adding a kernel feature which makes an
old udev version insecure.

2009-12-07 13:08:17

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Mon, 7 Dec 2009, Alan Cox wrote:
> > In other words, revoking file handles is not enough, we really need to
> > revoke the _inode_. And if we do that then O_NODE handles are
> > perfectly harmless.
>
> If you have revoke() you are half way there, you also the need to make
> sure any user cases are updated and well established before you change
> anything under them. It's not good adding a kernel feature which makes an
> old udev version insecure.

It doesn't, see example with hard link two mails up.

Alan, you are just ignoring facts and trying to push revoke(2) which
in fact doesn't have much to do with this issue. revoke(2) is about
*open devices*, O_NODE doesn't produce an open device. Don't you
understand that?

Thanks,
Miklos

2009-12-07 13:13:45

by Alan

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

> Alan, you are just ignoring facts and trying to push revoke(2) which
> in fact doesn't have much to do with this issue. revoke(2) is about
> *open devices*, O_NODE doesn't produce an open device. Don't you
> understand that?

That is *exactly* the problem, which is clearly what you are missing here.

2009-12-07 13:16:42

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Mon, 7 Dec 2009, Alan Cox wrote:
> > Alan, you are just ignoring facts and trying to push revoke(2) which
> > in fact doesn't have much to do with this issue. revoke(2) is about
> > *open devices*, O_NODE doesn't produce an open device. Don't you
> > understand that?
>
> That is *exactly* the problem, which is clearly what you are missing here.

I don't think so, but maybe I'm wrong. Could you describe your attack
scenario in detail then, please?

Thanks,
Miklos

2009-12-07 14:11:21

by Alan

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

> > That is *exactly* the problem, which is clearly what you are missing here.
>
> I don't think so, but maybe I'm wrong. Could you describe your attack
> scenario in detail then, please?

First obvious attack: get an O_NODE handle to a device you have assigned
to your ownership

while(1)
fchmod(fd, 0666);

wait for device to unload, reload and be intended for another user
Race udev to a real open. You have a similar problem with vhangup() and
ttys.

This cannot happen with the existing kernel because there cannot be an
open handle when the original device unload occurs[1] and it cannot happen
with vhangup because the hangup is basically a special case revoke()
implementation for tty devices.

O_NODE changes the whole lifetime semantics for inodes. It's not
something you can do casually. pioctl() gets this right although for the
same reason as path based chmod/chown/etc all get it right, O_NODE breaks
it all horribly.

Alan
[1] If you think about it a wait for no references is the same barrier as
a revoke but a blocking one.

2009-12-07 14:26:08

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Mon, 7 Dec 2009, Alan Cox wrote:
> > > That is *exactly* the problem, which is clearly what you are missing here.
> >
> > I don't think so, but maybe I'm wrong. Could you describe your attack
> > scenario in detail then, please?
>
> First obvious attack: get an O_NODE handle to a device you have assigned
> to your ownership
>
> while(1)
> fchmod(fd, 0666);
>
> wait for device to unload, reload and be intended for another user
> Race udev to a real open. You have a similar problem with vhangup() and
> ttys.

If this was a udev device, the same attack is possible with a hard
link to the device. Except the attacker simply does link() instad of
open(O_NODE) and chmod() instead of fchmod().

See?

Thanks,
Miklos

2009-12-07 14:45:04

by Alan

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

> > First obvious attack: get an O_NODE handle to a device you have assigned
> > to your ownership
> >
> > while(1)
> > fchmod(fd, 0666);
> >
> > wait for device to unload, reload and be intended for another user
> > Race udev to a real open. You have a similar problem with vhangup() and
> > ttys.
>
> If this was a udev device, the same attack is possible with a hard
> link to the device. Except the attacker simply does link() instad of
> open(O_NODE) and chmod() instead of fchmod().

If your distribution does not use a separate tmpfs for devices then yes -
but that's old news - and also needs revoke() to fix. It's a requirement
of udev as it stands that it runs on its own file system and I would hope
that all distributions using udev get this right. With O_NODE I don't
need to make the hardlink and all the "same as a hardlink" justifications
come crashing down.

This always comes back to the same thing - you need revoke() on device
files. The general case revoke is much more questionable (and indeed
almost every Unixen with revoke does not support revoke of files)

So we are back where we started - a prerequisite for O_NODE is revoke()
at least for device files and possibly for O_NODE opens on normal files.
O_NODE isn't per-se the problem, the problem is the lack of revoke() - it
just gets caught up in the fallout along with many other things including
all sorts of stuff the GUI folks want to do but cannot currently provide.

Alan

2009-12-07 15:03:49

by Andrew Lutomirski

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Mon, Dec 7, 2009 at 9:13 AM, Alan Cox <[email protected]> wrote:
>> > That is *exactly* the problem, which is clearly what you are missing here.
>>
>> I don't think so, but maybe I'm wrong. ?Could you describe your attack
>> scenario in detail then, please?
>
> First obvious attack: get an O_NODE handle to a device you have assigned
> to your ownership
>
> ? ? ? ?while(1)
> ? ? ? ? ? ? ? ?fchmod(fd, 0666);
>
> wait for device to unload, reload and be intended for another user
> Race udev to a real open. You have a similar problem with vhangup() and
> ttys.

Huh? I would've thought that udev would (and already does?), on
device unload, chown to 0:0, then chmod to 0000, then unlink, in which
case that attack doesn't work.

If you mean that someone could have an O_NODE handle open, then the
device unregisters, then, before udev has a chance to do anything at
all, a new device w/ the same major/minor numbers appears, then the
O_NODE handle owner upgrades to a real fd, then we have worse worries:
the attacker could just open the device node on disk without O_NODE,
hard links, or any other funny tricks. revoke() wouldn't fix that
either.

I'll admit that O_NODE scares me a bit from a security perspective,
but so do hard links and /proc/self/fd in general, and I'm not really
convinced that there are any new attacks here.

Would you be okay with a patch that prevented opening
/proc/self/fd/xxx on O_NODE handles? I personally don't care about
O_NODE all that much, but I'd like a decent in-kernel AFS
implementation (and a decent revoke() implementation, and especially
the ability to revoke whole filesystems would be really nice too).

--Andy

>
> This cannot happen with the existing kernel because there cannot be an
> open handle when the original device unload occurs[1] and it cannot happen
> with vhangup because the hangup is basically a special case revoke()
> implementation for tty devices.
>
> O_NODE changes the whole lifetime semantics for inodes. It's not
> something you can do casually. pioctl() gets this right although for the
> same reason as path based chmod/chown/etc all get it right, O_NODE breaks
> it all horribly.
>
> Alan
> [1] If you think about it a wait for no references is the same barrier as
> a revoke but a blocking one.
>

2009-12-07 15:11:12

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Mon, 7 Dec 2009, Alan Cox wrote:
> > > First obvious attack: get an O_NODE handle to a device you have assigned
> > > to your ownership
> > >
> > > while(1)
> > > fchmod(fd, 0666);
> > >
> > > wait for device to unload, reload and be intended for another user
> > > Race udev to a real open. You have a similar problem with vhangup() and
> > > ttys.
> >
> > If this was a udev device, the same attack is possible with a hard
> > link to the device. Except the attacker simply does link() instad of
> > open(O_NODE) and chmod() instead of fchmod().
>
> If your distribution does not use a separate tmpfs for devices then yes -
> but that's old news - and also needs revoke() to fix.

How does revoke() fix the hard link attack? It revokes file
descriptors, doesn't it?

> It's a requirement
> of udev as it stands that it runs on its own file system and I would hope
> that all distributions using udev get this right.

Not the one I'm running.

> With O_NODE I don't
> need to make the hardlink and all the "same as a hardlink" justifications
> come crashing down.

Well, yes. That's true. But I still don't think revoke() is the
answer here. For example even without the possibility of hard links
there's still a race in udev in the following course of events:

open("/dev/foo", O_RDWR)
... open passes permission checks
... driver gets unloaded
... driver intended for other user gets loaded
... open finds new driver

revoke() isn't going to help there, there's no open file descriptor
yet, it's only in the making.

What we really need is to revoke the *inode*, so that it cannot be
opened any more. Doing it with unlink() and revoke() and requiring
that link() does not work on the filesystem is a poor and racy
substitute for that.

Miklos

2009-12-07 15:51:06

by Miklos Szeredi

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Mon, 7 Dec 2009, Andrew Lutomirski wrote:
> On Mon, Dec 7, 2009 at 9:13 AM, Alan Cox <[email protected]> wrote:
> > First obvious attack: get an O_NODE handle to a device you have assigned
> > to your ownership
> >
> > while(1)
> > fchmod(fd, 0666);
> >
> > wait for device to unload, reload and be intended for another user
> > Race udev to a real open. You have a similar problem with vhangup() and
> > ttys.
>
> Huh? I would've thought that udev would (and already does?), on
> device unload, chown to 0:0, then chmod to 0000, then unlink, in which
> case that attack doesn't work.

Git version of udev does:

chmod(filename, 0000);
chown(filename, 0, 0);
err = unlink(filename);

It should probably do it the other way round, which is how it was
originally, until this commit messed it up:

commit 39087d3bdd0b5195c2570a4f858b88a82d42a066
Author: Kay Sievers <[email protected]>
Date: Sat Aug 29 16:10:24 2009 +0200

util_unlink_secure(): chmod() before chown()

Suggested by Florian Zumbiehl <[email protected]>.

And the thread where it came from:

http://markmail.org/thread/ozwcbju52yb3qs5d

where the poster actually warned Kay that it was wrong...

> Would you be okay with a patch that prevented opening
> /proc/self/fd/xxx on O_NODE handles?

We can't sanely do that.

Thanks,
Miklos

2009-12-07 17:14:59

by Alan

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

> > ? ? ? ?while(1)
> > ? ? ? ? ? ? ? ?fchmod(fd, 0666);
> >
> > wait for device to unload, reload and be intended for another user
> > Race udev to a real open. You have a similar problem with vhangup() and
> > ttys.
>
> Huh? I would've thought that udev would (and already does?), on
> device unload, chown to 0:0, then chmod to 0000, then unlink, in which
> case that attack doesn't work.

udev doesn't control the device unload/reload. It responds to messages
from the kernel which are to some extent asynchronous to actual events.
It may be ok if udev is very careful but the fact it requires a close
inspection of the kernel and user space sides doesn't bode well (with or
without O_NODE). The fact we currently have an implied revoke by the
device refcounts is a big helper at the moment.

The tty cases using vhangup() assume that the handle is killed and would
also need addressing.

> Would you be okay with a patch that prevented opening
> /proc/self/fd/xxx on O_NODE handles? I personally don't care about

I'd like to see what Al Viro has to say on the subject first.
The /proc/self stuff bothers me less - I've not seen a convincing
description of it being misuable where ptrace wouldn't allow the same
actions. Even the constructed scenarios share that property.

> O_NODE all that much, but I'd like a decent in-kernel AFS
> implementation (and a decent revoke() implementation, and especially
> the ability to revoke whole filesystems would be really nice too).

The AFS case is probably the easier one - its things like device files
where one handle can change completely what it references (due to device
loads/unloads and dynamic major/minor assignment) that make it evil.

CIFS/SMB is horrible for different reasons (a handle open on some piece
of namespace isn't going to always been the same actual file) but you
could simply decide CIFS/SMB and any other problematic cases don't
support it.

I don't really have a problem with it providing its restricted to
ordinary files on a file system where having a local inode reference
means you have a stable reference to an object on the remote system or
the local media.

The way to start this is firstly to convince Al Viro (always a good
sanity check), and then to start with the obviously safe cases only -
regular files, only file systems with stable inode references.

Devices are hard - why do we need O_NODE on devices anyway ?

2009-12-07 17:39:34

by Alan

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

> Well, yes. That's true. But I still don't think revoke() is the
> answer here. For example even without the possibility of hard links
> there's still a race in udev in the following course of events:
>
> open("/dev/foo", O_RDWR)
> ... open passes permission checks
> ... driver gets unloaded
> ... driver intended for other user gets loaded
> ... open finds new driver

> What we really need is to revoke the *inode*, so that it cannot be
> opened any more. Doing it with unlink() and revoke() and requiring
> that link() does not work on the filesystem is a poor and racy
> substitute for that.

Can't argue with that and going through the kernel logic I don't see
anything preventing an exploit based on that from working.

Alan

2009-12-10 07:39:38

by Pavel Machek

[permalink] [raw]

Subject: Re: [PATCH v3] vfs: new O_NODE open flag

On Mon 2009-12-07 13:41:09, Miklos Szeredi wrote:
> On Mon, 7 Dec 2009, Alan Cox wrote:
> > The standard udev unload is a true open barrier so has an implicit
> > revoke() caused by the fact you cannot keep a handle to the filename open
> > during the udev sequence (or the old driver would be pinned by a refcount
> > and not unload).
>
> True, udev unload is an open barrier (modulo races), but O_NODE opens
> simply don't matter in this respect, because they don't have anything
> to do with the driver.
>
> ln /dev/foo /dev/shm/my_secret_device_link
> (foo is removed)
> open("/dev/shm/my_secret_device_link", O_RDWR)
>
> How is this different than keeping the device open with O_NODE?

First version needs writable directory on same fs as /dev, which may
not be there in all configs. (Plus note various 'security enhanced'
linuxes that do not allow hardlinks on files you don't own. Maybe
someone uses subterfugue.sf.net to disallow hardlinks.)

Plus, you can see hardlinks on ta-daa 'link count'.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html