2008-03-17 23:26:42

by Michael Tokarev

Subject: RFC: /dev/stdin, symlinks & permissions

I just came across an... interesting (to me, anyway) issue.

There are files - /dev/stdin, /dev/stdout & /dev/stderr -
which are handy sometimes and are available on several
*nix variants (including at least Solaris).

On Linux they're usually "implemented" as symlinks pointing
to /proc/self/fd/{0,1,2}, respectively, which in turn
are symlinks pointing to the actual files.

For example, in a root ssh session, /dev/stdin may look
like (omitting details):

# ls -l /dev/stdin
/dev/stdin -> /proc/self/fd/0
# ls -l /proc/self/fd/0
/proc/self/fd/0 -> /dev/pts/0
# ls -l /dev/pts/0
crw--w---- 1 root tty 136, 0 Mar 18 02:19 /dev/pts/0
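
For illustration, the same chain can be walked programmatically. This is
a minimal Python sketch, assuming a system where /dev/stdin is a symlink
as shown above; symlink_chain is a made-up helper name, not an existing
tool:

```python
import os

def symlink_chain(path, max_hops=16):
    """Follow symlinks one hop at a time and return every path visited."""
    chain = [path]
    while os.path.islink(path) and len(chain) <= max_hops:
        target = os.readlink(path)
        # A relative target is resolved against the directory holding the
        # link; an absolute target replaces the path entirely.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain

if __name__ == "__main__":
    print(" -> ".join(symlink_chain("/dev/stdin")))
```

When stdin is not a real file (a pipe, say), the last hop is a pseudo-name
rather than a path that exists, so the final entry is only informative.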

So far so good. Now, I change uid to something else, --
doing su(8) to "mjt". /proc/self changed obviously,
but stdin &Co is still here, and points to the same
/dev/pts/0. But *its* permissions/ownership did not
change! So now I can't, for example,

$ echo x > /dev/stdout
bash: /dev/stdout: Permission denied

which is quite unexpected - I for one expect /dev/stdout
to work in a way very similar to /dev/tty, to mean "current
standard output regardless of any permissions etc".

For example, in Solaris the whole /proc/self/fd (equivalent)
is in /dev/, and all the files in there have permissions
similar to /dev/tty:

# ls -l /dev/stdin
/dev/stdin -> ./fd/0
# ls -l /dev/fd/0
crw-rw-rw- 1 root root 306, 0 Mar 17 18:03 /dev/fd/0

To summarize: I understand where the whole thing comes from.
I understand that the kernel does not provide /dev/stdin & Co.;
this interface is provided by distributions.

But the current way is half-broken, and it can't be corrected
from userspace.

Should the kernel support something similar to other systems, less
broken than the current /dev/stdin & Co. symlinks?

Thanks!

/mjt


2008-03-17 23:54:53

by Andreas Schwab

Subject: Re: RFC: /dev/stdin, symlinks & permissions

Michael Tokarev <[email protected]> writes:

> # ls -l /dev/pts/0
> crw--w---- 1 root tty 136, 0 Mar 18 02:19 /dev/pts/0
>
> So far so good. Now, I change uid to something else, --
> doing su(8) to "mjt". /proc/self changed obviously,
> but stdin &Co is still here, and points to the same
> /dev/pts/0. But *its* permissions/ownership did not
> change! So now I can't, for example,
>
> $ echo x > /dev/stdout
> bash: /dev/stdout: Permission denied
>
> which is quite unexpected - I for one expect /dev/stdout
> to work the way very similar to /dev/tty, to mean "current
> standard output regardless of any permissions etc".

This has nothing to do with /dev/stdout. Your terminal simply does not
allow access by anyone except user root or group tty. You need to open
it up first, or mount /dev/pts with broader permissions (which is a bad
idea however).

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2008-03-18 07:24:17

by Michael Tokarev

Subject: Re: RFC: /dev/stdin, symlinks & permissions

Andreas Schwab wrote:
> Michael Tokarev <[email protected]> writes:
>
>> # ls -l /dev/pts/0
>> crw--w---- 1 root tty 136, 0 Mar 18 02:19 /dev/pts/0
>>
>> So far so good. Now, I change uid to something else, --
>> doing su(8) to "mjt". /proc/self changed obviously,
>> but stdin &Co is still here, and points to the same
>> /dev/pts/0. But *its* permissions/ownership did not
>> change! So now I can't, for example,
>>
>> $ echo x > /dev/stdout
>> bash: /dev/stdout: Permission denied
>>
>> which is quite unexpected - I for one expect /dev/stdout
>> to work the way very similar to /dev/tty, to mean "current
>> standard output regardless of any permissions etc".
>
> This has nothing to do with /dev/stdout. Your terminal simply does not
> allow access by anyone except user root or group tty. You need to open
> it up first, or mount /dev/pts with broader permissions (which is a bad
> idea however).

No, you don't understand. It has nothing to do with permissions
to the actual file /dev/stdin refers to, be it a tty or something
else.

In the exact same situation, I don't have to do anything with
/dev/tty (which refers to the same tty at the end) - it
Just Works (tm). Ditto on Solaris for example, --
/dev/stdin etc Just Works too. But not on Linux.

Without digging into implementation details (in this case,
the fact that /dev/stdin on Linux is implemented using a symlink
to, e.g., /dev/tty/0 etc.), I expect /dev/stdin to always work
as long as file descriptor 0 is open, regardless of any permissions
on the actual tty (if it's a tty in the first place, which is
not necessarily the case) -- exactly the same as /dev/tty works.

There's more: I can redirect stdout of a process to some file,
and at that point opening /dev/stdout will fail with ENOENT.

The question is:

Are /dev/stdin & Co. on Linux just convenience symlinks to
know where your std* files are (if so, why are they in /dev
and why are they so INconvenient - so many levels of symlinks), --
i.e., the only thing I can do with them is ls/readlink them,
or are they supposed to actually WORK, i.e., am I able to open
them, as on other systems?

Thanks.

/mjt

2008-03-18 12:55:38

by Theodore Ts'o

Subject: Re: RFC: /dev/stdin, symlinks & permissions

On Tue, Mar 18, 2008 at 10:24:03AM +0300, Michael Tokarev wrote:
> Without digging into implementation details (in this case,
> the fact that /dev/stdin on Linux is implemented using a symlink
> to, e.g., /dev/tty/0 etc.), I expect /dev/stdin to always work
> as long as file descriptor 0 is open, regardless of any permissions
> on the actual tty (if it's a tty in the first place, which is
> not necessarily the case) -- exactly the same as /dev/tty works.

Actually, /dev/stdin is not a symlink to the tty. It's a symlink to
/proc/self/fd/0:

% ls -lL /dev/stdin
0 crw--w---- 1 tytso tty 136, 1 2008-03-18 08:30 /dev/stdin

The problem is that /proc/self/fd/0 is a symlink to the open file in
question, and so *it* is a symlink to /dev/pts/0.

The main issue is that at the moment, when you open /proc/self/fd/X,
what you get is a new struct file, since the inode is opened a second
time. That is why you have to go through the access control checks a
second time, and why there are issues when you have /dev/stdin
pointing to a tty which was owned by user 1, and then when you su to
user 2, you get a "permission denied" error.
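
The second access check can be reproduced without su. This is a minimal
Python sketch, assuming a chmod to mode 000 is a fair stand-in for "the
tty belongs to someone else"; reopen_after_chmod is a made-up name. The
descriptor that is already open keeps working, while a fresh open of the
same file is checked again:

```python
import errno
import os
import tempfile

def reopen_after_chmod():
    """Open a file, revoke all permissions, then compare the old fd with a fresh open."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello")
        os.chmod(path, 0o000)          # stands in for "owned by another user"
        os.lseek(fd, 0, os.SEEK_SET)
        via_fd = os.read(fd, 5)        # existing descriptor: no new permission check
        # On Linux, go through procfs as /dev/stdin does; elsewhere use the plain path.
        proc = "/proc/self/fd/%d" % fd
        name = proc if os.path.exists(proc) else path
        try:
            os.close(os.open(name, os.O_RDONLY))
            reopen_errno = None        # the open succeeded, e.g. when running as root
        except OSError as e:
            reopen_errno = e.errno
        return via_fd, reopen_errno
    finally:
        os.close(fd)
        os.unlink(path)

if __name__ == "__main__":
    data, err = reopen_after_chmod()
    print(data, errno.errorcode.get(err, err))
```

As an unprivileged user the second value is expected to be EACCES; root
bypasses the mode bits, so there the reopen succeeds.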

On other operating systems, opening /proc/self/fd/X gives you a
duplicate of the file descriptor. That means that the seek pointer is
also duplicated. This has been remarked upon before. Linux 1.2 did
things "right" (as in, the same as Plan 9 and Solaris), but it was
changed in Linux 2.0. Please see:

http://www.ussg.iu.edu/hypermail/linux/kernel/9609.2/0371.html

and four years later:

http://www.ussg.iu.edu/hypermail/linux/kernel/0002.3/1022.html
http://www.ussg.iu.edu/hypermail/linux/kernel/0002.3/1250.html

I don't see a mention of it in 2004, so I guess that broke the 4 year
cycle, but here it is once again in 2008. :-)
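
To make the dup-versus-second-open distinction above concrete, here is a
minimal Python sketch on a temporary file (offsets_after_read is a made-up
name). A dup shares the seek pointer with the original descriptor; a
second open of the same file, which is what opening /proc/self/fd/X
amounts to on Linux today, starts with its own pointer at zero:

```python
import os
import tempfile

def offsets_after_read():
    """Return (offset seen by a dup, offset seen by a second open) after reading 4 bytes."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        os.lseek(fd, 0, os.SEEK_SET)
        os.read(fd, 4)                       # advance the file position to 4
        dup_fd = os.dup(fd)                  # same open file, shared position
        new_fd = os.open(path, os.O_RDONLY)  # separate open, separate position
        try:
            return (os.lseek(dup_fd, 0, os.SEEK_CUR),
                    os.lseek(new_fd, 0, os.SEEK_CUR))
        finally:
            os.close(dup_fd)
            os.close(new_fd)
    finally:
        os.close(fd)
        os.unlink(path)

if __name__ == "__main__":
    print(offsets_after_read())   # (4, 0)
```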

- Ted

2008-03-19 19:29:45

by Theodore Ts'o

Subject: Re: RFC: /dev/stdin, symlinks & permissions

On Tue, Mar 18, 2008 at 02:32:22PM +0000, Al Viro wrote:
> The real issue is that it was not Plan 9 semantics to start with.
>
> See 9/port/devproc.c and 9/port/devdup.c; the former is procfs and
> while it does have <pid>/fd, the sucker is not a directory - it's
> a text file containing (more or less) the pathnames of opened files
> of that process. The latter is an entirely different thing - it's
> a separate filesystem (#d instead of #p, FWIW). There you have
> per-descriptor files to open and yes, that'll give you dup(). What
> you do not have there is per-process part.
>
> IOW, you can get pathnames of opened files for other processes via
> procfs *AND* you can get open-that-does-only-dup for files in your
> descriptor table - on a separate filesystem.

Well, what we did was to make readlink() return the pathname, and
open() return a dup of the file descriptor. I thought that was pretty
clever at the time, actually.

Yes, it wasn't completely the Plan 9 semantics, but in terms of what
happened when you opened the file, it was equivalent to
dupfs.

Maybe our mistake was to make /dev/fd a symlink to /proc/self/fd, and
/dev/stdin a symlink to /proc/self/fd/0, et al., since we don't get
the semantics exactly right compared to other operating systems.

> 1.2 tried to mix both. I'm not actually sure that it was a good idea wrt
> security, while we are at it...

What is the security problem that you are worried about? That it
might leak the pathname to someone who had an open file handle to the
file? That doesn't seem like a huge deal to me....

> We could implement Plan 9 style dupfs, but to do that without excessive
> ugliness we'd need to change prototype of ->open() - it must be able to
> return a reference to struct file different from anything it got from
> caller; probably the least painful way would be to make it return
>     NULL => success, use struct file passed to ->open()
>     ERR_PTR(-err) => error
>     pointer to struct file => success, caller should drop the
>         reference to struct file it had passed to ->open() and use the return value.
> Still a mind-boggling amount of churn - probably too much to bother with.

Yeah, ouch. The only other way to do it would be to add a new
function pointer to struct file_operations which would only be
filled in by procfs inodes, and then have the sys_open() routine
call that function pointer if ->open was NULL. But that would be
quite ugly....

- Ted

2008-03-19 21:21:36

by Al Viro

Subject: Re: RFC: /dev/stdin, symlinks & permissions

On Tue, Mar 18, 2008 at 08:54:45AM -0400, Theodore Tso wrote:

> The main issue is that at the moment, when you open /proc/self/fd/X,
> what you get is a new struct file, since the inode is opened a second
> time. That is why you have to go through the access control checks a
> second time, and why there are issues when you have /dev/stdin
> pointing to a tty which was owned by user 1, and then when you su to
> user 2, you get a "permission denied" error.
>
> On other operating systems, opening /proc/self/fd/X gives you a
> duplicate of the file descriptor. That means that the seek pointer is
> also duplicated. This has been remarked upon before. Linux 1.2 did
> things "right" (as in, the same as Plan 9 and Solaris), but it was
> changed in Linux 2.0. Please see:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/9609.2/0371.html

The real issue is that it was not Plan 9 semantics to start with.

See 9/port/devproc.c and 9/port/devdup.c; the former is procfs and
while it does have <pid>/fd, the sucker is not a directory - it's
a text file containing (more or less) the pathnames of opened files
of that process. The latter is an entirely different thing - it's
a separate filesystem (#d instead of #p, FWIW). There you have
per-descriptor files to open and yes, that'll give you dup(). What
you do not have there is per-process part.

IOW, you can get pathnames of opened files for other processes via
procfs *AND* you can get open-that-does-only-dup for files in your
descriptor table - on a separate filesystem.

1.2 tried to mix both. I'm not actually sure that it was a good idea wrt
security, while we are at it...

We could implement Plan 9 style dupfs, but to do that without excessive
ugliness we'd need to change prototype of ->open() - it must be able to
return a reference to struct file different from anything it got from
caller; probably the least painful way would be to make it return
    NULL => success, use struct file passed to ->open()
    ERR_PTR(-err) => error
    pointer to struct file => success, caller should drop the
        reference to struct file it had passed to ->open() and use the return value.
Still a mind-boggling amount of churn - probably too much to bother with.
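
The caller's side of that three-way convention can be modelled in a few
lines. This is a toy Python sketch, not kernel code: File stands in for
struct file, an OSError stands in for ERR_PTR(-err), and the reference
counting shown (the method returns a reference already taken, and the
caller drops its own file on error) is an assumption made for
illustration only:

```python
class File:
    """Toy stand-in for the kernel's struct file."""
    def __init__(self, name):
        self.name = name
        self.refs = 1

def finish_open(passed, open_method):
    """Apply the proposed return convention of ->open().

    open_method(passed) returns None to keep `passed`, returns another
    File to substitute it, or raises OSError in place of ERR_PTR(-err).
    """
    try:
        result = open_method(passed)
    except OSError:
        passed.refs -= 1      # assumed: the caller releases what it allocated
        raise
    if result is None:
        return passed         # ordinary open: use the file that was passed in
    passed.refs -= 1          # substitution: drop our file, use the returned one
    return result
```

A dup-style filesystem would return the file that is already in the
descriptor table; every other filesystem would keep returning None.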

PS: from Plan 9 proc(3) [they use section 3 for kernel filesystems]:
The read-only fd file lists the open file descriptors of the process.
The first line of the file is its current directory; subsequent lines
list, one per line, the open files, giving the decimal file descriptor
number; whether the file is open for read (r), write, (w), or both (rw);
the type, device number, and qid of the file; its I/O unit (the amount
of data that may be transferred on the file as a contiguous piece; see
iounit(2)), its I/O offset; and its name at the time it was opened.

2008-03-23 04:37:24

by Denys Vlasenko

Subject: Re: RFC: /dev/stdin, symlinks & permissions

On Tuesday 18 March 2008 15:32, Al Viro wrote:
> On Tue, Mar 18, 2008 at 08:54:45AM -0400, Theodore Tso wrote:
>
> > The main issue is that at the moment, when you open /proc/self/fd/X,
> > what you get is a new struct file, since the inode is opened a second
> > time. That is why you have to go through the access control checks a
> > second time, and why there are issues when you have /dev/stdin
> > pointing to a tty which was owned by user 1, and then when you su to
> > user 2, you get a "permission denied" error.
> >
> > On other operating systems, opening /proc/self/fd/X gives you a
> > duplicate of the file descriptor. That means that the seek pointer is
> > also duplicated. This has been remarked upon before. Linux 1.2 did
> > things "right" (as in, the same as Plan 9 and Solaris), but it was
> > changed in Linux 2.0. Please see:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/9609.2/0371.html
>
> The real issue is that it was not Plan 9 semantics to start with.
>
> See 9/port/devproc.c and 9/port/devdup.c; the former is procfs and
> while it does have <pid>/fd, the sucker is not a directory - it's
> a text file containing (more or less) the pathnames of opened files
> of that process. The latter is an entirely different thing - it's
> a separate filesystem (#d instead of #p, FWIW). There you have
> per-descriptor files to open and yes, that'll give you dup(). What
> you do not have there is per-process part.

/me puts his admin hat on

This issue (that /proc/self/fd/0,1,2 don't always work)
is a real problem. I was bitten by it more than once, trying to do
something like:

setuidgid http_user httpd --log-to-file /proc/self/fd/2

Doesn't work. Which is sort of stupid - I _already_
have fd 2 open, what's the point in prohibiting me from
opening it again?

(As to why: there is a lot of software which insists on logging
either to syslog or to a file, whereas I really prefer to log
to stdout/stderr.)

> We could implement Plan 9 style dupfs, but to do that without excessive
> ugliness we'd need to change prototype of ->open() - it must be able to
> return a reference to struct file different from anything it got from
> caller; probably the least painful way would be to make it return

I am not an expert, so my question might be stupid, but:
can open("/proc/PID/fd/N") be special-cased to always succeed
if PID = current process' PID and fd N is already open?
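
Short of a kernel change, an application that takes a log file name can
approximate this behaviour itself. This is a minimal Python sketch under
that assumption; open_or_dup and the set of recognised names are made up
for illustration, and it only helps programs that choose to use such a
helper:

```python
import os
import re

_FD_PATH = re.compile(r"^(?:/proc/self/fd|/dev/fd)/(\d+)$")
_NAMED = {"/dev/stdin": 0, "/dev/stdout": 1, "/dev/stderr": 2}

def open_or_dup(path, flags=os.O_WRONLY | os.O_APPEND | os.O_CREAT, mode=0o644):
    """dup() an already-open descriptor when `path` names one, else open() normally."""
    fd = _NAMED.get(path)
    if fd is None:
        m = _FD_PATH.match(path)
        if m:
            fd = int(m.group(1))
    if fd is not None:
        # No path lookup and no permission check; raises OSError (EBADF)
        # if the descriptor is not open. `flags` is ignored in this case,
        # so the result keeps the access mode of the original descriptor.
        return os.dup(fd)
    return os.open(path, flags, mode)
```

With this, a log path of /proc/self/fd/2 yields a duplicate of fd 2
whatever the ownership of the underlying file.
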
--
vda

2008-03-23 16:51:35

by H. Peter Anvin

Subject: Re: RFC: /dev/stdin, symlinks & permissions

Theodore Tso wrote:
>
> Maybe our mistake was to make /dev/fd a symlink to /proc/self/fd, and
> /dev/stdin a symlink to /proc/self/fd/0, et al., since we don't get
> the semantics exactly right compared to other operating systems.
>

No, our mistake was doing broken semantics and thinking they were good
enough.

>> 1.2 tried to mix both. I'm not actually sure that it was a good idea wrt
>> security, while we are at it...
>
> What is the security problem that you are worried about? That it
> might leak the pathname to someone who had an open file handle to the
> file? That doesn't seem like a huge deal to me....
>
>> We could implement Plan 9 style dupfs, but to do that without excessive
>> ugliness we'd need to change prototype of ->open() - it must be able to
>> return a reference to struct file different from anything it got from
>> caller; probably the least painful way would be to make it return
>>     NULL => success, use struct file passed to ->open()
>>     ERR_PTR(-err) => error
>>     pointer to struct file => success, caller should drop the
>>         reference to struct file it had passed to ->open() and use the return value.
>> Still a mind-boggling amount of churn - probably too much to bother with.
>
> Yeah, ouch. The only other way to do it would be to add a new
> function pointer to struct file_operations which would only be
> filled in by procfs inodes, and then have the sys_open() routine
> call that function pointer if ->open was NULL. But that would be
> quite ugly....
>

There is, at least theoretically speaking, another reason to do this: it
would allow a device driver that makes userspace upcalls a much cleaner
way to say "you really want this thing over there" by simply opening in
userspace and passing down the file descriptor.

My suggestion for how to implement this would be to librarize the
allocation of a new file structure, and make it a new ->alloc_open()
method. The default implementation of ->alloc_open() would be (VERY
VERY simplified, obviously):

alloc_open(inode)
{
	struct file *file = allocate_new_file();
	inode->ops->open(file);
	return file;
}
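
How an override of that default would give dup semantics can be shown
with a toy model. This is a Python sketch, not kernel code: Inode, File
and FdLinkInode are made-up classes, and the procfs-style override is an
assumption about how such a method might be used:

```python
class File:
    """Toy stand-in for struct file."""
    def __init__(self, inode):
        self.inode = inode
        self.opened = False

class Inode:
    """Toy inode whose alloc_open() mirrors the simplified default above."""
    def __init__(self, name):
        self.name = name

    def open(self, file):
        file.opened = True

    def alloc_open(self):
        file = File(self)      # allocate_new_file()
        self.open(file)        # inode->ops->open(file)
        return file

class FdLinkInode(Inode):
    """Hypothetical fd entry: hand back the file that is already open."""
    def __init__(self, name, existing):
        super().__init__(name)
        self.existing = existing

    def alloc_open(self):
        return self.existing   # no second open of the underlying inode
```

An ordinary inode produces a fresh File on every call, while the fd entry
returns the same object each time, which is the dup behaviour.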

-hpa