2008-06-03 11:06:17

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

Hi Miklos,

> 2) I've found yet another divergence from the spec -- but this
> was in the original implementation, rather than being
> something that has been introduced. In do_futimes() there is
>
> if (!times && !(file->f_mode & FMODE_WRITE))
> write_error = -EACCES;
>
> However, the check here should not be against the f_mode (file access
> mode), but the against actual permission of the file referred to by
> the underlying descriptor. This means that for the do_futimes() +
> times==NULL case, a set-user-ID root program could open a file
> descriptor O_RDWR/O_WRONLY for which the real UID does not have write
> access, and then even after reverting the the effective UID, the real
> user could still update file.
>
> I'm not sure of the correct way to get the required nameidata (to do a
> vfs_permission() call) from the file descriptor. Can you give me a
> tip there?

Could you point me at the right way of doing this?

Cheers,

Michael


2008-06-03 11:13:29

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

> > I'm not sure of the correct way to get the required nameidata (to do a
> > vfs_permission() call) from the file descriptor. Can you give me a
> > tip there?
>
> Could you point me at the right way of doing this?

You don't need nameidata for this at all. Just call permission() with
a NULL nameidata.

Ugly API? Yes, will be cleaned up if we manage to find some common
ground with the VFS maintainers.

Miklos

2008-06-03 11:22:45

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

On Tue, Jun 03, 2008 at 01:13:00PM +0200, Miklos Szeredi wrote:
> > > I'm not sure of the correct way to get the required nameidata (to do a
> > > vfs_permission() call) from the file descriptor. Can you give me a
> > > tip there?
> >
> > Could you point me at the right way of doing this?
>
> You don't need nameidata for this at all. Just call permission() with
> a NULL nameidata.
>
> Ugly API? Yes, will be cleaned up if we manage to find some common
> ground with the VFS maintainers.

As soon as I'm done with sysctls...

FWIW, I very much doubt that you are right wrt required permissions, though.
AFAICS, intent here is "if you can write to file, you can touch the timestamps
anyway" and having descriptor opened for write gives that, current permissions
be damned.

2008-06-03 11:27:55

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

On Tue, Jun 3, 2008 at 1:22 PM, Al Viro <[email protected]> wrote:
> On Tue, Jun 03, 2008 at 01:13:00PM +0200, Miklos Szeredi wrote:
>> > > I'm not sure of the correct way to get the required nameidata (to do a
>> > > vfs_permission() call) from the file descriptor. Can you give me a
>> > > tip there?
>> >
>> > Could you point me at the right way of doing this?
>>
>> You don't need nameidata for this at all. Just call permission() with
>> a NULL nameidata.
>>
>> Ugly API? Yes, will be cleaned up if we manage to find some common
>> ground with the VFS maintainers.
>
> As soon as I'm done with sysctls...
>
> FWIW, I very much doubt that you are right wrt required permissions, though.
> AFAICS, intent here is "if you can write to file, you can touch the timestamps
> anyway" and having descriptor opened for write gives that, current permissions
> be damned.

The standard is pretty clear on this point:

[[
Only a process with the effective user ID equal to the user ID of the
file, or with write access to the file, or with appropriate privileges
may use futimens( ) or utimensat( ) with a null pointer as the times
argument or with both tv_nsec fields set to the special value
UTIME_NOW.
]]

The crucial words here are "a process ... with write access to the
file" -- in other words, the permissions are determined by the
process's credentials, not by the access mode of the file descriptor.
I was not 100% sure on that to start with, so I did check it out with
one of the folk at The Open Group, to make sure of my understanding.

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

2008-06-03 11:30:41

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

Michael Kerrisk wrote:
> > FWIW, I very much doubt that you are right wrt required
> > permissions, though. AFAICS, intent here is "if you can write to
> > file, you can touch the timestamps anyway" and having descriptor
> > opened for write gives that, current permissions be damned.
>
> The standard is pretty clear on this point:
>
> [[
> Only a process with the effective user ID equal to the user ID of the
> file, or with write access to the file, or with appropriate privileges
> may use futimens( ) or utimensat( ) with a null pointer as the times
> argument or with both tv_nsec fields set to the special value
> UTIME_NOW.
> ]]
>
> The crucial words here are "a process ... with write access to the
> file" -- in other words, the permissions are determined by the
> process's credentials, not by the access mode of the file descriptor.
> I was not 100% sure on that to start with, so I did check it out with
> one of the folk at The Open Group, to make sure of my understanding.

Is there anything else where the file descriptor's access mode allows
doing things on Linux, but the standard requires a permissions check
each time?

-- Jamie

2008-06-03 11:39:21

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

On Tue, Jun 3, 2008 at 1:30 PM, Jamie Lokier <[email protected]> wrote:
> Michael Kerrisk wrote:
>> > FWIW, I very much doubt that you are right wrt required
>> > permissions, though. AFAICS, intent here is "if you can write to
>> > file, you can touch the timestamps anyway" and having descriptor
>> > opened for write gives that, current permissions be damned.
>>
>> The standard is pretty clear on this point:
>>
>> [[
>> Only a process with the effective user ID equal to the user ID of the
>> file, or with write access to the file, or with appropriate privileges
>> may use futimens( ) or utimensat( ) with a null pointer as the times
>> argument or with both tv_nsec fields set to the special value
>> UTIME_NOW.
>> ]]
>>
>> The crucial words here are "a process ... with write access to the
>> file" -- in other words, the permissions are determined by the
>> process's credentials, not by the access mode of the file descriptor.
>> I was not 100% sure on that to start with, so I did check it out with
>> one of the folk at The Open Group, to make sure of my understanding.
>
> Is there anything else where the file descriptor's access mode allows
> doing things on Linux, but the standard requires a permissions check
> each time?

Jamie,

I can't think of examples offhand -- but I'm also not quite sure what
your question is about. Could you say a little more?

Cheers,

Michael


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

2008-06-03 11:49:42

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

On Tue, Jun 03, 2008 at 01:39:07PM +0200, Michael Kerrisk wrote:

> > Is there anything else where the file descriptor's access mode allows
> > doing things on Linux, but the standard requires a permissions check
> > each time?
>
> Jamie,
>
> I can't think of examples offhand -- but I'm also not quite sure what
> your question is about. Could you say a little more?

"Is anything else equally stupid?", I suspect... AFAICS, behaviour in
question is inherited from futimes(2) in one of the *BSD - nothing to
do about that now (at least 10 years too late). It's rather inconsistent
with a lot of things, starting with "why utimes(2) has weaker requirements
with NULL argument", but we are far too late to fix that.

2008-06-03 11:52:32

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

On Tue, Jun 3, 2008 at 1:13 PM, Miklos Szeredi <[email protected]> wrote:
>> > I'm not sure of the correct way to get the required nameidata (to do a
>> > vfs_permission() call) from the file descriptor. Can you give me a
>> > tip there?
>>
>> Could you point me at the right way of doing this?
>
> You don't need nameidata for this at all. Just call permission() with
> a NULL nameidata.

Thanks Miklos!

2008-06-03 11:58:20

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

On Tue, Jun 03, 2008 at 12:49:21PM +0100, Al Viro wrote:
> On Tue, Jun 03, 2008 at 01:39:07PM +0200, Michael Kerrisk wrote:
>
> > > Is there anything else where the file descriptor's access mode allows
> > > doing things on Linux, but the standard requires a permissions check
> > > each time?
> >
> > Jamie,
> >
> > I can't think of examples offhand -- but I'm also not quite sure what
> > your question is about. Could you say a little more?
>
> "Is anything else equally stupid?", I suspect... AFAICS, behaviour in
> question is inherited from futimes(2) in one of the *BSD - nothing to
> do about that now (at least 10 years too late). It's rather inconsistent
> with a lot of things, starting with "why utimes(2) has weaker requirements
> with NULL argument", but we are far too late to fix that.

PS: as far as I can reconstruct what had happened there, they've got
these checks buried directly in ufs_setattr() and its ilk, which worked
for utimes(2), but had bitten them when they tried to do descriptor-based
analog...

2008-06-03 12:02:05

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

Al Viro wrote:
> On Tue, Jun 03, 2008 at 01:39:07PM +0200, Michael Kerrisk wrote:
>
> > > Is there anything else where the file descriptor's access mode allows
> > > doing things on Linux, but the standard requires a permissions check
> > > each time?
> >
> > Jamie,
> >
> > I can't think of examples offhand -- but I'm also not quite sure what
> > your question is about. Could you say a little more?
>
> "Is anything else equally stupid?", I suspect... AFAICS, behaviour in
> question is inherited from futimes(2) in one of the *BSD - nothing to
> do about that now (at least 10 years too late). It's rather inconsistent
> with a lot of things, starting with "why utimes(2) has weaker requirements
> with NULL argument", but we are far too late to fix that.

To be fair, having a writable file descriptor only lets you change the
mtime to "now", and having a readable file descriptor only lets you
change the atime to "now".

Changing the times _in general_ can be seen as over-reaching those
capabilities and arguably justifies more strict checks.

E.g. setting times in the past, you can break some caching systems,
Make, etc. Setting times to "now" will not break those things.

(A bit analogous with O_APPEND vs. O_WRITE. Someone hands you an
O_APPEND descriptor and they can continue to assume you won't clobber
earlier records in their file.)

-- Jamie

2008-06-03 12:09:20

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

On Tue, Jun 03, 2008 at 01:01:35PM +0100, Jamie Lokier wrote:
> > with a lot of things, starting with "why utimes(2) has weaker requirements
> > with NULL argument", but we are far too late to fix that.
>
> To be fair, having a writable file descriptor only lets you change the
> mtime to "now", and having a readable file descriptor only lets you
> change the atime to "now".
>
> Changing the times _in general_ can be seen as over-reaching those
> capabilities and arguably justifies more strict checks.

Which is what all questions about writability apply only to NULL case
anyway...

2008-06-03 12:11:04

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

Al Viro wrote:
> On Tue, Jun 03, 2008 at 01:01:35PM +0100, Jamie Lokier wrote:
> > > with a lot of things, starting with "why utimes(2) has weaker requirements
> > > with NULL argument", but we are far too late to fix that.
> >
> > To be fair, having a writable file descriptor only lets you change the
> > mtime to "now", and having a readable file descriptor only lets you
> > change the atime to "now".
> >
> > Changing the times _in general_ can be seen as over-reaching those
> > capabilities and arguably justifies more strict checks.
>
> Which is what all questions about writability apply only to NULL case
> anyway...

Oh!

So if I have the file descriptor, I can just as well change the mtime
by read a byte and write it back. Or even do a zero-length write, on
some OSes.

-- Jamie

2008-06-03 12:17:29

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

> > > > I'm not sure of the correct way to get the required nameidata (to do a
> > > > vfs_permission() call) from the file descriptor. Can you give me a
> > > > tip there?
> > >
> > > Could you point me at the right way of doing this?
> >
> > You don't need nameidata for this at all. Just call permission() with
> > a NULL nameidata.
> >
> > Ugly API? Yes, will be cleaned up if we manage to find some common
> > ground with the VFS maintainers.
>
> As soon as I'm done with sysctls...

Can't you just do that independently (for now just put a
d_find_alias() in there and be done with it)? If you fix every piece
of horrid code that you come across, it'll never be done...

Miklos

2008-06-03 13:05:57

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] utimensat() non-conformances and fixes [v3]

On Tue, Jun 03, 2008 at 02:16:33PM +0200, Miklos Szeredi wrote:
> > > > > I'm not sure of the correct way to get the required nameidata (to do a
> > > > > vfs_permission() call) from the file descriptor. Can you give me a
> > > > > tip there?
> > > >
> > > > Could you point me at the right way of doing this?
> > >
> > > You don't need nameidata for this at all. Just call permission() with
> > > a NULL nameidata.
> > >
> > > Ugly API? Yes, will be cleaned up if we manage to find some common
> > > ground with the VFS maintainers.
> >
> > As soon as I'm done with sysctls...
>
> Can't you just do that independently (for now just put a
> d_find_alias() in there and be done with it)? If you fix every piece
> of horrid code that you come across, it'll never be done...

There's not much left to do, actually... FWIW, solution goes like this:
* introduce structure on the classes of sysctls
(currently - root and per-network-namespace). Namely "X is parent of Y",
with "if task T sees Y, it also sees X" as defining property.
* when adding a sysctl table, find a "parent" one. Which is to say,
find the deepest node on its stem that already is present in one of the
tables from our class or its ancestor classes. That table will be our
parent and that node in it - attachment point.
* delay freeing the table headers; have them refcounted and instead
of unconditionally freeing the sucker on unregistration just drop the refcount.

Now we can keep a pair (reference to header, pointer to ctl table entry)
as long as we hold refcount on header. It won't affect unregistration
in any way. And at any point we can try to acquire "active" (use) reference
to header. If that succeeds, we know that
+ unregistration hadn't been started
+ unregistration won't be finished until we unuse the sucker
+ table entry is alive and will stay alive until then.

So we can hold references to those puppies from inodes under /proc/sys
without blocking unregistration, etc.

What's more, we can associate such pair with each node in sysctl tree.
For non-directories that's obvious. For directories, take the tree such
that directory belongs to tree \setminus parent of tree.

That's pretty much it. Filesystem side is simple - we keep a pointer to
class of tree responsible for a node (see directly above) in dentry.
And ->d_compare() checks that class of candidate match should be visible
for task doing the lookup. ->lookup() tries finding an entry with requested
name in sysctl table (found by directory inode) and in case of miss it goes
through the list of tables attached at that node, searching in those that
ought to be visible to us.

As the result, we have direct access to sysctl table entry right from inode,
maintain these references accross lookups without going through the contortions
done by current code and we do *NOT* use the same dentry for flipping between
unrelated sysctl nodes with different visibility...