2014-10-30 09:08:57

by Eric Rannaud

[permalink] [raw]
Subject: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

The man page for open(2) indicates that when O_CREAT is specified, the
'mode' argument applies only to future accesses to the file:

Note that this mode applies only to future accesses of the newly
created file; the open() call that creates a read-only file
may well return a read/write file descriptor.

The man page for open(2) implies that 'mode' is treated identically by
O_CREAT and O_TMPFILE.

O_TMPFILE, however, behaves differently:

int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
assert(fd == -1);
assert(errno == EACCES);

int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
assert(fd > 0);

For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:

if (*opened & FILE_CREATED) {
/* Don't check for write permission, don't truncate */
open_flag &= ~O_TRUNC;
will_truncate = false;
acc_mode = MAY_OPEN;
path_to_nameidata(path, nd);
goto finish_open_created;
}

But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
may_open().

This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
inode is created, may_open() is called with acc_mode = MAY_OPEN, in
do_tmpfile().

A different, but related glibc bug revealed the discrepancy:
https://sourceware.org/bugzilla/show_bug.cgi?id=17523

The glibc lazily loads the 'mode' argument of open() and openat() using
va_arg() only if O_CREAT is present in 'flags' (to support both the 2
argument and the 3 argument forms of open; same idea for openat()).
However, the glibc ignores the 'mode' argument if O_TMPFILE is in
'flags'.

On x86_64, for open(), it magically works anyway, as 'mode' is in
RDX when entering open(), and is still in RDX on SYSCALL, which is where
the kernel looks for the 3rd argument of a syscall.

But openat() is not quite so lucky: 'mode' is in RCX when entering the
glibc wrapper for openat(), while the kernel looks for the 4th argument
of a syscall in R10. Indeed, the syscall calling convention differs from
the regular calling convention in this respect on x86_64. So the kernel
sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
fails with EACCES.

Signed-off-by: Eric Rannaud <[email protected]>
---
fs/namei.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 42df664e95e5..78512898d3ba 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3154,7 +3154,8 @@ static int do_tmpfile(int dfd, struct filename *pathname,
if (error)
goto out2;
audit_inode(pathname, nd->path.dentry, 0);
- error = may_open(&nd->path, op->acc_mode, op->open_flag);
+ /* Don't check for other permissions, the inode was just created */
+ error = may_open(&nd->path, MAY_OPEN, op->open_flag);
if (error)
goto out2;
file->f_path.mnt = nd->path.mnt;
--
2.1.2


2014-10-30 21:48:54

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On 10/30/2014 02:08 AM, Eric Rannaud wrote:
> The man page for open(2) indicates that when O_CREAT is specified, the
> 'mode' argument applies only to future accesses to the file:
>
> Note that this mode applies only to future accesses of the newly
> created file; the open() call that creates a read-only file
> may well return a read/write file descriptor.
>
> The man page for open(2) implies that 'mode' is treated identically by
> O_CREAT and O_TMPFILE.
>
> O_TMPFILE, however, behaves differently:
>
> int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
> assert(fd == -1);
> assert(errno == EACCES);
>
> int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
> assert(fd > 0);
>
> For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:
>
> if (*opened & FILE_CREATED) {
> /* Don't check for write permission, don't truncate */
> open_flag &= ~O_TRUNC;
> will_truncate = false;
> acc_mode = MAY_OPEN;
> path_to_nameidata(path, nd);
> goto finish_open_created;
> }
>
> But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
> may_open().
>
> This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
> inode is created, may_open() is called with acc_mode = MAY_OPEN, in
> do_tmpfile().
>
> A different, but related glibc bug revealed the discrepancy:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17523
>
> The glibc lazily loads the 'mode' argument of open() and openat() using
> va_arg() only if O_CREAT is present in 'flags' (to support both the 2
> argument and the 3 argument forms of open; same idea for openat()).
> However, the glibc ignores the 'mode' argument if O_TMPFILE is in
> 'flags'.
>
> On x86_64, for open(), it magically works anyway, as 'mode' is in
> RDX when entering open(), and is still in RDX on SYSCALL, which is where
> the kernel looks for the 3rd argument of a syscall.
>
> But openat() is not quite so lucky: 'mode' is in RCX when entering the
> glibc wrapper for openat(), while the kernel looks for the 4th argument
> of a syscall in R10. Indeed, the syscall calling convention differs from
> the regular calling convention in this respect on x86_64. So the kernel
> sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
> fails with EACCES.

Looks sensible. Should this be Cc: stable?

--Andy

>
> Signed-off-by: Eric Rannaud <[email protected]>
> ---
> fs/namei.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 42df664e95e5..78512898d3ba 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3154,7 +3154,8 @@ static int do_tmpfile(int dfd, struct filename *pathname,
> if (error)
> goto out2;
> audit_inode(pathname, nd->path.dentry, 0);
> - error = may_open(&nd->path, op->acc_mode, op->open_flag);
> + /* Don't check for other permissions, the inode was just created */
> + error = may_open(&nd->path, MAY_OPEN, op->open_flag);
> if (error)
> goto out2;
> file->f_path.mnt = nd->path.mnt;
>

2014-10-30 22:48:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Thu, Oct 30, 2014 at 2:48 PM, Andy Lutomirski <[email protected]> wrote:
>
> Looks sensible. Should this be Cc: stable?

Agreed. Will apply and add the stable cc.

Linus

2014-10-30 22:58:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Thu, Oct 30, 2014 at 3:48 PM, Linus Torvalds
<[email protected]> wrote:
>
> Agreed. Will apply and add the stable cc.

Ho humm. Thinking about this some more, I'm starting to wonder. Not
about this patch per se (open on a newly created file should indeed
succeed regardless), but about the horrible glibc behavior of screwing
up the third argument.

If you want to do O_TMPFILE + linkat() (or some eventual future
flink()), the mode really matters. So this idiotic glibc behavior of
only forwarding the third argument if O_CREAT is set seems to be a
bug.

Why the hell does glibc think it's a good idea to intersect system
call semantics? It's not a good idea - it's just stupid in the
extreme. And in this case it seems to actively breaks things.

Linus

2014-10-31 00:01:56

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Thu, Oct 30, 2014 at 3:58 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Oct 30, 2014 at 3:48 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> Agreed. Will apply and add the stable cc.
>
> Ho humm. Thinking about this some more, I'm starting to wonder. Not
> about this patch per se (open on a newly created file should indeed
> succeed regardless), but about the horrible glibc behavior of screwing
> up the third argument.
>
> If you want to do O_TMPFILE + linkat() (or some eventual future
> flink()), the mode really matters. So this idiotic glibc behavior of
> only forwarding the third argument if O_CREAT is set seems to be a
> bug.

We could bite the bullet and add a tmpfile syscall. /me ducks

>
> Why the hell does glibc think it's a good idea to intersect system
> call semantics? It's not a good idea - it's just stupid in the
> extreme. And in this case it seems to actively breaks things.

Uh, because it's glibc? Or because it's trying not to screw up and on
some system where overrunning va_arg is terrible?

>
> Linus



--
Andy Lutomirski
AMA Capital Management, LLC

2014-10-31 00:57:47

by Eric Rannaud

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Thu, Oct 30, 2014 at 3:58 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Oct 30, 2014 at 3:48 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> Agreed. Will apply and add the stable cc.
>
> Ho humm. Thinking about this some more, I'm starting to wonder. Not
> about this patch per se (open on a newly created file should indeed
> succeed regardless), but about the horrible glibc behavior of screwing
> up the third argument.
>
> If you want to do O_TMPFILE + linkat() (or some eventual future
> flink()), the mode really matters. So this idiotic glibc behavior of
> only forwarding the third argument if O_CREAT is set seems to be a
> bug.

Yes, there definitely is a glibc bug: a fix is being worked on and it
looks like it will go in. The change replaces the test for O_CREAT by
a test for either O_CREAT or O_TMPFILE.

> Why the hell does glibc think it's a good idea to intersect system
> call semantics? It's not a good idea - it's just stupid in the
> extreme. And in this case it seems to actively breaks things.

What Andy said: it's the most portable, because of the optional last
argument, if you have a bunch of wrappers for open and openat written
in C, which they do.

Some of these wrappers are for statically checking that the mode
argument is present when necessary (using __builtin_constant_p,
__builtin_va_arg_pack, and friends), and to "fortify" the code against
a malicious (?) injection of O_CREAT in a code path that didn't have
it at compile time (where flags was a build constant). Other wrappers
add O_LARGEFILE. One wrapper for openat checks that dirfd is a
directory (no idea why, as the kernel does the same check -- without a
race; maybe to try to guarantee ENOTDIR on some old kernel version?
It's been here since 2005, when openat was added to glibc).

--
Eric Rannaud <[email protected]>
Nanocritical, CEO

2014-10-31 00:59:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Thu, Oct 30, 2014 at 5:01 PM, Andy Lutomirski <[email protected]> wrote:
>
> Uh, because it's glibc?

Yeah. Bloated, over-engineered, and stupid.

> Or because it's trying not to screw up and on
> some system where overrunning va_arg is terrible?

No. On 99% of architectures the third argument is in a register
anyway, and traditionally it's not even va_arg, although glibc has
made it so (traditionally it's just pre-ANSI C with three arguments
and one of them might be missing - gcc has had hacks for avoiding
warnings for traditional C things like that: look at the whole
transparent union thing for another traditional "C without prototypes"
calling convention case).

But even if you make it va_arg, I can't think of a single architecture
where that makes sense. Outside of assembly trampolines, you *always*
have enough stack space that you can just access a word under the
stack anyway.

But yes, I could imagine some well-meaning - but not overly smart -
glibc developer deciding that doing the va_arg thing conditionally
would be a "feature". Despite making the code slower, bigger, and
buggier.

I guess I'll fetch the git tree and see if they document this braindamage..

[ time passes ]

Ugh. It seems to predate even the imported history (going back all the
way to 1995 - I don't know if that was SVN or CVS and whether there is
some even older historical archives that were never imported).

Anyway, since the beginning of time, the "stub/open.c" file is a True
Work of Art (TM)(also sarcasm), and has an old-style C declaration
(not ANSI) for __libc_open(), and uses a conditional va_arg() to get
the third parameter *despite* not even being a variadic function (not
varargs, not stdarg). So it's not even portable or correct *anyway*,
and it unnecessarily generates bad code and seems to have been
mindlessly copied into all the actual real non-stub implementations.
Most of them seem to have made their definitions match the declaration
in the process, so they then really do have the variadic part. Goodie,
I guess, except for this all being unnecessary crap and stupid.

Oh well. What a cock-up.

The code is insane in other ways too. The actual real Linux version of
__libc_open() ends up (for no good reason except to compete with
cat-ladies in the "crazy person of the year" award) using
"openat(AT_FDCWD)", just so you can make sure that the result doesn't
possibly work on old versions of the kernel even by mistake. I
certainly cannot possibly see any actual *advantage* to using
"openat()", but them I'm not a homeless cat-lady. It also has some
magic "LIBC_CANCEL_ASYNC()/LIBC_CANCEL_RESET()" stuff around it, which
I'm sure is entirely sane.

I can't take it any more. That code is crazy.

Linus

2014-10-31 01:04:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Thu, Oct 30, 2014 at 5:57 PM, Eric Rannaud <[email protected]> wrote:
>
> Yes, there definitely is a glibc bug: a fix is being worked on and it
> looks like it will go in. The change replaces the test for O_CREAT by
> a test for either O_CREAT or O_TMPFILE.

Why not just do it unconditionally? There really is no downside. Doing
it conditionally only makes the generated code slower and mode
complex. For absolutely zero gain, as far as I can tell. Does any
architecture actually do anything wrong?

It's actually closer in spirit to the original "open()" model than the
existing code.

Linus

2014-10-31 02:12:59

by Eric Rannaud

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Thu, Oct 30, 2014 at 6:04 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Oct 30, 2014 at 5:57 PM, Eric Rannaud <[email protected]> wrote:
>> Yes, there definitely is a glibc bug: a fix is being worked on and it
>> looks like it will go in. The change replaces the test for O_CREAT by
>> a test for either O_CREAT or O_TMPFILE.
>
> Why not just do it unconditionally? There really is no downside. Doing
> it conditionally only makes the generated code slower and mode
> complex. For absolutely zero gain, as far as I can tell. Does any
> architecture actually do anything wrong?

The immediate answer is "to keep the diff minimal". But I'll ask.

--
Eric Rannaud <[email protected]>
Nanocritical, CEO

2014-10-31 08:42:26

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Thu, Oct 30, 2014 at 05:01:30PM -0700, Andy Lutomirski wrote:
> > flink()), the mode really matters. So this idiotic glibc behavior of
> > only forwarding the third argument if O_CREAT is set seems to be a
> > bug.
>
> We could bite the bullet and add a tmpfile syscall. /me ducks

I've got another use case for that: Samba. It wants to inherit
all kinds of attributes (xattrs, modes, etc) during file creation,
and right now it's doing it in a racy way. I've come up with a patch
to use O_TEMPFILE + flink, but it turns out that Samba may as well
be asked to create read-only files, which we can't create using
O_TMPFILE.

2014-10-31 18:45:10

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Oct 31, 2014 1:42 AM, "Christoph Hellwig" <[email protected]> wrote:
>
> On Thu, Oct 30, 2014 at 05:01:30PM -0700, Andy Lutomirski wrote:
> > > flink()), the mode really matters. So this idiotic glibc behavior of
> > > only forwarding the third argument if O_CREAT is set seems to be a
> > > bug.
> >
> > We could bite the bullet and add a tmpfile syscall. /me ducks
>
> I've got another use case for that: Samba. It wants to inherit
> all kinds of attributes (xattrs, modes, etc) during file creation,
> and right now it's doing it in a racy way. I've come up with a patch
> to use O_TEMPFILE + flink, but it turns out that Samba may as well
> be asked to create read-only files, which we can't create using
> O_TMPFILE.

Does the patch in this thread not fix that?

2014-10-31 18:53:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Fri, Oct 31, 2014 at 11:44 AM, Andy Lutomirski <[email protected]> wrote:
>
> Does the patch in this thread not fix that?

It should. Modulo the glibc problem that makes it hard to actually set
the mode properly.

Linus

2014-11-03 08:34:59

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Fri, Oct 31, 2014 at 11:53:09AM -0700, Linus Torvalds wrote:
> On Fri, Oct 31, 2014 at 11:44 AM, Andy Lutomirski <[email protected]> wrote:
> >
> > Does the patch in this thread not fix that?
>
> It should. Modulo the glibc problem that makes it hard to actually set
> the mode properly.

That doesn't help because we explicitly reject O_RDONLY when combined
with O_TMPFILE.

2014-11-03 17:05:54

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Mon, Nov 3, 2014 at 12:34 AM, Christoph Hellwig <[email protected]> wrote:
>
> That doesn't help because we explicitly reject O_RDONLY when combined
> with O_TMPFILE.

You obviously cannot actually have a read-only file descriptor with
O_TMPFILE, unless all you care about is a zero-sized file with no
contents.

That's why people are talking about things like downgrading the file
*after* filling it, using sealing.

Linus

2014-11-03 17:06:55

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Nov 3, 2014 12:34 AM, "Christoph Hellwig" <[email protected]> wrote:
>
> On Fri, Oct 31, 2014 at 11:53:09AM -0700, Linus Torvalds wrote:
> > On Fri, Oct 31, 2014 at 11:44 AM, Andy Lutomirski <[email protected]> wrote:
> > >
> > > Does the patch in this thread not fix that?
> >
> > It should. Modulo the glibc problem that makes it hard to actually set
> > the mode properly.
>
> That doesn't help because we explicitly reject O_RDONLY when combined
> with O_TMPFILE.

I think I'm missing something. How is an O_RDONLY temporary file
useful? Wouldn't you want an O_RDWR tempfile with mode 0400 or
something like that?

--Andy

2014-11-03 18:49:50

by Eric Rannaud

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Mon, Nov 3, 2014 at 9:06 AM, Andy Lutomirski <[email protected]> wrote:
>> That doesn't help because we explicitly reject O_RDONLY when combined
>> with O_TMPFILE.
>
> I think I'm missing something. How is an O_RDONLY temporary file
> useful? Wouldn't you want an O_RDWR tempfile with mode 0400 or
> something like that?

Isn't it because they are essentially emulating an atomic open()
capable of creating a file with inherited ACLs, according to
relatively complex rules? open *can* be used with O_CREAT|O_RDONLY
(touch(1) might do that), which would naively translate into:

fd = open(dir, O_TMPFILE|O_RDONLY, 0600)
fsetxattr(fd, "...")
fsetxattr(fd, "...")
linkat(AT_FDCWD, "/proc/self/fd/...", ..., AT_SYMLINK_FOLLOW)
return fd;

Now this would be happening on the server, and the only reason why it
would be important to ensure that fd is O_RDONLY, is that smbd does
not do its own bookkeeping of how each file handle was opened, and
would rather have the kernel enforce O_RDONLY?

With O_TMPFILE as implemented now, smbd would have to do open(dir,
O_TMPFILE|O_RDWR, 0600), but internally keep track that O_RDONLY was
requested by the client on that fd, and block any writes to fd itself.

2014-11-03 19:04:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Mon, Nov 3, 2014 at 10:49 AM, Eric Rannaud <[email protected]> wrote:
>
> Isn't it because they are essentially emulating an atomic open()
> capable of creating a file with inherited ACLs, according to
> relatively complex rules? open *can* be used with O_CREAT|O_RDONLY
> (touch(1) might do that), which would naively translate into:

Oh, so you don't actually need any file contents at all?

If that is actually a real usage, then maybe we should just say that
"O_TMPFILE|O_RDONLY" is fine, and remove the check that it has to be
writable.

That check was always a sanity-check, because people felt that a
temp-file you can't write to is an insane concept. But if there is a
real use case for it, then clearly it's not completely insane. Just
odd.

It's just that single

if (!(acc_mode & MAY_WRITE))
return -EINVAL;

test in build_open_flags(), right?

I'd take a tested patch to remove that (where "tested" means: "yes, I
actually did that unwritable file descriptor thing, and it actually
solved the problem and worked for samba or whatever")

Linus

2014-11-03 22:18:37

by Jeremy Allison

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Mon, Nov 03, 2014 at 10:49:24AM -0800, Eric Rannaud wrote:
> On Mon, Nov 3, 2014 at 9:06 AM, Andy Lutomirski <[email protected]> wrote:
> >> That doesn't help because we explicitly reject O_RDONLY when combined
> >> with O_TMPFILE.
> >
> > I think I'm missing something. How is an O_RDONLY temporary file
> > useful? Wouldn't you want an O_RDWR tempfile with mode 0400 or
> > something like that?
>
> Isn't it because they are essentially emulating an atomic open()
> capable of creating a file with inherited ACLs, according to
> relatively complex rules? open *can* be used with O_CREAT|O_RDONLY
> (touch(1) might do that), which would naively translate into:
>
> fd = open(dir, O_TMPFILE|O_RDONLY, 0600)
> fsetxattr(fd, "...")
> fsetxattr(fd, "...")
> linkat(AT_FDCWD, "/proc/self/fd/...", ..., AT_SYMLINK_FOLLOW)
> return fd;
>
> Now this would be happening on the server, and the only reason why it
> would be important to ensure that fd is O_RDONLY, is that smbd does
> not do its own bookkeeping of how each file handle was opened, and
> would rather have the kernel enforce O_RDONLY?
>
> With O_TMPFILE as implemented now, smbd would have to do open(dir,
> O_TMPFILE|O_RDWR, 0600), but internally keep track that O_RDONLY was
> requested by the client on that fd, and block any writes to fd itself.

Which we already do, actually..

Although the atomic open emulation is
a very interesting idea for us. That's
something we currently don't do correctly
across different protocols (although we
do it between smbd's themselves).

Jeremy.

2014-11-05 08:25:59

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Mon, Nov 03, 2014 at 02:07:07PM -0800, Jeremy Allison wrote:
> Which we already do, actually..

Hmm, I didn't notice that part when implementing the atomic open,
guess I need to dig deeper into the code.

I still think supporting full open semantics with O_TMFILE + flink is
useful, so that others can make use of it it without such an additional
enforcement layer.

2014-11-05 08:27:25

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Mon, Nov 03, 2014 at 11:04:27AM -0800, Linus Torvalds wrote:
> Oh, so you don't actually need any file contents at all?
>
> If that is actually a real usage, then maybe we should just say that
> "O_TMPFILE|O_RDONLY" is fine, and remove the check that it has to be
> writable.

Wasn't this disallowed to prevent problems on old kernels that don't use
O_TMPFILE? In that case we'd ignore the flag and would just get a file
handle for the directory instead.

2014-11-05 15:21:56

by Eric Rannaud

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Wed, Nov 5, 2014 at 12:27 AM, Christoph Hellwig <[email protected]> wrote:
> On Mon, Nov 03, 2014 at 11:04:27AM -0800, Linus Torvalds wrote:
>> Oh, so you don't actually need any file contents at all?
>>
>> If that is actually a real usage, then maybe we should just say that
>> "O_TMPFILE|O_RDONLY" is fine, and remove the check that it has to be
>> writable.
>
> Wasn't this disallowed to prevent problems on old kernels that don't use
> O_TMPFILE? In that case we'd ignore the flag and would just get a file
> handle for the directory instead.

Yes, that was the idea at first, as discussed at
http://article.gmane.org/gmane.linux.file-systems/76273, in commit
bb458c644. But soon after that, O_WRONLY was allowed (ba57ea64cb1),
and O_RDWR was removed from O_TMPFILE. Now only remain the explicit
checks in build_open_flags() that Linus mentioned.

If we allow
fd = open("/tmp", O_TMPFILE|O_RDONLY, 0600)
it would be seen by an old kernel as
fd = open("/tmp", O_DIRECTORY|O_RDONLY, 0600)
which will succeed.

But unlike the other cases the creative definition of O_TMPFILE was
meant to prevent, this does not create a security risk for anyone
implementing a secure tmpfile, as they would be asking for a writable
fd.

To implement an atomic open() with O_TMPFILE+flink, if neither
O_WRONLY nor O_RDWR is in flags, you would have to manually check with
fstat that fd is indeed a regular file and not a directory. At least
if you need to run on old kernels.

If such a changes goes in, the man page for open(2) should talk about
what happens on old kernels (it already has an explanation for the
writable case).

2014-11-05 15:32:58

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Wed, Nov 5, 2014 at 7:21 AM, Eric Rannaud <[email protected]> wrote:
> On Wed, Nov 5, 2014 at 12:27 AM, Christoph Hellwig <[email protected]> wrote:
>> On Mon, Nov 03, 2014 at 11:04:27AM -0800, Linus Torvalds wrote:
>>> Oh, so you don't actually need any file contents at all?
>>>
>>> If that is actually a real usage, then maybe we should just say that
>>> "O_TMPFILE|O_RDONLY" is fine, and remove the check that it has to be
>>> writable.
>>
>> Wasn't this disallowed to prevent problems on old kernels that don't use
>> O_TMPFILE? In that case we'd ignore the flag and would just get a file
>> handle for the directory instead.
>
> Yes, that was the idea at first, as discussed at
> http://article.gmane.org/gmane.linux.file-systems/76273, in commit
> bb458c644. But soon after that, O_WRONLY was allowed (ba57ea64cb1),
> and O_RDWR was removed from O_TMPFILE. Now only remain the explicit
> checks in build_open_flags() that Linus mentioned.
>
> If we allow
> fd = open("/tmp", O_TMPFILE|O_RDONLY, 0600)
> it would be seen by an old kernel as
> fd = open("/tmp", O_DIRECTORY|O_RDONLY, 0600)
> which will succeed.
>
> But unlike the other cases the creative definition of O_TMPFILE was
> meant to prevent, this does not create a security risk for anyone
> implementing a secure tmpfile, as they would be asking for a writable
> fd.

I have no idea whether it's a security risk, but it's a serious "wtf?!?" risk.

>
> To implement an atomic open() with O_TMPFILE+flink, if neither
> O_WRONLY nor O_RDWR is in flags, you would have to manually check with
> fstat that fd is indeed a regular file and not a directory. At least
> if you need to run on old kernels.
>
> If such a changes goes in, the man page for open(2) should talk about
> what happens on old kernels (it already has an explanation for the
> writable case).

With my occasional-API-reviewer hat on, NAK NAK NAK. This is even
more insane than the rest of the O_TMPFILE interface. If you want to
support this kind of use case (which seems entirely reasonable to me),
then just add a syscall, please.

Also, you can, mostly, downgrade from O_RDWR to O_RDONLY. We could
easily add a way to downgrade all the way, I think.

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC

2014-11-05 17:10:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

On Wed, Nov 5, 2014 at 7:32 AM, Andy Lutomirski <[email protected]> wrote:
>
> I have no idea whether it's a security risk, but it's a serious "wtf?!?" risk.

Yeah. And considering that the only apparently user of this read-only
O_TMPFILE interface doesn't actually need it anyway because it carries
around the whole "read-only to the client" information independently,
I think the issue is moot. We don't need zero-sized read-only
O_TMPFILE files.

Linus