2007-05-31 18:09:29

by Ulrich Drepper

[permalink] [raw]
Subject: [PATCH] Introduce O_CLOEXEC (take >2)

I've brought this topic up before but didn't provide a patch. Well, here
we go again, this time with a patch. I even throw in a test program.

The problem is as follows: in multi-threaded code (or more correctly: all
code using clone() with CLONE_FILES) we have a race when exec'ing.

thread #1 thread #2

fd=open()

fork + exec

fcntl(fd,F_SETFD,FD_CLOEXEC)

In some applications this can happen frequently. Take a web browser. One
thread opens a file and another thread starts, say, an external PDF viewer.
The result can even be a security issue if that open file descriptor refers
to a sensitive file and the external program can somehow be tricked into
using that descriptor.

Just adding O_CLOEXEC support to open() doesn't solve the whole set of
problems. There are other ways to create file descriptors (socket,
epoll_create, Unix domain socket transfer, etc). These can and should
be addressed separately though. open() is such an easy case that it makes
not much sense putting the fix off.

The test program:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#ifndef O_CLOEXEC
# define O_CLOEXEC 02000000
#endif

int
main (int argc, char *argv[])
{
int fd;
if (argc > 1)
{
fd = atol (argv[1]);
printf ("child: fd = %d\n", fd);
if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
{
puts ("file descriptor valid in child");
return 1;
}
return 0;
}

fd = open ("/proc/self/exe", O_RDONLY | O_CLOEXEC);
printf ("in parent: new fd = %d\n", fd);
char buf[20];
snprintf (buf, sizeof (buf), "%d", fd);
execl ("/proc/self/exe", argv[0], buf, NULL);
puts ("execl failed");
return 1;
}



Signed-off-by: Ulrich Drepper <[email protected]>
Acked-by: Ingo Molnar <[email protected]>

diff --git a/fs/open.c b/fs/open.c
index 0d515d1..e6991c1 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -855,7 +855,7 @@ EXPORT_SYMBOL(dentry_open);
/*
* Find an empty file descriptor entry, and mark it busy.
*/
-int get_unused_fd(void)
+static int get_unused_fd_flags(int flags)
{
struct files_struct * files = current->files;
int fd, error;
@@ -891,7 +891,10 @@ repeat:
}

FD_SET(fd, fdt->open_fds);
- FD_CLR(fd, fdt->close_on_exec);
+ if (flags & O_CLOEXEC)
+ FD_SET(fd, fdt->close_on_exec);
+ else
+ FD_CLR(fd, fdt->close_on_exec);
files->next_fd = fd + 1;
#if 1
/* Sanity check */
@@ -907,6 +910,11 @@ out:
return error;
}

+int get_unused_fd(void)
+{
+ return get_unused_fd_flags(0);
+}
+
EXPORT_SYMBOL(get_unused_fd);

static void __put_unused_fd(struct files_struct *files, unsigned int fd)
@@ -959,7 +967,7 @@ long do_sys_open(int dfd, const char __user *filename, int flags, int mode)
int fd = PTR_ERR(tmp);

if (!IS_ERR(tmp)) {
- fd = get_unused_fd();
+ fd = get_unused_fd_flags(flags);
if (fd >= 0) {
struct file *f = do_filp_open(dfd, tmp, flags, mode);
if (IS_ERR(f)) {
diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
index c154b9d..b847741 100644
--- a/include/asm-generic/fcntl.h
+++ b/include/asm-generic/fcntl.h
@@ -48,6 +48,9 @@
#ifndef O_NOATIME
#define O_NOATIME 01000000
#endif
+#ifndef O_CLOEXEC
+#define O_CLOEXEC 02000000 /* set close_on_exec */
+#endif
#ifndef O_NDELAY
#define O_NDELAY O_NONBLOCK
#endif


2007-05-31 18:27:19

by Nicholas Miell

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

On Thu, 2007-05-31 at 14:09 -0400, Ulrich Drepper wrote:
> diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
> index c154b9d..b847741 100644
> --- a/include/asm-generic/fcntl.h
> +++ b/include/asm-generic/fcntl.h
> @@ -48,6 +48,9 @@
> #ifndef O_NOATIME
> #define O_NOATIME 01000000
> #endif
> +#ifndef O_CLOEXEC
> +#define O_CLOEXEC 02000000 /* set close_on_exec */
> +#endif
> #ifndef O_NDELAY
> #define O_NDELAY O_NONBLOCK
> #endif

O_CLOSEONEXEC, perhaps?

We don't want to create another "creat" here... :)

--
Nicholas Miell <[email protected]>

2007-05-31 18:32:47

by Ulrich Drepper

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nicholas Miell wrote:
> O_CLOSEONEXEC, perhaps?

Then nobody would know this is equivalent to FD_CLOEXEC. I think the
name I propose is fine.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFGXxSv2ijCOnn/RHQRAhELAJ9/7pTQeZShV6NDfEx+iVSfD5g02gCeM/y+
E0GvuEg2EBtOmDdI5MEzOG0=
=W6hQ
-----END PGP SIGNATURE-----

2007-05-31 18:35:37

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

On Thu, 31 May 2007, Ulrich Drepper wrote:

> I've brought this topic up before but didn't provide a patch. Well, here
> we go again, this time with a patch. I even throw in a test program.
>
> The problem is as follows: in multi-threaded code (or more correctly: all
> code using clone() with CLONE_FILES) we have a race when exec'ing.
>
> thread #1 thread #2
>
> fd=open()
>
> fork + exec
>
> fcntl(fd,F_SETFD,FD_CLOEXEC)
>
> In some applications this can happen frequently. Take a web browser. One
> thread opens a file and another thread starts, say, an external PDF viewer.
> The result can even be a security issue if that open file descriptor refers
> to a sensitive file and the external program can somehow be tricked into
> using that descriptor.
>
> Just adding O_CLOEXEC support to open() doesn't solve the whole set of
> problems. There are other ways to create file descriptors (socket,
> epoll_create, Unix domain socket transfer, etc). These can and should
> be addressed separately though. open() is such an easy case that it makes
> not much sense putting the fix off.

Isn't this better be a global process flag? Default should be, for legacy
reasons, !FD_CLOEXEC. But then you can call a sys_task_set_fflags(FD_CLOEXEC)
and all newly created files get that behavior by default. Then, in case
you want some of them to cross the exec boundary, you explicitly
fcntl(fd, F_SETFD, !FD_CLOEXEC).
Most the MT+exec apps I write, would like FD_CLOEXEC for everything anyway.



- Davide


2007-05-31 18:43:19

by Ulrich Drepper

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Davide Libenzi wrote:
> Isn't this better be a global process flag? Default should be, for legacy
> reasons,

No. Policies are always wrong since it means code that cannot change
the policy (e.g, all runtime libraries) have no access to the
functionality. I cannot set the policy to default to close-on-exit in
glibc all the while the application assumes this is not the case.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFGXxcK2ijCOnn/RHQRAmM9AKC5BEIEBsgwhWx09aAMA0CwsesMWQCfScGr
vCF+Ih1lDP5rliY8D2Om5vU=
=ZShC
-----END PGP SIGNATURE-----

2007-05-31 18:46:41

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

On Thu, 31 May 2007, Ulrich Drepper wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Davide Libenzi wrote:
> > Isn't this better be a global process flag? Default should be, for legacy
> > reasons,
>
> No. Policies are always wrong since it means code that cannot change
> the policy (e.g, all runtime libraries) have no access to the
> functionality. I cannot set the policy to default to close-on-exit in
> glibc all the while the application assumes this is not the case.

I was talking for a broader usage, not only glibc centric. Most ppl
writing MT+exec apps wants all but (eventually) and handfull of files
leaking across the exec boundary.



- Davide


2007-05-31 18:54:47

by Ulrich Drepper

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/31/2007 Davide Libenzi wrote:
> I was talking for a broader usage, not only glibc centric. Most ppl
> writing MT+exec apps wants all but (eventually) and handfull of files
> leaking across the exec boundary.

Policies are not the answer. Using such a flag is no big issue. *Iff*
we should end up with some form of private file descriptors changing the
default for them would be OK, too. But the existing semantics mustn't
be touched, even if it is selectable. Trying to fix existing buggy code
this way is foolish since it might just as well break assumptions in the
same program.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFGXxm/2ijCOnn/RHQRAm+wAJ9YD6KsS6T96xChZvS/3yCwCo6+0gCgl0IY
y6OxxUcuQ8yGJsiYebkuIMY=
=tkxV
-----END PGP SIGNATURE-----

2007-05-31 19:03:24

by Jakub Jelinek

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

On Thu, May 31, 2007 at 11:46:31AM -0700, Davide Libenzi wrote:
> On Thu, 31 May 2007, Ulrich Drepper wrote:
> > Davide Libenzi wrote:
> > > Isn't this better be a global process flag? Default should be, for legacy
> > > reasons,
> >
> > No. Policies are always wrong since it means code that cannot change
> > the policy (e.g, all runtime libraries) have no access to the
> > functionality. I cannot set the policy to default to close-on-exit in
> > glibc all the while the application assumes this is not the case.
>
> I was talking for a broader usage, not only glibc centric. Most ppl
> writing MT+exec apps wants all but (eventually) and handfull of files
> leaking across the exec boundary.

If open (and all other syscalls that create fds) have O_CLOEXEC (and
something similar for other syscalls), then such a policy can be easily
implemented on the userland, if desired.

Jakub

2007-05-31 19:18:03

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

On Thu, 31 May 2007, Jakub Jelinek wrote:

> On Thu, May 31, 2007 at 11:46:31AM -0700, Davide Libenzi wrote:
> > On Thu, 31 May 2007, Ulrich Drepper wrote:
> > > Davide Libenzi wrote:
> > > > Isn't this better be a global process flag? Default should be, for legacy
> > > > reasons,
> > >
> > > No. Policies are always wrong since it means code that cannot change
> > > the policy (e.g, all runtime libraries) have no access to the
> > > functionality. I cannot set the policy to default to close-on-exit in
> > > glibc all the while the application assumes this is not the case.
> >
> > I was talking for a broader usage, not only glibc centric. Most ppl
> > writing MT+exec apps wants all but (eventually) and handfull of files
> > leaking across the exec boundary.
>
> If open (and all other syscalls that create fds) have O_CLOEXEC (and
> something similar for other syscalls), then such a policy can be easily
> implemented on the userland, if desired.

That, unless you change all the syscalls creating files to accept new
parameters, would require a syscall+fcntl operation. That is not atomic
(ie, another thread might do exec between the syscall and the fcntl).
Again, mine was a general comment, not directed into magically fixing
existing buggy code. What I meant, was that the vast majority of MT+exec
apps wants all their fds (but an handfull, maybe) to be O_CLOEXEC. So a
global, non-inheritable, per-process flag seemed the most straightforward
solution.



- Davide


2007-05-31 19:50:59

by Ulrich Drepper

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Davide Libenzi wrote:
> What I meant, was that the vast majority of MT+exec
> apps wants all their fds (but an handfull, maybe)

You never listen when you are told something. Such a statement cannot
be made. The application doesn't know what the runtime libraries need
(not just libc, libstdc++, libxml, whatever) and vice versa. Policies
are always wrong for somebody and changing the standardized behavior is
not acceptable.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFGXycL2ijCOnn/RHQRAvPfAJ9r77oMkReyf4OG2rVPaFNb5TIYfgCgwp8e
mvswKBbe1UlrzlG3oOWYswM=
=viNa
-----END PGP SIGNATURE-----

2007-05-31 19:59:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)



On Thu, 31 May 2007, Davide Libenzi wrote:
>
> What I meant, was that the vast majority of MT+exec apps wants all their
> fds (but an handfull, maybe) to be O_CLOEXEC. So a global,
> non-inheritable, per-process flag seemed the most straightforward
> solution.

I'm with Uli on this one. "Stateful" stuff is bad. It's essentially
impossible to handle with libraries - either the library would have to
explciitly always turn the state the way _it_ needs it, or the library
will do the wrogn thing.

For example, what about libraries that are used to set up stdin/stdout for
the forker?

This is not unlike floating point rounding. Having stateful rounding (like
the i387) is *stupid*. You want per-operation rounding (where *one* of the
choices may be "use default"). Exactly because libraries etc will need to
control their _own_ internal choices.

Linus

2007-05-31 20:23:38

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

On Thu, 31 May 2007, Ulrich Drepper wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Davide Libenzi wrote:
> > What I meant, was that the vast majority of MT+exec
> > apps wants all their fds (but an handfull, maybe)
>
> You never listen when you are told something. Such a statement cannot
> be made. The application doesn't know what the runtime libraries need
> (not just libc, libstdc++, libxml, whatever) and vice versa. Policies
> are always wrong for somebody and changing the standardized behavior is
> not acceptable.

I do always listen when you make a good point. And this is a good one.
Some libraries might not like a global behaviour change, I agree.



- Davide


2007-05-31 23:21:20

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

On Thu, 31 May 2007 14:09:15 -0400
Ulrich Drepper <[email protected]> wrote:

> I've brought this topic up before but didn't provide a patch. Well, here
> we go again, this time with a patch. I even throw in a test program.
>
> The problem is as follows: in multi-threaded code (or more correctly: all
> code using clone() with CLONE_FILES) we have a race when exec'ing.
>
> thread #1 thread #2
>
> fd=open()
>
> fork + exec
>
> fcntl(fd,F_SETFD,FD_CLOEXEC)
>
> In some applications this can happen frequently. Take a web browser. One
> thread opens a file and another thread starts, say, an external PDF viewer.
> The result can even be a security issue if that open file descriptor refers
> to a sensitive file and the external program can somehow be tricked into
> using that descriptor.
>
> Just adding O_CLOEXEC support to open() doesn't solve the whole set of
> problems. There are other ways to create file descriptors (socket,
> epoll_create, Unix domain socket transfer, etc). These can and should
> be addressed separately though. open() is such an easy case that it makes
> not much sense putting the fix off.
>
> ...
>
> diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
> index c154b9d..b847741 100644
> --- a/include/asm-generic/fcntl.h
> +++ b/include/asm-generic/fcntl.h
> @@ -48,6 +48,9 @@
> #ifndef O_NOATIME
> #define O_NOATIME 01000000
> #endif
> +#ifndef O_CLOEXEC
> +#define O_CLOEXEC 02000000 /* set close_on_exec */
> +#endif
> #ifndef O_NDELAY
> #define O_NDELAY O_NONBLOCK
> #endif

This will break xtensa, because that architecture (and only that
architecture) doesn't include asm-generic/fcntl.h from asm/fcntl.h.

But let's leave this patch as-is: it's xtensa which needs fixing.

2007-06-01 01:05:36

by Stephen Rothwell

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

On Thu, 31 May 2007 16:20:21 -0700 Andrew Morton <[email protected]> wrote:
>
> This will break xtensa, because that architecture (and only that
> architecture) doesn't include asm-generic/fcntl.h from asm/fcntl.h.

It used to (for a whole 15 months). Consolidation and cleanup is hard
enough to do once ...

> But let's leave this patch as-is: it's xtensa which needs fixing.

See my other patch ...

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/


Attachments:
(No filename) (514.00 B)
(No filename) (189.00 B)
Download all attachments

2007-06-01 01:38:55

by Stephen Rothwell

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

On Thu, 31 May 2007 16:20:21 -0700 Andrew Morton <[email protected]> wrote:
>
> > diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
> > index c154b9d..b847741 100644
> > --- a/include/asm-generic/fcntl.h
> > +++ b/include/asm-generic/fcntl.h
> > @@ -48,6 +48,9 @@
> > #ifndef O_NOATIME
> > #define O_NOATIME 01000000
> > #endif
> > +#ifndef O_CLOEXEC
> > +#define O_CLOEXEC 02000000 /* set close_on_exec */
> > +#endif
> > #ifndef O_NDELAY
> > #define O_NDELAY O_NONBLOCK
> > #endif
>
> This will break xtensa, because that architecture (and only that
> architecture) doesn't include asm-generic/fcntl.h from asm/fcntl.h.

This also breaks Alpha (which uses 02000000 for O_DIRECT) and parisc
(which uses 02000000 for O_RSYNC). So you ether need to choose a
different value or define O_CLOEXEC for those two architectures.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/


Attachments:
(No filename) (963.00 B)
(No filename) (189.00 B)
Download all attachments

2007-06-01 03:08:08

by Kyle McMartin

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

On Fri, Jun 01, 2007 at 11:38:40AM +1000, Stephen Rothwell wrote:
> This also breaks Alpha (which uses 02000000 for O_DIRECT) and parisc
> (which uses 02000000 for O_RSYNC). So you ether need to choose a
> different value or define O_CLOEXEC for those two architectures.
>

That's easy enough to fix...

Signed-off-by: Kyle McMartin <[email protected]>

diff --git a/include/asm-parisc/fcntl.h b/include/asm-parisc/fcntl.h
index 317851f..4ca0fb0 100644
--- a/include/asm-parisc/fcntl.h
+++ b/include/asm-parisc/fcntl.h
@@ -14,6 +14,7 @@
#define O_DSYNC 01000000 /* HPUX only */
#define O_RSYNC 02000000 /* HPUX only */
#define O_NOATIME 04000000
+#define O_CLOEXEC 08000000 /* set close_on_exec */

#define O_DIRECTORY 00010000 /* must be a directory */
#define O_NOFOLLOW 00000200 /* don't follow links */

2007-06-01 19:50:37

by Byron Stanoszek

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

On Thu, 31 May 2007, Kyle McMartin wrote:

> On Fri, Jun 01, 2007 at 11:38:40AM +1000, Stephen Rothwell wrote:
>> This also breaks Alpha (which uses 02000000 for O_DIRECT) and parisc
>> (which uses 02000000 for O_RSYNC). So you ether need to choose a
>> different value or define O_CLOEXEC for those two architectures.
>>
>
> That's easy enough to fix...
>
> Signed-off-by: Kyle McMartin <[email protected]>
>
> diff --git a/include/asm-parisc/fcntl.h b/include/asm-parisc/fcntl.h
> index 317851f..4ca0fb0 100644
> --- a/include/asm-parisc/fcntl.h
> +++ b/include/asm-parisc/fcntl.h
> @@ -14,6 +14,7 @@
> #define O_DSYNC 01000000 /* HPUX only */
> #define O_RSYNC 02000000 /* HPUX only */
> #define O_NOATIME 04000000
> +#define O_CLOEXEC 08000000 /* set close_on_exec */
>
> #define O_DIRECTORY 00010000 /* must be a directory */
> #define O_NOFOLLOW 00000200 /* don't follow links */

These are octal values, so you really want to use 010000000 instead of
08000000. :-)

While looking at that file further, I noticed these two flags share the same
value. I don't know DMAPI/XDSM, but could they potentially conflict?

#define O_NOATIME 04000000
#define O_INVISIBLE 04000000 /* invisible I/O, for DMAPI/XDSM */

Regards,
-Byron

--
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [email protected]

2007-06-01 22:56:00

by Kyle McMartin

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

On Fri, Jun 01, 2007 at 03:17:03PM -0400, Byron Stanoszek wrote:
> These are octal values, so you really want to use 010000000 instead of
> 08000000. :-)
>

Wow. I am totally a dumbass, I saw a 'x' there. Sigh.

diff --git a/include/asm-parisc/fcntl.h b/include/asm-parisc/fcntl.h
index 317851f..7089507 100644
--- a/include/asm-parisc/fcntl.h
+++ b/include/asm-parisc/fcntl.h
@@ -14,6 +14,7 @@
#define O_DSYNC 01000000 /* HPUX only */
#define O_RSYNC 02000000 /* HPUX only */
#define O_NOATIME 04000000
+#define O_CLOEXEC 010000000 /* set close_on_exec */

#define O_DIRECTORY 00010000 /* must be a directory */
#define O_NOFOLLOW 00000200 /* don't follow links */

2007-06-10 02:29:52

by dean gaudet

[permalink] [raw]
Subject: Re: [PATCH] Introduce O_CLOEXEC (take >2)

nice. i proposed something like this 8 or so years ago... the problem is
that you've also got to deal with socket(2), socketpair(2), accept(2),
pipe(2), dup(2), dup2(2), fcntl(F_DUPFD)... everything which creates new
fds.

really what is desired is fork/clone with selective duping of fds. i.e.
you supply the list of what will become fd 0,1,2 in the child.

-dean