2005-11-09 19:16:53

by Ulrich Drepper

Subject: openat()

Can we please get the openat() syscall implemented? I know Linus
already declared this is a good idea and I can only stress that it is
really essential for some things. It is today impossible to write
correct code which uses long pathnames since all these operations would
require the use of chdir() which affects the whole POSIX process and not
just one thread. In addition we have the reduction of race conditions.

I remember having seen an implementation at some time. Can somebody dig
it up? If there is nothing available I'll try to get some code submitted.

I'm ignoring the discussions about alternative streams for files here,
so don't bother arguing with that.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


2005-11-09 21:04:01

by Miklos Szeredi

Subject: Re: openat()

> Can we please get the openat() syscall implemented?

What's wrong with using '/proc/self/fd/N' to implement it?

Miklos

2005-11-09 21:29:04

by Ulrich Drepper

Subject: Re: openat()

Miklos Szeredi wrote:
> What's wrong with using '/proc/self/fd/N' to implement it?

I thought the intention was to have file descriptors referring to files,
not directories, to represent the directories they are in. In those
cases simply using /proc/PID/fd/N/some/more/dirs wouldn't work and
neither does /proc/PID/fd/N/../some/more/dirs.

Looking at the Solaris man page again it seems they don't allow this case
but this has to be guessed from the error codes, not the description.
In this case the /proc approach should be OK.

But there are always people questioning the use of /proc. We already
have quite a few such cases and adding more is no issue for me, but not
relying on /proc would appease some people.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

2005-11-09 21:42:04

by dean gaudet

Subject: Re: openat()

On Wed, 9 Nov 2005, Ulrich Drepper wrote:

> Can we please get the openat() syscall implemented? I know Linus already
> declared this is a good idea and I can only stress that it is really essential
> for some things. It is today impossible to write correct code which uses long
> pathnames since all these operations would require the use of chdir() which
> affects the whole POSIX process and not just one thread. In addition we have
> the reduction of race conditions.

oh sweet i've always wanted this for perf improvements in multithreaded
programs which have to deal with lots of lookups deep in a directory tree
(especially over NFS).

would this include other related syscalls such as link, unlink, rename,
chown, chmod... so that the virtualization of the "current working
directory" concept is more complete?

-dean

2005-11-09 21:55:37

by Nicholas Miell

Subject: Re: openat()

On Wed, 2005-11-09 at 13:42 -0800, dean gaudet wrote:
> On Wed, 9 Nov 2005, Ulrich Drepper wrote:
>
> > Can we please get the openat() syscall implemented? I know Linus already
> > declared this is a good idea and I can only stress that it is really essential
> > for some things. It is today impossible to write correct code which uses long
> > pathnames since all these operations would require the use of chdir() which
> > affects the whole POSIX process and not just one thread. In addition we have
> > the reduction of race conditions.
>
> oh sweet i've always wanted this for perf improvements in multithreaded
> programs which have to deal with lots of lookups deep in a directory tree
> (especially over NFS).
>
> would this include other related syscalls such as link, unlink, rename,
> chown, chmod... so that the virtualization of the "current working
> directory" concept is more complete?
>
> -dean

I think that the full suite of "pathname lookups relative to a fd"
functions was implied.

Note that you could always introduce pthread_attr_setsharedfs(3) and
pthread_attr_getsharedfs(3) (or whatever you want to call them) which
control the passing of CLONE_FS to clone(2) in pthread_create(). This
would allow you to create threads which have their own pwd and umask
(and even chroot, but I don't think that would be very useful) without
any kernel changes.
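
The pthread_attr_*sharedfs() functions named above are hypothetical and do not exist in glibc. The underlying kernel behaviour can be sketched with a raw clone(2) call, though. The example below is only an illustration under that assumption, and run_without_clone_fs() is an invented name. It is not a real pthread: the child shares memory with the parent but does only a single chdir() and then exits.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child body: change directory privately, then exit with 0 or 1. */
static int child_fn(void *arg)
{
    return chdir((const char *)arg) == 0 ? 0 : 1;
}

/* Hypothetical demo: start a task that shares memory with the caller
 * (CLONE_VM) but, because CLONE_FS is deliberately omitted, gets its own
 * copy of cwd, umask and root. Returns 0 if the child's chdir() worked. */
int run_without_clone_fs(const char *dir)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    int status;
    pid_t pid, w;

    if (stack == NULL)
        return -1;

    /* The stack grows downward on common architectures, so pass its top. */
    pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, (void *)dir);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    w = waitpid(pid, &status, 0);
    free(stack);
    if (w < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After run_without_clone_fs("/") returns, getcwd() in the caller still reports the original directory, since the child's chdir() applied only to its private copy of the filesystem state.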

--
Nicholas Miell

2005-11-10 07:40:24

by Jeff Garzik

Subject: Re: openat()

dean gaudet wrote:
> On Wed, 9 Nov 2005, Ulrich Drepper wrote:
>
>
>>Can we please get the openat() syscall implemented? I know Linus already
>>declared this is a good idea and I can only stress that it is really essential
>>for some things. It is today impossible to write correct code which uses long
>>pathnames since all these operations would require the use of chdir() which
>>affects the whole POSIX process and not just one thread. In addition we have
>>the reduction of race conditions.
>
>
> oh sweet i've always wanted this for perf improvements in multithreaded
> programs which have to deal with lots of lookups deep in a directory tree
> (especially over NFS).
>
> would this include other related syscalls such as link, unlink, rename,
> chown, chmod... so that the virtualization of the "current working
> directory" concept is more complete?

You already have fchown(2) and fchmod(2), that's covered.

I'm interested in openat(2) for the race-free implications. I've been
working on a race-free coreutils replacement[1], targeted mainly at
Linux. Being able to key an operation off of an open file descriptor
eliminates the few remaining races inherent in the Linux filesystem ABI.

The remaining race cases are all cases where the syscall takes a
pathname, when it really should take a pathname and an fd.

Jeff



2005-11-10 07:50:26

by Ulrich Drepper

Subject: Re: openat()

Jeff Garzik wrote:
> I'm interested in openat(2) for the race-free implications.

Given the limitation that only directory descriptors can be used without
the O_XATTR flag I've already added openat to glibc. It has no O_XATTR
support but I don't consider this important.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

2005-11-10 10:01:47

by Jeff Garzik

Subject: Re: openat()

Jeff Garzik wrote:
> I'm interested in openat(2) for the race-free implications. I've been
> working on a race-free coreutils replacement[1], targeted mainly at

Whoops, the referenced [1] is:
http://www.kernel.org/pub/scm/linux/kernel/git/jgarzik/posixutils.git/

Key utils cp/mv/chmod remain unwritten, and rm needs to be updated per
review from Al. But it's a start...

Jeff


2005-11-27 22:06:46

by Jim Meyering

Subject: another reason to add openat in the kernel: efficiency

Miklos Szeredi wrote:
> What's wrong with using '/proc/self/fd/N' to implement [openat et al]?

It's great that we can emulate openat and related fd-relative
functions using /proc/self/fd/N/FILE, but that is markedly less
efficient than a native implementation.

Here's some real data for comparison.
The problem: remove a just-created hierarchy named
/t/z/z/.../z (1,000,000 levels deep) residing on a tmpfs file system.

Using GNU rm -rf (from coreutils-5.93[1]), that takes about 14s wall clock
time on an otherwise idle system running 2.6.14. The 5.93 implementation
uses open, fchdir, fstat, opendir/readdir, unlink, etc. to do its job:
i.e., no openat-related functions.

Compare that with GNU rm from the latest CVS sources[2], now f?chdir-free,
using /proc-based openat emulation (including emulation of fdopendir[3],
fstatat, and unlinkat). Here, the time required is about 35 seconds:
more than double. Even after rewriting the emulation code not to use
snprintf, the resulting times were still about 30s.

Contrast that with Solaris 9 (with kernel-provided openat, fstatat,
fdopendir, etc.), where the openat-based implementation takes
20% *less* time than the 5.93 implementation.

Sure, there may well be other factors that explain some of the difference,
but it'd be nice to avoid the added time and space(stack) overhead of
encoding and decoding each /proc-relative file name. Of course,
syscall-based interfaces also have the advantage of working even if
/proc is not accessible.


Jim

[1] ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.93.tar.bz2
[2] http://savannah.gnu.org/projects/coreutils/
[3] It's a shame to have to emulate fdopendir via `opendir ("/proc/...',
but that's only temporary, while we wait for glibc-with-fdopendir
to become more mainstream.