2008-08-14 17:15:51

by Ulrich Drepper

Subject: AT_EXECFN not useful

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've just removed the support for AT_EXECFN again from glibc. The
information isn't useful because the path name isn't normalized. I.e.,
it's not the actual binary path if symlinks are followed during the
resolution. This makes it unusable for the $ORIGIN handling. This is
on top of the problem with relative paths.

Unless somebody has another use case where this is useful I suggest
removing AT_EXECFN support again. It's just superfluous work and memory
use.

Of course I wouldn't object to a real implementation which always gives
me the full, normalized path name of the executable...

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkikaDEACgkQ2ijCOnn/RHSFvACgoWYhqrfJZfRe4ypFUQQR9soJ
km8AnRchCWP+irhGcgoSzd2S8iYQi7zm
=E4tz
-----END PGP SIGNATURE-----


2008-08-15 05:11:27

by John Reiser

Subject: Re: AT_EXECFN not useful

Ulrich Drepper wrote:
> I've just removed the support for AT_EXECFN again from glibc. The
> information isn't useful because the path name isn't normalized. I.e.,
> it's not the actual binary path if symlinks are followed during the
> resolution. This makes it unusable for the $ORIGIN handling.

AT_EXECFN does provide one actual path, namely the one that was passed
to namei. This path works for some uses and some cases, including some
uses and some cases of $ORIGIN. The path which sometimes is revealed
by readlink("/proc/self/exe",) also works for some uses and some cases,
including some uses and some cases of $ORIGIN. Neither /proc/self/exe
nor AT_EXECFN works for all uses and all cases.
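For reference, a minimal sketch of reading both values from inside a process. It assumes a glibc that provides getauxval() (added in glibc 2.16, so later than this thread) and a mounted /proc. The helper names execfn_hint and proc_self_exe are hypothetical.

```c
#define _GNU_SOURCE      /* for readlink() under strict -std= modes */
#include <elf.h>         /* AT_EXECFN */
#include <stddef.h>
#include <sys/auxv.h>    /* getauxval(), glibc 2.16 and later */
#include <sys/types.h>
#include <unistd.h>      /* readlink() */

/* Returns the pathname string that was passed to execve(), or NULL if the
   aux vector has no AT_EXECFN entry. The string is not normalized. */
const char *execfn_hint(void)
{
    unsigned long v = getauxval(AT_EXECFN);
    return v ? (const char *)v : NULL;
}

/* Copies the target of /proc/self/exe into buf, NUL-terminated.
   Returns the length, or -1 if the link is missing or unreadable. */
ssize_t proc_self_exe(char *buf, size_t size)
{
    if (size == 0)
        return -1;
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

The first value may be relative and may name a symlink. The second is absolute when it is available at all.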

> This is on top of the problem with relative paths.

Please provide a statement or citation which describes "the problem with
relative paths."

> Unless somebody has another use case where this is useful I suggest
> removing AT_EXECFN support again. It's just superfluous work and memory
> use.

AT_EXECFN is useful when readlink("/proc/self/exe",) disappears
yet the actual pathname that was passed to execve() still is accessible.
Users of the UPX program compressor (http://upx.sourceforge.net) have
asked for such a feature several times. The UPX runtime decompressor
creates an address space which is difficult to distinguish from
the address space which would result from an execve() of the original,
uncompressed file. In particular, the UPX runtime decompressor unmaps
all pages of the compressed file, which triggers the removal of the /proc/self/exe
symlink. As a palliative, UPX records readlink("/proc/self/exe",) in an environment
variable. If the UPX decompressor is not the first to run after execve(),
then this palliative can fail in the same way.
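The palliative can be sketched roughly as follows. This is not UPX's actual code: the function name save_exe_path and the variable name chosen by the caller are hypothetical, and the sketch assumes it runs before the pages of the executable are unmapped.

```c
#define _GNU_SOURCE      /* for readlink() and setenv() under strict -std= modes */
#include <limits.h>      /* PATH_MAX */
#include <stdlib.h>      /* setenv() */
#include <sys/types.h>
#include <unistd.h>      /* readlink() */

/* Saves the current target of /proc/self/exe in the environment variable
   `name`, overwriting any previous value.
   Returns 0 on success, -1 if the link cannot be read or setenv() fails. */
int save_exe_path(const char *name)
{
    char buf[PATH_MAX];
    ssize_t n = readlink("/proc/self/exe", buf, sizeof buf - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return setenv(name, buf, 1);
}
```

If the link has already been dropped when this runs, the call fails and nothing is recorded, which is the failure mode described above.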

One time-honored use of "the path that was specified to execve"
is to look up the debug symbol table for the current main program
(for example in catching SIGSEGV and filing an automatic bug report with
traceback information) under some time-honored schemes for static binding.
Another use is to consult a default static database which has been appended
to the executable itself (usually with an index near the end of the file).
Although such usage can be defeated, nevertheless in practice many installations,
applications, and users successfully avoid the problems. In particular,
uses which do not depend on visiting other directories often work just fine.
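A minimal sketch of the appended-database case, assuming a fixed-size index record stored as the last bytes of the file. The record layout and the name read_trailer are hypothetical. The path argument would be the string from AT_EXECFN, used as given.

```c
#include <stdio.h>

/* Reads the last `len` bytes of the file at `path` into `out`.
   Returns 0 on success, -1 if the file cannot be opened or is too short. */
int read_trailer(const char *path, void *out, long len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    int ok = len > 0
          && fseek(f, 0, SEEK_END) == 0
          && ftell(f) >= len                 /* file must hold a full record */
          && fseek(f, -len, SEEK_END) == 0
          && fread(out, 1, (size_t)len, f) == (size_t)len;
    fclose(f);
    return ok ? 0 : -1;
}
```

Nothing here visits another directory, so a relative or symlinked path is good enough as long as it still opens the same file.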

AT_EXECFN is useful in documenting the kernel's actual behavior of
putting the pathname argument to execve() into the new address space.
Having a designated slot in the aux vector also provides a convenient
place for a virtualizer to adjust and communicate this path in ways that
are appropriate to the virtualization.

> Of course I wouldn't object to a real implementation which always gives
> me the full, normalized path name of the executable...

Multiple hard links create multiple normalized path names, possibly including
paths that have the same endpoints but which differ in arc length. Which
of these multiple paths is to be blessed as "the full, normalized path"?

The case of mounting another filesystem on top of an intermediate directory
in "the path to the executable" may cause there to be zero "full, normalized
path name"s to the current main executable.
Depending on the relationship of mount point, working directory, pathname to
execve, and the text which surrounds $ORIGIN, a simplistic method of
textual select+concatenate may succeed despite the failure of a method
involving detailed normalization.

These cases illustrate that there can be problems in asking for too much
when dealing with $ORIGIN. AT_EXECFN provides some actual historical data
that may be useful as a hint, especially when the kernel drops the
/proc/self/exe symbolic link. This hint is enough for some current use cases.

--
John Reiser, [email protected]

2008-08-16 00:02:08

by Ulrich Drepper

Subject: Re: AT_EXECFN not useful

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Reiser wrote:
> AT_EXECFN does provide one actual path, namely the one that was passed
> to namei. This path works for some uses and some cases, including some
> uses and some cases of $ORIGIN.

It is not possible to use the raw path provided. One _always_ would
have to canonicalize the path (call realpath etc). That's terribly
expensive and it requires that nothing in the path has changed.
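The canonicalization step described here might look like the sketch below. It is not glibc's loader code. The name canonical_origin is hypothetical, and the sketch assumes the POSIX.1-2008 form of realpath() that allocates the result when passed NULL.

```c
#define _GNU_SOURCE      /* for realpath() under strict -std= modes */
#include <libgen.h>      /* dirname() */
#include <stdlib.h>      /* realpath(), free() */
#include <string.h>

/* Canonicalizes `execfn` with realpath() and writes the directory part
   to `out`. Returns 0 on success, -1 if resolution fails or the result
   does not fit in `size` bytes. */
int canonical_origin(const char *execfn, char *out, size_t size)
{
    char *full = realpath(execfn, NULL);   /* resolves every path component */
    if (full == NULL)
        return -1;
    const char *dir = dirname(full);       /* may modify `full` in place */
    int rc = -1;
    if (strlen(dir) < size) {
        strcpy(out, dir);
        rc = 0;
    }
    free(full);
    return rc;
}
```

The cost comes from realpath() examining each component of the path, and the answer is only valid if no component has been renamed, replaced, or remounted since execve().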


> The path which sometimes is revealed
> by readlink("/proc/self/exe",) also works for some uses and some cases,
> including some uses and some cases of $ORIGIN.

The information is always correct when it is provided. If the file goes
away then the AT_EXECFN use case also fails since realpath fails, or
worse, provides wrong data since it's using newer files.


> Neither /proc/self/exe
> nor AT_EXECFN works for all uses and all cases.

The only case where AT_EXECFN has an advantage is when /proc isn't
mounted. That's not supported anyway because /proc is the only way the
kernel exposes many kinds of data (sysctl is deprecated) and we
need this information in many places.


> Please provide a statement or citation which describes "the problem with
> relative paths."

If the program is started via a relative path, AT_EXECFN has this string.


> AT_EXECFN is useful when readlink("/proc/self/exe",) disappears

As said above, in that case it isn't useful either because one cannot
verify the value.

I've removed all support for AT_EXECFN and won't put it back since there
is no use case where it has any advantage. realpath() is terribly slow.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iEYEARECAAYFAkimGHcACgkQ2ijCOnn/RHS8LACfdknhKbYiaKo8fe4hFgRq3VZn
emYAn2vdy7Jddzia7m6hLj6nMlnU0HiO
=NJxG
-----END PGP SIGNATURE-----

2008-08-18 15:06:17

by John Reiser

Subject: Re: AT_EXECFN not useful

Ulrich Drepper wrote:
> John Reiser wrote:
>
>>>AT_EXECFN does provide one actual path, namely the one that was passed
>>>to namei. This path works for some uses and some cases, including some
>>>uses and some cases of $ORIGIN.
>
>
> It is not possible to use the raw path provided. One _always_ would
> have to canonicalize the path (call realpath etc). That's terribly
> expensive and it requires that nothing in the path has changed.

It certainly is possible to use the path provided. If an application
is installed with the main program in ./app_dir/bin/my_app , and DT_NEEDED
shared library libmy_lib.so in ./app_dir/lib/libmy_lib.so.1.0 , and link
libmy_lib.so -> libmy_lib.so.1.0 , and if DT_RUNPATH contains $ORIGIN/../lib ,
and if the main program is invoked via execve("./app_dir/bin/my_app",,);
then the shared library may be accessed using AT_EXECFN via:
open("./app_dir/bin/../lib/libmy_lib.so.1.0",)
which does not require any canonicalizing of the path [use of realpath, etc.]
It also does not require the success of readlink("/proc/self/exe",).
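The textual approach in this example can be sketched as below. The function name expand_origin_textual is hypothetical, and the sketch handles only a leading "$ORIGIN" token, which is narrower than what a real dynamic loader accepts.

```c
#include <stdio.h>
#include <string.h>

/* Expands a leading "$ORIGIN" in `entry` using only the text of `execfn`:
   everything before the last '/' is taken as the origin directory.
   No realpath() and no readlink() are involved.
   Returns 0 on success, -1 on a missing token or a too-small buffer. */
int expand_origin_textual(const char *execfn, const char *entry,
                          char *out, size_t size)
{
    static const char token[] = "$ORIGIN";
    const size_t tlen = sizeof token - 1;
    if (strncmp(entry, token, tlen) != 0)
        return -1;

    const char *slash = strrchr(execfn, '/');
    const char *dir = slash ? execfn : ".";      /* bare name: origin is "." */
    int dirlen = slash ? (int)(slash - execfn) : 1;
    if (slash == execfn)
        dirlen = 1;                              /* file directly under "/" */

    int n = snprintf(out, size, "%.*s%s", dirlen, dir, entry + tlen);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With execfn "./app_dir/bin/my_app" and entry "$ORIGIN/../lib" this produces "./app_dir/bin/../lib", the directory used in the open() call above. The result is only meaningful while the working directory is the one that was current at execve().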

>>>The path which sometimes is revealed
>>>by readlink("/proc/self/exe",) also works for some uses and some cases,
>>>including some uses and some cases of $ORIGIN.
>
>
> The information is always correct when it is provided.

readlink("/proc/self/exe",) succeeds yet the provided path fails,
if a mount is performed on top of an intermediate directory in that path.
Also, readlink("/proc/self/exe",) returns only one of the paths
which may apply to an executable that has multiple hard links. The paths
may not all be equivalent for the purpose of resolving $ORIGIN, particularly
because they may involve differing numbers of directory components.
This can affect the result if $ORIGIN involves evaluating any ancestor
directory of the executable file.

> If the file goes
> away then the AT_EXECFN use case also fails since realpath fails, or
> worse, provides wrong data since it's using newer files.

As shown above, using AT_EXECFN need not require calling realpath.
Also, some administrators can provide an external guarantee that the files
may not be overwritten [read-only file system, restrictive permissions,
etc.], some programs have their own internal consistency checks,
the usage may be constrained by convention (shell script wrapper, etc.),
and some users are willing to live with any remaining uncertainties.

>>>Neither /proc/self/exe
>>>nor AT_EXECFN works for all uses and all cases.
>
>
> The only case where AT_EXECFN has an advantage is when /proc isn't
> mounted.

As shown above, AT_EXECFN also may have an advantage when the kernel
drops /proc/self/exe, even when /proc is mounted.

> That's not supported anyway because /proc is the only way the
> kernel exposes many kinds of data (sysctl is deprecated) and we
> need this information in many places.

>>>Please provide a statement or citation which describes "the problem with
>>>relative paths."
>
>
> If the program is started via a relative path, AT_EXECFN has this string.

A single fact by itself never can be a problem. A problem requires a
contradiction that involves two or more facts, or a discrepancy between facts
and expectations. What expectations regarding $ORIGIN does a relative path
not fulfill, and where and why do these expectations arise?

>>>AT_EXECFN is useful when readlink("/proc/self/exe",) disappears
>
>
> As said above, in that case it isn't useful either because one cannot
> verify the value.

Perhaps glibc cannot verify the value, so that may be a reason to avoid
the value in the case of suid/sgid execution. In other cases an administrator
or user may provide verification external to glibc, or the application
itself performs internal verification, or an administrator provides appropriate
guarantees, or users accept the risks directly. Failing all of those, usage
which requires that the parent directory name "../" work when backing up into
a symbolic link is just an error in the usage or design of the installation
of the application software.
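One way to express the suid/sgid exclusion is to consult the AT_SECURE entry of the aux vector. This is a sketch under the assumption that getauxval() is available (glibc 2.16 and later). The name execfn_unless_secure is hypothetical.

```c
#include <elf.h>         /* AT_EXECFN, AT_SECURE */
#include <stddef.h>
#include <sys/auxv.h>    /* getauxval() */

/* Returns the AT_EXECFN string only when the kernel did not flag the
   process for secure-execution mode (set-user-ID, set-group-ID, and
   similar cases). Returns NULL otherwise, or if the entry is absent. */
const char *execfn_unless_secure(void)
{
    if (getauxval(AT_SECURE) != 0)
        return NULL;
    unsigned long v = getauxval(AT_EXECFN);
    return v ? (const char *)v : NULL;
}
```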

> I've removed all support for AT_EXECFN and won't put it back since there
> is no use case where it has any advantage. realpath() is terribly slow.

AT_EXECFN provides an advantage for users of UPX executable compression,
where the kernel drops the link /proc/self/exe because the runtime decompressor
unmaps all pages of the original executable, yet the rest of the environment
is unchanged. AT_EXECFN provides the first documentation for long-standing
kernel behavior. More generally, AT_EXECFN provides a supported mechanism
for cooperation between a virtualizer and its subject programs. These are
sufficient reasons for adding AT_EXECFN, regardless of whether glibc
chooses to use AT_EXECFN. As shown above there are cases where users would
benefit if glibc did use AT_EXECFN, if nothing other than as a second chance
when readlink("/proc/self/exe",) fails.
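The "second chance" could be sketched like this. It is an illustration, not a proposed glibc patch: the names locate_exe and exe_source are hypothetical, getauxval() is assumed to be available, and the link path is a parameter only so that the fallback can be exercised.

```c
#define _GNU_SOURCE      /* for readlink() under strict -std= modes */
#include <elf.h>         /* AT_EXECFN */
#include <string.h>
#include <sys/auxv.h>    /* getauxval() */
#include <sys/types.h>
#include <unistd.h>      /* readlink() */

enum exe_source { EXE_NONE = 0, EXE_PROC = 1, EXE_EXECFN = 2 };

/* Tries the symlink at `proc_link` first (normally "/proc/self/exe") and
   falls back to AT_EXECFN. Writes a NUL-terminated path to buf and
   reports which source supplied it. */
enum exe_source locate_exe(const char *proc_link, char *buf, size_t size)
{
    if (size == 0)
        return EXE_NONE;
    ssize_t n = readlink(proc_link, buf, size - 1);
    if (n > 0) {
        buf[n] = '\0';
        return EXE_PROC;
    }
    /* Link missing or unreadable: use the execve() pathname as a hint. */
    const char *hint = (const char *)getauxval(AT_EXECFN);
    if (hint != NULL && strlen(hint) < size) {
        strcpy(buf, hint);
        return EXE_EXECFN;
    }
    return EXE_NONE;
}
```

A caller that gets EXE_EXECFN back knows it holds an unverified, possibly relative path and can decide whether that is acceptable for its purpose.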

--
John Reiser, [email protected]