2014-11-06 16:08:13

by David Drysdale

[permalink] [raw]
Subject: [PATCHv6 RFC 0/3] syscalls,x86: Add execveat() system call

Here's another pass at this. Some things to discuss in particular:

1) The current approach for interpreted execs (i.e. mostly "#!" scripts)
gives them an argv[1] filename like "/dev/fd/<fd>/<path>". This
means that script execution in a /proc-less system isn't going to
work, at least until interpreters get smart enough to spot and
special-case the leading "/dev/fd/<fd>", or until there's something
to use in place of /dev/fd -> /proc/self/fd (e.g. Al's dupfs
suggestion, https://lkml.org/lkml/2014/10/19/141).

So is an execveat(2) that (currently) only works for non-interpreted
programs still useful?

2) I don't like having to add a new LOOKUP_EMPTY_NOPATH flag
just to prevent O_PATH fds from being fexecve()ed -- alternative
suggestions welcomed. (More generally, I don't have a great
feel for what O_PATH is for; how bad would it be to just allow
them to be fexecve()ed?)

.........

This patch set adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc
filesystem, at least for executables (rather than scripts). The
current glibc version of fexecve(3) is implemented via /proc, which
causes problems in sandboxed or otherwise restricted environments.

Given the desire for a /proc-free fexecve() implementation, HPA
suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2)
syscall would be an appropriate generalization.

Also, having a new syscall means that it can take a flags argument
without back-compatibility concerns. The current implementation just
defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other
flags could be added in future -- for example, flags for new namespaces
(as suggested at https://lkml.org/lkml/2006/7/11/474).

Related history:
- https://lkml.org/lkml/2006/12/27/123 is an example of someone
realizing that fexecve() is likely to fail in a chroot environment.
- http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
documenting the /proc requirement of fexecve(3) in its manpage, to
"prevent other people from wasting their time".
- https://bugzilla.kernel.org/show_bug.cgi?id=74481 documented that
it's not possible to fexecve() a file descriptor for a script with
close-on-exec set (which is possible with the implementation here).
- https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
problem where a process that did setuid() could not fexecve()
because it no longer had access to /proc/self/fd; this has since
been fixed.


Changes since v5:
- Set new flag in bprm->interp_flags for O_CLOEXEC fds, so that binfmts
that invoke an interpreter fail the exec (as they will not be able
to access the invoked file). [Andy Lutomirski]
- Don't truncate long paths. [Andy Lutomirski]
- Commonize code to open the executed file. [Eric W. Biederman]
- Mark O_PATH file descriptors so they cannot be fexecve()ed.
- Make self-test more helpful, and add additional cases:
- file offset non-zero
- binary file without execute bit
- O_CLOEXEC fds

Changes since v4, suggested by Eric W. Biederman:
- Use empty filename with AT_EMPTY_PATH flag rather than NULL
pathname to request fexecve-like behaviour.
- Build pathname as "/dev/fd/<fd>/<filename>" (or "/dev/fd/<fd>")
rather than using d_path().
- Patch against v3.17 (bfe01a5ba249)

Changes since Meredydd's v3 patch:
- Added a selftest.
- Added a man page.
- Left open_exec() signature untouched to reduce patch impact
elsewhere (as suggested by Al Viro).
- Filled in bprm->filename with d_path() into a buffer, to avoid use
of potentially-ephemeral dentry->d_name.
- Patch against v3.14 (455c6fdbd21916).


David Drysdale (2):
syscalls,x86: implement execveat() system call
syscalls,x86: add selftest for execveat(2)

arch/x86/ia32/audit.c | 1 +
arch/x86/ia32/ia32entry.S | 1 +
arch/x86/kernel/audit_64.c | 1 +
arch/x86/kernel/entry_64.S | 28 +++
arch/x86/syscalls/syscall_32.tbl | 1 +
arch/x86/syscalls/syscall_64.tbl | 2 +
arch/x86/um/sys_call_table_64.c | 1 +
fs/binfmt_em86.c | 4 +
fs/binfmt_misc.c | 4 +
fs/binfmt_script.c | 10 +
fs/exec.c | 115 ++++++++++--
fs/namei.c | 8 +-
include/linux/binfmts.h | 4 +
include/linux/compat.h | 3 +
include/linux/fs.h | 1 +
include/linux/namei.h | 1 +
include/linux/sched.h | 4 +
include/linux/syscalls.h | 4 +
include/uapi/asm-generic/unistd.h | 4 +-
kernel/sys_ni.c | 3 +
lib/audit.c | 3 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/exec/.gitignore | 7 +
tools/testing/selftests/exec/Makefile | 25 +++
tools/testing/selftests/exec/execveat.c | 321 ++++++++++++++++++++++++++++++++
25 files changed, 542 insertions(+), 15 deletions(-)
create mode 100644 tools/testing/selftests/exec/.gitignore
create mode 100644 tools/testing/selftests/exec/Makefile
create mode 100644 tools/testing/selftests/exec/execveat.c

--
2.1.0.rc2.206.gedb03e5


2014-11-06 16:07:48

by David Drysdale

[permalink] [raw]
Subject: [PATCHv6 2/3] syscalls,x86: add selftest for execveat(2)

Signed-off-by: David Drysdale <[email protected]>
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/exec/.gitignore | 7 +
tools/testing/selftests/exec/Makefile | 25 +++
tools/testing/selftests/exec/execveat.c | 321 ++++++++++++++++++++++++++++++++
4 files changed, 354 insertions(+)
create mode 100644 tools/testing/selftests/exec/.gitignore
create mode 100644 tools/testing/selftests/exec/Makefile
create mode 100644 tools/testing/selftests/exec/execveat.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 36ff2e4c7b6f..210cf68b3511 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -14,6 +14,7 @@ TARGETS += powerpc
TARGETS += user
TARGETS += sysctl
TARGETS += firmware
+TARGETS += exec

TARGETS_HOTPLUG = cpu-hotplug
TARGETS_HOTPLUG += memory-hotplug
diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore
new file mode 100644
index 000000000000..778147d01af9
--- /dev/null
+++ b/tools/testing/selftests/exec/.gitignore
@@ -0,0 +1,7 @@
+subdir*
+script*
+execveat
+execveat.symlink
+execveat.moved
+execveat.ephemeral
+execveat.denatured
\ No newline at end of file
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
new file mode 100644
index 000000000000..c97e0aaea02d
--- /dev/null
+++ b/tools/testing/selftests/exec/Makefile
@@ -0,0 +1,25 @@
+CC = $(CROSS_COMPILE)gcc
+CFLAGS = -Wall
+BINARIES = execveat
+DEPS = execveat.symlink execveat.denatured script subdir
+all: $(BINARIES) $(DEPS)
+
+subdir:
+ mkdir -p $@
+script:
+ echo '#!/bin/sh' > $@
+ echo 'exit $$*' >> $@
+ chmod +x $@
+execveat.symlink: execveat
+ ln -s -f $< $@
+execveat.denatured: execveat
+ cp $< $@
+ chmod -x $@
+%: %.c
+ $(CC) $(CFLAGS) -o $@ $^
+
+run_tests: all
+ ./execveat
+
+clean:
+ rm -rf $(BINARIES) $(DEPS) subdir.moved execveat.moved
diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c
new file mode 100644
index 000000000000..f6ea48176393
--- /dev/null
+++ b/tools/testing/selftests/exec/execveat.c
@@ -0,0 +1,321 @@
+/*
+ * Copyright (c) 2014 Google, Inc.
+ *
+ * Licensed under the terms of the GNU GPL License version 2
+ *
+ * Selftests for execveat(2).
+ */
+
+#define _GNU_SOURCE /* to get O_PATH, AT_EMPTY_PATH */
+#include <sys/sendfile.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+static char *envp[] = { "IN_TEST=yes", NULL, NULL };
+static char *argv[] = { "execveat", "99", NULL };
+
+static int execveat_(int fd, const char *path, char **argv, char **envp,
+ int flags)
+{
+#ifdef __NR_execveat
+ return syscall(__NR_execveat, fd, path, argv, envp, flags);
+#else
+ errno = -ENOSYS;
+ return -1;
+#endif
+}
+
+#define check_execveat_fail(fd, path, flags, errno) \
+ _check_execveat_fail(fd, path, flags, errno, #errno)
+static int _check_execveat_fail(int fd, const char *path, int flags,
+ int expected_errno, const char *errno_str)
+{
+ int rc;
+
+ errno = 0;
+ printf("Check failure of execveat(%d, '%s', %d) with %s... ",
+ fd, path?:"(null)", flags, errno_str);
+ rc = execveat_(fd, path, argv, envp, flags);
+
+ if (rc > 0) {
+ printf("[FAIL] (unexpected success from execveat(2))\n");
+ return 1;
+ }
+ if (errno != expected_errno) {
+ printf("[FAIL] (expected errno %d (%s) not %d (%s)\n",
+ expected_errno, strerror(expected_errno),
+ errno, strerror(errno));
+ return 1;
+ }
+ printf("[OK]\n");
+ return 0;
+}
+
+static int check_execveat_invoked_rc(int fd, const char *path, int flags,
+ int expected_rc)
+{
+ int status;
+ int rc;
+ pid_t child;
+
+ printf("Check success of execveat(%d, '%s', %d)... ",
+ fd, path?:"(null)", flags);
+ child = fork();
+ if (child < 0) {
+ printf("[FAIL] (fork() failed)\n");
+ return 1;
+ }
+ if (child == 0) {
+ /* Child: do execveat(). */
+ rc = execveat_(fd, path, argv, envp, flags);
+ printf("[FAIL]: execveat() failed, rc=%d errno=%d (%s)\n",
+ rc, errno, strerror(errno));
+ exit(1); /* should not reach here */
+ }
+ /* Parent: wait for & check child's exit status. */
+ rc = waitpid(child, &status, 0);
+ if (rc != child) {
+ printf("[FAIL] (waitpid(%d,...) returned %d)\n", child, rc);
+ return 1;
+ }
+ if (!WIFEXITED(status)) {
+ printf("[FAIL] (child %d did not exit cleanly, status=%08x)\n",
+ child, status);
+ return 1;
+ }
+ if (WEXITSTATUS(status) != expected_rc) {
+ printf("[FAIL] (child %d exited with %d not %d)\n",
+ child, WEXITSTATUS(status), expected_rc);
+ return 1;
+ }
+ printf("[OK]\n");
+ return 0;
+}
+
+static int check_execveat(int fd, const char *path, int flags)
+{
+ return check_execveat_invoked_rc(fd, path, flags, 99);
+}
+
+static char *concat(const char *left, const char *right)
+{
+ char *result = malloc(strlen(left) + strlen(right) + 1);
+
+ strcpy(result, left);
+ strcat(result, right);
+ return result;
+}
+
+static int open_or_die(const char *filename, int flags)
+{
+ int fd = open(filename, flags);
+
+ if (fd < 0) {
+ printf("Failed to open '%s'; "
+ "check prerequisites are available\n", filename);
+ exit(1);
+ }
+ return fd;
+}
+
+static int run_tests(void)
+{
+ int fail = 0;
+ char *fullname = realpath("execveat", NULL);
+ char *fullname_script = realpath("script", NULL);
+ char *fullname_symlink = concat(fullname, ".symlink");
+ int subdir_dfd = open_or_die("subdir", O_DIRECTORY|O_RDONLY);
+ int subdir_dfd_ephemeral = open_or_die("subdir.ephemeral",
+ O_DIRECTORY|O_RDONLY);
+ int dot_dfd = open_or_die(".", O_DIRECTORY|O_RDONLY);
+ int dot_dfd_path = open_or_die(".", O_DIRECTORY|O_RDONLY|O_PATH);
+ int dot_dfd_cloexec = open_or_die(".", O_DIRECTORY|O_RDONLY|O_CLOEXEC);
+ int fd = open_or_die("execveat", O_RDONLY);
+ int fd_path = open_or_die("execveat", O_RDONLY|O_PATH);
+ int fd_symlink = open_or_die("execveat.symlink", O_RDONLY);
+ int fd_denatured = open_or_die("execveat.denatured", O_RDONLY);
+ int fd_script = open_or_die("script", O_RDONLY);
+ int fd_ephemeral = open_or_die("execveat.ephemeral", O_RDONLY);
+ int fd_script_ephemeral = open_or_die("script.ephemeral", O_RDONLY);
+ int fd_cloexec = open_or_die("execveat", O_RDONLY|O_CLOEXEC);
+ int fd_script_cloexec = open_or_die("script", O_RDONLY|O_CLOEXEC);
+
+ /* Change file position to confirm it doesn't affect anything */
+ lseek(fd, 10, SEEK_SET);
+
+ /* Normal executable file: */
+ /* dfd + path */
+ fail += check_execveat(subdir_dfd, "../execveat", 0);
+ fail += check_execveat(dot_dfd, "execveat", 0);
+ fail += check_execveat(dot_dfd_path, "execveat", 0);
+ /* absolute path */
+ fail += check_execveat(AT_FDCWD, fullname, 0);
+ /* absolute path with nonsense dfd */
+ fail += check_execveat(99, fullname, 0);
+ /* fd + no path */
+ fail += check_execveat(fd, "", AT_EMPTY_PATH);
+ /* O_CLOEXEC fd + no path */
+ fail += check_execveat(fd_cloexec, "", AT_EMPTY_PATH);
+
+ /* Mess with executable file that's already open: */
+ /* fd + no path to a file that's been renamed */
+ rename("execveat.ephemeral", "execveat.moved");
+ fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH);
+ /* fd + no path to a file that's been deleted */
+ unlink("execveat.moved"); /* remove the file now fd open */
+ fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH);
+
+ /* Invalid argument failures */
+ fail += check_execveat_fail(fd, "", 0, ENOENT);
+ fail += check_execveat_fail(fd, NULL, AT_EMPTY_PATH, EFAULT);
+
+ /* Symlink to executable file: */
+ /* dfd + path */
+ fail += check_execveat(dot_dfd, "execveat.symlink", 0);
+ fail += check_execveat(dot_dfd_path, "execveat.symlink", 0);
+ /* absolute path */
+ fail += check_execveat(AT_FDCWD, fullname_symlink, 0);
+ /* fd + no path, even with AT_SYMLINK_NOFOLLOW (already followed) */
+ fail += check_execveat(fd_symlink, "", AT_EMPTY_PATH);
+ fail += check_execveat(fd_symlink, "",
+ AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW);
+
+ /* Symlink fails when AT_SYMLINK_NOFOLLOW set: */
+ /* dfd + path */
+ fail += check_execveat_fail(dot_dfd, "execveat.symlink",
+ AT_SYMLINK_NOFOLLOW, ELOOP);
+ fail += check_execveat_fail(dot_dfd_path, "execveat.symlink",
+ AT_SYMLINK_NOFOLLOW, ELOOP);
+ /* absolute path */
+ fail += check_execveat_fail(AT_FDCWD, fullname_symlink,
+ AT_SYMLINK_NOFOLLOW, ELOOP);
+
+ /* Shell script wrapping executable file: */
+ /* dfd + path */
+ fail += check_execveat(subdir_dfd, "../script", 0);
+ fail += check_execveat(dot_dfd, "script", 0);
+ fail += check_execveat(dot_dfd_path, "script", 0);
+ /* absolute path */
+ fail += check_execveat(AT_FDCWD, fullname_script, 0);
+ /* fd + no path */
+ fail += check_execveat(fd_script, "", AT_EMPTY_PATH);
+ fail += check_execveat(fd_script, "",
+ AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW);
+ /* O_CLOEXEC fd fails for a script (as script file inaccessible) */
+ fail += check_execveat_fail(fd_script_cloexec, "", AT_EMPTY_PATH,
+ ENOENT);
+ fail += check_execveat_fail(dot_dfd_cloexec, "script", 0, ENOENT);
+
+ /* Mess with script file that's already open: */
+ /* fd + no path to a file that's been renamed */
+ rename("script.ephemeral", "script.moved");
+ fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH);
+ /* fd + no path to a file that's been deleted */
+ unlink("script.moved"); /* remove the file while fd open */
+ fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH);
+
+ /* Rename a subdirectory in the path: */
+ rename("subdir.ephemeral", "subdir.moved");
+ fail += check_execveat(subdir_dfd_ephemeral, "../script", 0);
+ fail += check_execveat(subdir_dfd_ephemeral, "script", 0);
+ /* Remove the subdir and its contents */
+ unlink("subdir.moved/script");
+ unlink("subdir.moved");
+ /* Shell loads via deleted subdir OK because name starts with .. */
+ fail += check_execveat(subdir_dfd_ephemeral, "../script", 0);
+ fail += check_execveat_fail(subdir_dfd_ephemeral, "script", 0, ENOENT);
+
+ /* Flag values other than AT_SYMLINK_NOFOLLOW => EINVAL */
+ fail += check_execveat_fail(dot_dfd, "execveat", 0xFFFF, EINVAL);
+ /* Invalid path => ENOENT */
+ fail += check_execveat_fail(dot_dfd, "no-such-file", 0, ENOENT);
+ fail += check_execveat_fail(dot_dfd_path, "no-such-file", 0, ENOENT);
+ fail += check_execveat_fail(AT_FDCWD, "no-such-file", 0, ENOENT);
+ /* Attempt to execute directory => EACCES */
+ fail += check_execveat_fail(dot_dfd, "", AT_EMPTY_PATH, EACCES);
+ /* Attempt to execute non-executable => EACCES */
+ fail += check_execveat_fail(dot_dfd, "Makefile", 0, EACCES);
+ fail += check_execveat_fail(fd_denatured, "", AT_EMPTY_PATH, EACCES);
+ /* Attempt to execute file opened with O_PATH => EBADF */
+ fail += check_execveat_fail(fd_path, "", AT_EMPTY_PATH, EBADF);
+ /* Attempt to execute nonsense FD => EBADF */
+ fail += check_execveat_fail(99, "", AT_EMPTY_PATH, EBADF);
+ fail += check_execveat_fail(99, "execveat", 0, EBADF);
+ /* Attempt to execute relative to non-directory => ENOTDIR */
+ fail += check_execveat_fail(fd, "execveat", 0, ENOTDIR);
+
+ return fail;
+}
+
+static void exe_cp(const char *src, const char *dest)
+{
+ int in_fd = open_or_die(src, O_RDONLY);
+ int out_fd = open(dest, O_RDWR|O_CREAT|O_TRUNC, 0755);
+ struct stat info;
+
+ fstat(in_fd, &info);
+ sendfile(out_fd, in_fd, NULL, info.st_size);
+ close(in_fd);
+ close(out_fd);
+}
+
+static void prerequisites(void)
+{
+ int fd;
+ const char *script = "#!/bin/sh\nexit $*\n";
+
+ /* Create ephemeral copies of files */
+ exe_cp("execveat", "execveat.ephemeral");
+ exe_cp("script", "script.ephemeral");
+ mkdir("subdir.ephemeral", 0755);
+
+ fd = open("subdir.ephemeral/script", O_RDWR|O_CREAT|O_TRUNC, 0755);
+ write(fd, script, strlen(script));
+ close(fd);
+}
+
+int main(int argc, char **argv)
+{
+ int ii;
+ int rc;
+ const char *verbose = getenv("VERBOSE");
+
+ if (argc >= 2) {
+ /* If we are invoked with an argument, don't run tests. */
+ const char *in_test = getenv("IN_TEST");
+
+ if (verbose) {
+ printf(" invoked with:");
+ for (ii = 0; ii < argc; ii++)
+ printf(" [%d]='%s'", ii, argv[ii]);
+ printf("\n");
+ }
+
+ /* Check expected environment transferred. */
+ if (!in_test || strcmp(in_test, "yes") != 0) {
+ printf("[FAIL] (no IN_TEST=yes in env)\n");
+ return 1;
+ }
+
+ /* Use the final argument as an exit code. */
+ rc = atoi(argv[argc - 1]);
+ fflush(stdout);
+ } else {
+ prerequisites();
+ if (verbose)
+ envp[1] = "VERBOSE=1";
+ rc = run_tests();
+ if (rc > 0)
+ printf("%d tests failed\n", rc);
+ }
+ return rc;
+}
--
2.1.0.rc2.206.gedb03e5

2014-11-06 16:07:44

by David Drysdale

[permalink] [raw]
Subject: [PATCHv6 man-pages 3/3] execveat.2: initial man page for execveat(2)

Signed-off-by: David Drysdale <[email protected]>
---
man2/execveat.2 | 153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 153 insertions(+)
create mode 100644 man2/execveat.2

diff --git a/man2/execveat.2 b/man2/execveat.2
new file mode 100644
index 000000000000..937d79e4c4f0
--- /dev/null
+++ b/man2/execveat.2
@@ -0,0 +1,153 @@
+.\" Copyright (c) 2014 Google, Inc.
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date. The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein. The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH EXECVEAT 2 2014-04-02 "Linux" "Linux Programmer's Manual"
+.SH NAME
+execveat \- execute program relative to a directory file descriptor
+.SH SYNOPSIS
+.B #include <unistd.h>
+.sp
+.BI "int execveat(int " fd ", const char *" pathname ","
+.br
+.BI " char *const " argv "[], char *const " envp "[],"
+.br
+.BI " int " flags);
+.SH DESCRIPTION
+The
+.BR execveat ()
+system call executes the program pointed to by the combination of \fIfd\fP and \fIpathname\fP.
+The
+.BR execveat ()
+system call operates in exactly the same way as
+.BR execve (2),
+except for the differences described in this manual page.
+
+If the pathname given in
+.I pathname
+is relative, then it is interpreted relative to the directory
+referred to by the file descriptor
+.I fd
+(rather than relative to the current working directory of
+the calling process, as is done by
+.BR execve (2)
+for a relative pathname).
+
+If
+.I pathname
+is relative and
+.I fd
+is the special value
+.BR AT_FDCWD ,
+then
+.I pathname
+is interpreted relative to the current working
+directory of the calling process (like
+.BR execve (2)).
+
+If
+.I pathname
+is absolute, then
+.I fd
+is ignored.
+
+If
+.I pathname
+is an empty string and the
+.BR AT_EMPTY_PATH
+flag is specified, then the file descriptor
+.I fd
+specifies the file to be executed.
+
+.I flags
+can either be 0, or include the following flags:
+.TP
+.BR AT_EMPTY_PATH
+If
+.I pathname
+is an empty string, operate on the file referred to by
+.IR fd
+(which may have been obtained using the
+.BR open (2)
+.B O_PATH
+flag).
+.TP
+.B AT_SYMLINK_NOFOLLOW
+If the file identified by
+.I fd
+and a non-NULL
+.I pathname
+is a symbolic link, then the call fails with the error
+.BR EINVAL .
+.SH "RETURN VALUE"
+On success,
+.BR execveat ()
+does not return. On error \-1 is returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+The same errors that occur for
+.BR execve (2)
+can also occur for
+.BR execveat ().
+The following additional errors can occur for
+.BR execveat ():
+.TP
+.B EBADF
+.I fd
+is not a valid file descriptor.
+.TP
+.B ENOENT
+The program identified by \fIfd\fP and \fIpathname\fP requires the
+use of an interpreter program (such as a script starting with
+"#!") but the file descriptor
+.I fd
+was opened with the
+.B O_CLOEXEC
+flag and so the program file is inaccessible to the launched interpreter.
+.TP
+.B EINVAL
+Invalid flag specified in
+.IR flags .
+.TP
+.B ENOTDIR
+.I pathname
+is relative and
+.I fd
+is a file descriptor referring to a file other than a directory.
+.SH VERSIONS
+.BR execveat ()
+was added to Linux in kernel 3.???.
+.SH NOTES
+In addition to the reasons explained in
+.BR openat (2),
+the
+.BR execveat ()
+system call is also needed to allow
+.BR fexecve (3)
+to be implemented on systems that do not have the
+.I /proc
+filesystem mounted.
+.SH SEE ALSO
+.BR execve (2),
+.BR fexecve (3)
--
2.1.0.rc2.206.gedb03e5

2014-11-06 16:08:17

by David Drysdale

[permalink] [raw]
Subject: [PATCHv6 1/3] syscalls,x86: implement execveat() system call

Add a new execveat(2) system call. execveat() is to execve() as
openat() is to open(): it takes a file descriptor that refers to a
directory, and resolves the filename relative to that.

In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers. This
replicates the functionality of fexecve(), which is a system call in
other UNIXen, but in Linux glibc it depends on opening
"/proc/self/fd/<fd>" (and so relies on /proc being mounted).

The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
(for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
reflecting how the executable was found. This does however mean that
execution of a script in a /proc-less environment won't work; also,
script execution via an O_CLOEXEC file descriptor fails (as the file
will not be accessible after exec).

Only x86-64, i386 and x32 ABIs are supported in this patch.

Based on patches by Meredydd Luff <[email protected]>

Signed-off-by: David Drysdale <[email protected]>
---
arch/x86/ia32/audit.c | 1 +
arch/x86/ia32/ia32entry.S | 1 +
arch/x86/kernel/audit_64.c | 1 +
arch/x86/kernel/entry_64.S | 28 ++++++++++
arch/x86/syscalls/syscall_32.tbl | 1 +
arch/x86/syscalls/syscall_64.tbl | 2 +
arch/x86/um/sys_call_table_64.c | 1 +
fs/binfmt_em86.c | 4 ++
fs/binfmt_misc.c | 4 ++
fs/binfmt_script.c | 10 ++++
fs/exec.c | 115 +++++++++++++++++++++++++++++++++-----
fs/namei.c | 8 ++-
include/linux/binfmts.h | 4 ++
include/linux/compat.h | 3 +
include/linux/fs.h | 1 +
include/linux/namei.h | 1 +
include/linux/sched.h | 4 ++
include/linux/syscalls.h | 4 ++
include/uapi/asm-generic/unistd.h | 4 +-
kernel/sys_ni.c | 3 +
lib/audit.c | 3 +
21 files changed, 188 insertions(+), 15 deletions(-)

diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c
index 5d7b381da692..2eccc8932ae6 100644
--- a/arch/x86/ia32/audit.c
+++ b/arch/x86/ia32/audit.c
@@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall)
case __NR_socketcall:
return 4;
case __NR_execve:
+ case __NR_execveat:
return 5;
default:
return 1;
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb05023c..2516c09743e0 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -464,6 +464,7 @@ GLOBAL(\label)
PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn
PTREGSCALL stub32_sigreturn, sys32_sigreturn
PTREGSCALL stub32_execve, compat_sys_execve
+ PTREGSCALL stub32_execveat, compat_sys_execveat
PTREGSCALL stub32_fork, sys_fork
PTREGSCALL stub32_vfork, sys_vfork

diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c
index 06d3e5a14d9d..f3672508b249 100644
--- a/arch/x86/kernel/audit_64.c
+++ b/arch/x86/kernel/audit_64.c
@@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall)
case __NR_openat:
return 3;
case __NR_execve:
+ case __NR_execveat:
return 5;
default:
return 0;
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 2fac1343a90b..00c4526e6ffe 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -665,6 +665,20 @@ ENTRY(stub_execve)
CFI_ENDPROC
END(stub_execve)

+ENTRY(stub_execveat)
+ CFI_STARTPROC
+ addq $8, %rsp
+ PARTIAL_FRAME 0
+ SAVE_REST
+ FIXUP_TOP_OF_STACK %r11
+ call sys_execveat
+ RESTORE_TOP_OF_STACK %r11
+ movq %rax,RAX(%rsp)
+ RESTORE_REST
+ jmp int_ret_from_sys_call
+ CFI_ENDPROC
+END(stub_execveat)
+
/*
* sigreturn is special because it needs to restore all registers on return.
* This cannot be done with SYSRET, so use the IRET return path instead.
@@ -710,6 +724,20 @@ ENTRY(stub_x32_execve)
CFI_ENDPROC
END(stub_x32_execve)

+ENTRY(stub_x32_execveat)
+ CFI_STARTPROC
+ addq $8, %rsp
+ PARTIAL_FRAME 0
+ SAVE_REST
+ FIXUP_TOP_OF_STACK %r11
+ call compat_sys_execveat
+ RESTORE_TOP_OF_STACK %r11
+ movq %rax,RAX(%rsp)
+ RESTORE_REST
+ jmp int_ret_from_sys_call
+ CFI_ENDPROC
+END(stub_x32_execveat)
+
#endif

/*
diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 028b78168d85..2633e3195455 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -363,3 +363,4 @@
354 i386 seccomp sys_seccomp
355 i386 getrandom sys_getrandom
356 i386 memfd_create sys_memfd_create
+357 i386 execveat sys_execveat stub32_execveat
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 35dd922727b9..1af5badd159c 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -327,6 +327,7 @@
318 common getrandom sys_getrandom
319 common memfd_create sys_memfd_create
320 common kexec_file_load sys_kexec_file_load
+321 64 execveat stub_execveat

#
# x32-specific system call numbers start at 512 to avoid cache impact
@@ -365,3 +366,4 @@
542 x32 getsockopt compat_sys_getsockopt
543 x32 io_setup compat_sys_io_setup
544 x32 io_submit compat_sys_io_submit
+545 x32 execveat stub_x32_execveat
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index f2f0723070ca..20c3649d0691 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -31,6 +31,7 @@
#define stub_fork sys_fork
#define stub_vfork sys_vfork
#define stub_execve sys_execve
+#define stub_execveat sys_execveat
#define stub_rt_sigreturn sys_rt_sigreturn

#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index f37b08cea1f7..490538536cb4 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -42,6 +42,10 @@ static int load_em86(struct linux_binprm *bprm)
return -ENOEXEC;
}

+ /* Need to be able to load the file after exec */
+ if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+ return -ENOENT;
+
allow_write_access(bprm->file);
fput(bprm->file);
bprm->file = NULL;
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index b60500300dd7..e659f5562356 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -127,6 +127,10 @@ static int load_misc_binary(struct linux_binprm *bprm)
if (!fmt)
goto _ret;

+ /* Need to be able to load the file after exec */
+ if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+ return -ENOENT;
+
if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
retval = remove_arg_zero(bprm);
if (retval)
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 5027a3e14922..afdf4e3cafc2 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -24,6 +24,16 @@ static int load_script(struct linux_binprm *bprm)

if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
return -ENOEXEC;
+
+ /*
+ * If the script filename will be inaccessible after exec, typically
+ * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
+ * up now (on the assumption that the interpreter will want to load
+ * this file).
+ */
+ if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+ return -ENOENT;
+
/*
* This section does the #! interpretation.
* Sorta complicated, but hopefully it will work. -TYT
diff --git a/fs/exec.c b/fs/exec.c
index a2b42a98c743..800d232c17bb 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -747,18 +747,26 @@ EXPORT_SYMBOL(setup_arg_pages);

#endif /* CONFIG_MMU */

-static struct file *do_open_exec(struct filename *name)
+static struct file *do_open_execat(int fd, struct filename *name, int flags)
{
struct file *file;
int err;
- static const struct open_flags open_exec_flags = {
+ struct open_flags open_exec_flags = {
.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
.acc_mode = MAY_EXEC | MAY_OPEN,
.intent = LOOKUP_OPEN,
.lookup_flags = LOOKUP_FOLLOW,
};

- file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
+ if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
+ return ERR_PTR(-EINVAL);
+ if (flags & AT_SYMLINK_NOFOLLOW)
+ open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
+ if (flags & AT_EMPTY_PATH)
+ open_exec_flags.lookup_flags |= (LOOKUP_EMPTY |
+ LOOKUP_EMPTY_NOPATH);
+
+ file = do_filp_open(fd, name, &open_exec_flags);
if (IS_ERR(file))
goto out;

@@ -769,12 +777,13 @@ static struct file *do_open_exec(struct filename *name)
if (file->f_path.mnt->mnt_flags & MNT_NOEXEC)
goto exit;

- fsnotify_open(file);
-
err = deny_write_access(file);
if (err)
goto exit;

+ if (name->name[0] != '\0')
+ fsnotify_open(file);
+
out:
return file;

@@ -786,7 +795,7 @@ exit:
struct file *open_exec(const char *name)
{
struct filename tmp = { .name = name };
- return do_open_exec(&tmp);
+ return do_open_execat(AT_FDCWD, &tmp, 0);
}
EXPORT_SYMBOL(open_exec);

@@ -1422,10 +1431,12 @@ static int exec_binprm(struct linux_binprm *bprm)
/*
* sys_execve() executes a new program.
*/
-static int do_execve_common(struct filename *filename,
- struct user_arg_ptr argv,
- struct user_arg_ptr envp)
+static int do_execveat_common(int fd, struct filename *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp,
+ int flags)
{
+ char *pathbuf = NULL;
struct linux_binprm *bprm;
struct file *file;
struct files_struct *displaced;
@@ -1466,7 +1477,7 @@ static int do_execve_common(struct filename *filename,
check_unsafe_exec(bprm);
current->in_execve = 1;

- file = do_open_exec(filename);
+ file = do_open_execat(fd, filename, flags);
retval = PTR_ERR(file);
if (IS_ERR(file))
goto out_unmark;
@@ -1474,7 +1485,30 @@ static int do_execve_common(struct filename *filename,
sched_exec();

bprm->file = file;
- bprm->filename = bprm->interp = filename->name;
+ if (fd == AT_FDCWD || filename->name[0] == '/') {
+ bprm->filename = filename->name;
+ } else {
+ /* "/dev/fd/2147483647/" + filename->name */
+ int maxlen = 19 + strlen(filename->name);
+
+ pathbuf = kmalloc(maxlen, GFP_TEMPORARY);
+ if (!pathbuf) {
+ retval = -ENOMEM;
+ goto out_unmark;
+ }
+ if (filename->name[0] == '\0')
+ sprintf(pathbuf, "/dev/fd/%d", fd);
+ else
+ snprintf(pathbuf, maxlen,
+ "/dev/fd/%d/%s", fd, filename->name);
+ /* Record that a name derived from an O_CLOEXEC fd will be
+ * inaccessible after exec. Relies on having exclusive access to
+ * current->files (due to unshare_files above). */
+ if (close_on_exec(fd, current->files->fdt))
+ bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
+ bprm->filename = pathbuf;
+ }
+ bprm->interp = bprm->filename;

retval = bprm_mm_init(bprm);
if (retval)
@@ -1532,6 +1566,7 @@ out_unmark:

out_free:
free_bprm(bprm);
+ kfree(pathbuf);

out_files:
if (displaced)
@@ -1547,7 +1582,18 @@ int do_execve(struct filename *filename,
{
struct user_arg_ptr argv = { .ptr.native = __argv };
struct user_arg_ptr envp = { .ptr.native = __envp };
- return do_execve_common(filename, argv, envp);
+ return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
+}
+
+int do_execveat(int fd, struct filename *filename,
+ const char __user *const __user *__argv,
+ const char __user *const __user *__envp,
+ int flags)
+{
+ struct user_arg_ptr argv = { .ptr.native = __argv };
+ struct user_arg_ptr envp = { .ptr.native = __envp };
+
+ return do_execveat_common(fd, filename, argv, envp, flags);
}

#ifdef CONFIG_COMPAT
@@ -1563,7 +1609,23 @@ static int compat_do_execve(struct filename *filename,
.is_compat = true,
.ptr.compat = __envp,
};
- return do_execve_common(filename, argv, envp);
+ return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
+}
+
+static int compat_do_execveat(int fd, struct filename *filename,
+ const compat_uptr_t __user *__argv,
+ const compat_uptr_t __user *__envp,
+ int flags)
+{
+ struct user_arg_ptr argv = {
+ .is_compat = true,
+ .ptr.compat = __argv,
+ };
+ struct user_arg_ptr envp = {
+ .is_compat = true,
+ .ptr.compat = __envp,
+ };
+ return do_execveat_common(fd, filename, argv, envp, flags);
}
#endif

@@ -1603,6 +1665,20 @@ SYSCALL_DEFINE3(execve,
{
return do_execve(getname(filename), argv, envp);
}
+
+SYSCALL_DEFINE5(execveat,
+ int, fd, const char __user *, filename,
+ const char __user *const __user *, argv,
+ const char __user *const __user *, envp,
+ int, flags)
+{
+ int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+ return do_execveat(fd,
+ getname_flags(filename, lookup_flags, NULL),
+ argv, envp, flags);
+}
+
#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
const compat_uptr_t __user *, argv,
@@ -1610,4 +1686,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
{
return compat_do_execve(getname(filename), argv, envp);
}
+
+COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
+ const char __user *, filename,
+ const compat_uptr_t __user *, argv,
+ const compat_uptr_t __user *, envp,
+ int, flags)
+{
+ int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+ return compat_do_execveat(fd,
+ getname_flags(filename, lookup_flags, NULL),
+ argv, envp, flags);
+}
#endif
diff --git a/fs/namei.c b/fs/namei.c
index a7b05bf82d31..757df6777ae5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -130,7 +130,7 @@ void final_putname(struct filename *name)

#define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename))

-static struct filename *
+struct filename *
getname_flags(const char __user *filename, int flags, int *empty)
{
struct filename *result, *err;
@@ -1891,6 +1891,12 @@ static int path_init(int dfd, const char *name, unsigned int flags,
fdput(f);
return -ENOTDIR;
}
+ } else if (flags & LOOKUP_EMPTY_NOPATH) {
+ /* When using the fd alone, disallow O_PATH files */
+ if (f.file->f_mode & FMODE_PATH) {
+ fdput(f);
+ return -EBADF;
+ }
}

nd->path = f.file->f_path;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 61f29e5ea840..576e4639ca60 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -53,6 +53,10 @@ struct linux_binprm {
#define BINPRM_FLAGS_EXECFD_BIT 1
#define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)

+/* filename of the binary will be inaccessible after exec */
+#define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2
+#define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT)
+
/* Function parameter for binfmt->coredump */
struct coredump_params {
const siginfo_t *siginfo;
diff --git a/include/linux/compat.h b/include/linux/compat.h
index e6494261eaff..7450ca2ac1fc 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);

asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
const compat_uptr_t __user *envp);
+asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
+ const compat_uptr_t __user *argv,
+ const compat_uptr_t __user *envp, int flags);

asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp,
compat_ulong_t __user *outp, compat_ulong_t __user *exp,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 94187721ad41..e9818574d738 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2060,6 +2060,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *,
extern struct file * dentry_open(const struct path *, int, const struct cred *);
extern int filp_close(struct file *, fl_owner_t id);

+extern struct filename *getname_flags(const char __user *, int, int *);
extern struct filename *getname(const char __user *);
extern struct filename *getname_kernel(const char *);

diff --git a/include/linux/namei.h b/include/linux/namei.h
index 492de72560fa..eaa25cc72213 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -55,6 +55,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
#define LOOKUP_JUMPED 0x1000
#define LOOKUP_ROOT 0x2000
#define LOOKUP_EMPTY 0x4000
+#define LOOKUP_EMPTY_NOPATH 0x8000

extern int user_path_at(int, const char __user *, unsigned, struct path *);
extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b867a4dab38a..33e056da7d33 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2430,6 +2430,10 @@ extern void do_group_exit(int);
extern int do_execve(struct filename *,
const char __user * const __user *,
const char __user * const __user *);
+extern int do_execveat(int, struct filename *,
+ const char __user * const __user *,
+ const char __user * const __user *,
+ int);
extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
struct task_struct *fork_idle(int);
extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 0f86d85a9ce4..df5422294deb 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -876,4 +876,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
asmlinkage long sys_getrandom(char __user *buf, size_t count,
unsigned int flags);

+asmlinkage long sys_execveat(int dfd, const char __user *filename,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp, int flags);
+
#endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 11d11bc5c78f..feef07d29663 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp)
__SYSCALL(__NR_getrandom, sys_getrandom)
#define __NR_memfd_create 279
__SYSCALL(__NR_memfd_create, sys_memfd_create)
+#define __NR_execveat 280
+__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)

#undef __NR_syscalls
-#define __NR_syscalls 280
+#define __NR_syscalls 281

/*
* All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 391d4ddb6f4b..efb06058ad3e 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -218,3 +218,6 @@ cond_syscall(sys_kcmp);

/* operate on Secure Computing state */
cond_syscall(sys_seccomp);
+
+/* execveat */
+cond_syscall(sys_execveat);
diff --git a/lib/audit.c b/lib/audit.c
index 1d726a22565b..b8fb5ee81e26 100644
--- a/lib/audit.c
+++ b/lib/audit.c
@@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall)
case __NR_socketcall:
return 4;
#endif
+#ifdef __NR_execveat
+ case __NR_execveat:
+#endif
case __NR_execve:
return 5;
default:
--
2.1.0.rc2.206.gedb03e5

2014-11-06 16:55:46

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCHv6 RFC 0/3] syscalls,x86: Add execveat() system call

On Thu, Nov 6, 2014 at 8:07 AM, David Drysdale <[email protected]> wrote:
> Here's another pass at this. Some things to discuss in particular:
>
> 1) The current approach for interpreted execs (i.e. mostly "#!" scripts)
> gives them an argv[1] filename like "/dev/fd/<fd>/<path>". This
> means that script execution in a /proc-less system isn't going to
> work, at least until interpreters get smart enough to spot and
> special-case the leading "/dev/fd/<fd>", or until there's something
> to use in place of /dev/fd -> /proc/self/fd (e.g. Al's dupfs
> suggestion, https://lkml.org/lkml/2014/10/19/141).
>
> So is an execveat(2) that (currently) only works for non-interpreted
> programs still useful?

I think it is. I would make sure to return a distinguishable error
code in the event that the failure happens because of one of the
unsupported cases.

>
> 2) I don't like having to add a new LOOKUP_EMPTY_NOPATH flag
> just to prevent O_PATH fds from being fexecve()ed -- alternative
> suggestions welcomed. (More generally, I don't have a great
> feel for what O_PATH is for; how bad would it be to just allow
> them to be fexecve()ed?)

If you fexecve an O_PATH fd, does it at least check that you have
execute permission on the inode? If so, it seems okay to allow it.

--Andy

>
> .........
>
> This patch set adds execveat(2) for x86, and is derived from Meredydd
> Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
>
> The primary aim of adding an execveat syscall is to allow an
> implementation of fexecve(3) that does not rely on the /proc
> filesystem, at least for executables (rather than scripts). The
> current glibc version of fexecve(3) is implemented via /proc, which
> causes problems in sandboxed or otherwise restricted environments.
>
> Given the desire for a /proc-free fexecve() implementation, HPA
> suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2)
> syscall would be an appropriate generalization.
>
> Also, having a new syscall means that it can take a flags argument
> without back-compatibility concerns. The current implementation just
> defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other
> flags could be added in future -- for example, flags for new namespaces
> (as suggested at https://lkml.org/lkml/2006/7/11/474).
>
> Related history:
> - https://lkml.org/lkml/2006/12/27/123 is an example of someone
> realizing that fexecve() is likely to fail in a chroot environment.
> - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
> documenting the /proc requirement of fexecve(3) in its manpage, to
> "prevent other people from wasting their time".
> - https://bugzilla.kernel.org/show_bug.cgi?id=74481 documented that
> it's not possible to fexecve() a file descriptor for a script with
> close-on-exec set (which is possible with the implementation here).

Confused. How does it work for a close-on-exec script? I understand
how it works for a close-on-exec ELF binary.

--Andy

> - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
> problem where a process that did setuid() could not fexecve()
> because it no longer had access to /proc/self/fd; this has since
> been fixed.
>
>
> Changes since v5:
> - Set new flag in bprm->interp_flags for O_CLOEXEC fds, so that binfmts
> that invoke an interpreter fail the exec (as they will not be able
> to access the invoked file). [Andy Lutomirski]
> - Don't truncate long paths. [Andy Lutomirski]
> - Commonize code to open the executed file. [Eric W. Biederman]
> - Mark O_PATH file descriptors so they cannot be fexecve()ed.
> - Make self-test more helpful, and add additional cases:
> - file offset non-zero
> - binary file without execute bit
> - O_CLOEXEC fds
>
> Changes since v4, suggested by Eric W. Biederman:
> - Use empty filename with AT_EMPTY_PATH flag rather than NULL
> pathname to request fexecve-like behaviour.
> - Build pathname as "/dev/fd/<fd>/<filename>" (or "/dev/fd/<fd>")
> rather than using d_path().
> - Patch against v3.17 (bfe01a5ba249)
>
> Changes since Meredydd's v3 patch:
> - Added a selftest.
> - Added a man page.
> - Left open_exec() signature untouched to reduce patch impact
> elsewhere (as suggested by Al Viro).
> - Filled in bprm->filename with d_path() into a buffer, to avoid use
> of potentially-ephemeral dentry->d_name.
> - Patch against v3.14 (455c6fdbd21916).
>
>
> David Drysdale (2):
> syscalls,x86: implement execveat() system call
> syscalls,x86: add selftest for execveat(2)
>
> arch/x86/ia32/audit.c | 1 +
> arch/x86/ia32/ia32entry.S | 1 +
> arch/x86/kernel/audit_64.c | 1 +
> arch/x86/kernel/entry_64.S | 28 +++
> arch/x86/syscalls/syscall_32.tbl | 1 +
> arch/x86/syscalls/syscall_64.tbl | 2 +
> arch/x86/um/sys_call_table_64.c | 1 +
> fs/binfmt_em86.c | 4 +
> fs/binfmt_misc.c | 4 +
> fs/binfmt_script.c | 10 +
> fs/exec.c | 115 ++++++++++--
> fs/namei.c | 8 +-
> include/linux/binfmts.h | 4 +
> include/linux/compat.h | 3 +
> include/linux/fs.h | 1 +
> include/linux/namei.h | 1 +
> include/linux/sched.h | 4 +
> include/linux/syscalls.h | 4 +
> include/uapi/asm-generic/unistd.h | 4 +-
> kernel/sys_ni.c | 3 +
> lib/audit.c | 3 +
> tools/testing/selftests/Makefile | 1 +
> tools/testing/selftests/exec/.gitignore | 7 +
> tools/testing/selftests/exec/Makefile | 25 +++
> tools/testing/selftests/exec/execveat.c | 321 ++++++++++++++++++++++++++++++++
> 25 files changed, 542 insertions(+), 15 deletions(-)
> create mode 100644 tools/testing/selftests/exec/.gitignore
> create mode 100644 tools/testing/selftests/exec/Makefile
> create mode 100644 tools/testing/selftests/exec/execveat.c
>
> --
> 2.1.0.rc2.206.gedb03e5



--
Andy Lutomirski
AMA Capital Management, LLC

2014-11-06 18:06:11

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCHv6 1/3] syscalls,x86: implement execveat() system call

On Thu, Nov 6, 2014 at 8:07 AM, David Drysdale <[email protected]> wrote:
> Add a new execveat(2) system call. execveat() is to execve() as
> openat() is to open(): it takes a file descriptor that refers to a
> directory, and resolves the filename relative to that.
>
> In addition, if the filename is empty and AT_EMPTY_PATH is specified,
> execveat() executes the file to which the file descriptor refers. This
> replicates the functionality of fexecve(), which is a system call in
> other UNIXen, but in Linux glibc it depends on opening
> "/proc/self/fd/<fd>" (and so relies on /proc being mounted).
>
> The filename fed to the executed program as argv[0] (or the name of the
> script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
> (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
> reflecting how the executable was found. This does however mean that
> execution of a script in a /proc-less environment won't work; also,
> script execution via an O_CLOEXEC file descriptor fails (as the file
> will not be accessible after exec).
>
> Only x86-64, i386 and x32 ABIs are supported in this patch.
>
> Based on patches by Meredydd Luff <[email protected]>

This'll be quite nice for doing launches into a tight sandbox.

>
> Signed-off-by: David Drysdale <[email protected]>
> ---
> arch/x86/ia32/audit.c | 1 +
> arch/x86/ia32/ia32entry.S | 1 +
> arch/x86/kernel/audit_64.c | 1 +
> arch/x86/kernel/entry_64.S | 28 ++++++++++
> arch/x86/syscalls/syscall_32.tbl | 1 +
> arch/x86/syscalls/syscall_64.tbl | 2 +
> arch/x86/um/sys_call_table_64.c | 1 +
> fs/binfmt_em86.c | 4 ++
> fs/binfmt_misc.c | 4 ++
> fs/binfmt_script.c | 10 ++++
> fs/exec.c | 115 +++++++++++++++++++++++++++++++++-----
> fs/namei.c | 8 ++-
> include/linux/binfmts.h | 4 ++
> include/linux/compat.h | 3 +
> include/linux/fs.h | 1 +
> include/linux/namei.h | 1 +
> include/linux/sched.h | 4 ++
> include/linux/syscalls.h | 4 ++
> include/uapi/asm-generic/unistd.h | 4 +-
> kernel/sys_ni.c | 3 +
> lib/audit.c | 3 +
> 21 files changed, 188 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c
> index 5d7b381da692..2eccc8932ae6 100644
> --- a/arch/x86/ia32/audit.c
> +++ b/arch/x86/ia32/audit.c
> @@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall)
> case __NR_socketcall:
> return 4;
> case __NR_execve:
> + case __NR_execveat:
> return 5;
> default:
> return 1;
> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> index 4299eb05023c..2516c09743e0 100644
> --- a/arch/x86/ia32/ia32entry.S
> +++ b/arch/x86/ia32/ia32entry.S
> @@ -464,6 +464,7 @@ GLOBAL(\label)
> PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn
> PTREGSCALL stub32_sigreturn, sys32_sigreturn
> PTREGSCALL stub32_execve, compat_sys_execve
> + PTREGSCALL stub32_execveat, compat_sys_execveat
> PTREGSCALL stub32_fork, sys_fork
> PTREGSCALL stub32_vfork, sys_vfork
>
> diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c
> index 06d3e5a14d9d..f3672508b249 100644
> --- a/arch/x86/kernel/audit_64.c
> +++ b/arch/x86/kernel/audit_64.c
> @@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall)
> case __NR_openat:
> return 3;
> case __NR_execve:
> + case __NR_execveat:
> return 5;
> default:
> return 0;
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 2fac1343a90b..00c4526e6ffe 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -665,6 +665,20 @@ ENTRY(stub_execve)
> CFI_ENDPROC
> END(stub_execve)
>
> +ENTRY(stub_execveat)
> + CFI_STARTPROC
> + addq $8, %rsp
> + PARTIAL_FRAME 0
> + SAVE_REST
> + FIXUP_TOP_OF_STACK %r11
> + call sys_execveat
> + RESTORE_TOP_OF_STACK %r11
> + movq %rax,RAX(%rsp)
> + RESTORE_REST
> + jmp int_ret_from_sys_call
> + CFI_ENDPROC
> +END(stub_execveat)
> +
> /*
> * sigreturn is special because it needs to restore all registers on return.
> * This cannot be done with SYSRET, so use the IRET return path instead.
> @@ -710,6 +724,20 @@ ENTRY(stub_x32_execve)
> CFI_ENDPROC
> END(stub_x32_execve)
>
> +ENTRY(stub_x32_execveat)
> + CFI_STARTPROC
> + addq $8, %rsp
> + PARTIAL_FRAME 0
> + SAVE_REST
> + FIXUP_TOP_OF_STACK %r11
> + call compat_sys_execveat
> + RESTORE_TOP_OF_STACK %r11
> + movq %rax,RAX(%rsp)
> + RESTORE_REST
> + jmp int_ret_from_sys_call
> + CFI_ENDPROC
> +END(stub_x32_execveat)
> +
> #endif
>
> /*
> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
> index 028b78168d85..2633e3195455 100644
> --- a/arch/x86/syscalls/syscall_32.tbl
> +++ b/arch/x86/syscalls/syscall_32.tbl
> @@ -363,3 +363,4 @@
> 354 i386 seccomp sys_seccomp
> 355 i386 getrandom sys_getrandom
> 356 i386 memfd_create sys_memfd_create
> +357 i386 execveat sys_execveat stub32_execveat
> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
> index 35dd922727b9..1af5badd159c 100644
> --- a/arch/x86/syscalls/syscall_64.tbl
> +++ b/arch/x86/syscalls/syscall_64.tbl
> @@ -327,6 +327,7 @@
> 318 common getrandom sys_getrandom
> 319 common memfd_create sys_memfd_create
> 320 common kexec_file_load sys_kexec_file_load
> +321 64 execveat stub_execveat
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> @@ -365,3 +366,4 @@
> 542 x32 getsockopt compat_sys_getsockopt
> 543 x32 io_setup compat_sys_io_setup
> 544 x32 io_submit compat_sys_io_submit
> +545 x32 execveat stub_x32_execveat
> diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
> index f2f0723070ca..20c3649d0691 100644
> --- a/arch/x86/um/sys_call_table_64.c
> +++ b/arch/x86/um/sys_call_table_64.c
> @@ -31,6 +31,7 @@
> #define stub_fork sys_fork
> #define stub_vfork sys_vfork
> #define stub_execve sys_execve
> +#define stub_execveat sys_execveat
> #define stub_rt_sigreturn sys_rt_sigreturn
>
> #define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
> diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
> index f37b08cea1f7..490538536cb4 100644
> --- a/fs/binfmt_em86.c
> +++ b/fs/binfmt_em86.c
> @@ -42,6 +42,10 @@ static int load_em86(struct linux_binprm *bprm)
> return -ENOEXEC;
> }
>
> + /* Need to be able to load the file after exec */
> + if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
> + return -ENOENT;
> +
> allow_write_access(bprm->file);
> fput(bprm->file);
> bprm->file = NULL;
> diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
> index b60500300dd7..e659f5562356 100644
> --- a/fs/binfmt_misc.c
> +++ b/fs/binfmt_misc.c
> @@ -127,6 +127,10 @@ static int load_misc_binary(struct linux_binprm *bprm)
> if (!fmt)
> goto _ret;
>
> + /* Need to be able to load the file after exec */
> + if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
> + return -ENOENT;
> +
> if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
> retval = remove_arg_zero(bprm);
> if (retval)
> diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
> index 5027a3e14922..afdf4e3cafc2 100644
> --- a/fs/binfmt_script.c
> +++ b/fs/binfmt_script.c
> @@ -24,6 +24,16 @@ static int load_script(struct linux_binprm *bprm)
>
> if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
> return -ENOEXEC;
> +
> + /*
> + * If the script filename will be inaccessible after exec, typically
> + * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
> + * up now (on the assumption that the interpreter will want to load
> + * this file).
> + */
> + if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
> + return -ENOENT;
> +
> /*
> * This section does the #! interpretation.
> * Sorta complicated, but hopefully it will work. -TYT
> diff --git a/fs/exec.c b/fs/exec.c
> index a2b42a98c743..800d232c17bb 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -747,18 +747,26 @@ EXPORT_SYMBOL(setup_arg_pages);
>
> #endif /* CONFIG_MMU */
>
> -static struct file *do_open_exec(struct filename *name)
> +static struct file *do_open_execat(int fd, struct filename *name, int flags)
> {
> struct file *file;
> int err;
> - static const struct open_flags open_exec_flags = {
> + struct open_flags open_exec_flags = {
> .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
> .acc_mode = MAY_EXEC | MAY_OPEN,
> .intent = LOOKUP_OPEN,
> .lookup_flags = LOOKUP_FOLLOW,
> };
>
> - file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
> + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> + return ERR_PTR(-EINVAL);
> + if (flags & AT_SYMLINK_NOFOLLOW)
> + open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> + if (flags & AT_EMPTY_PATH)
> + open_exec_flags.lookup_flags |= (LOOKUP_EMPTY |
> + LOOKUP_EMPTY_NOPATH);
> +
> + file = do_filp_open(fd, name, &open_exec_flags);
> if (IS_ERR(file))
> goto out;
>
> @@ -769,12 +777,13 @@ static struct file *do_open_exec(struct filename *name)
> if (file->f_path.mnt->mnt_flags & MNT_NOEXEC)
> goto exit;
>
> - fsnotify_open(file);
> -
> err = deny_write_access(file);
> if (err)
> goto exit;
>
> + if (name->name[0] != '\0')
> + fsnotify_open(file);
> +
> out:
> return file;
>
> @@ -786,7 +795,7 @@ exit:
> struct file *open_exec(const char *name)
> {
> struct filename tmp = { .name = name };
> - return do_open_exec(&tmp);
> + return do_open_execat(AT_FDCWD, &tmp, 0);
> }
> EXPORT_SYMBOL(open_exec);
>
> @@ -1422,10 +1431,12 @@ static int exec_binprm(struct linux_binprm *bprm)
> /*
> * sys_execve() executes a new program.
> */
> -static int do_execve_common(struct filename *filename,
> - struct user_arg_ptr argv,
> - struct user_arg_ptr envp)
> +static int do_execveat_common(int fd, struct filename *filename,
> + struct user_arg_ptr argv,
> + struct user_arg_ptr envp,
> + int flags)
> {
> + char *pathbuf = NULL;
> struct linux_binprm *bprm;
> struct file *file;
> struct files_struct *displaced;
> @@ -1466,7 +1477,7 @@ static int do_execve_common(struct filename *filename,
> check_unsafe_exec(bprm);
> current->in_execve = 1;
>
> - file = do_open_exec(filename);
> + file = do_open_execat(fd, filename, flags);
> retval = PTR_ERR(file);
> if (IS_ERR(file))
> goto out_unmark;
> @@ -1474,7 +1485,30 @@ static int do_execve_common(struct filename *filename,
> sched_exec();
>
> bprm->file = file;
> - bprm->filename = bprm->interp = filename->name;
> + if (fd == AT_FDCWD || filename->name[0] == '/') {
> + bprm->filename = filename->name;
> + } else {
> + /* "/dev/fd/2147483647/" + filename->name */
> + int maxlen = 19 + strlen(filename->name);

This should be 20 + strlen (to include the trailing NULL). However, I
think this whole bit of code could be replaced with kasprintf...

> +
> + pathbuf = kmalloc(maxlen, GFP_TEMPORARY);
> + if (!pathbuf) {
> + retval = -ENOMEM;
> + goto out_unmark;
> + }
> + if (filename->name[0] == '\0')
> + sprintf(pathbuf, "/dev/fd/%d", fd);
> + else
> + snprintf(pathbuf, maxlen,
> + "/dev/fd/%d/%s", fd, filename->name);

Maybe something like this? A bit messy, so maybe your original if/else
would be more readable.

} else {
pathbuf = kasprintf("/dev/fd/%d%s%s", fd,
filename->name[0] == '\0' ? "" : "/",
filename->name[0] == '\0' ? ""
: filename->name);
if (!pathbuf) {
retval = -ENOMEM;
goto out_unmark;
}
}

> + /* Record that a name derived from an O_CLOEXEC fd will be
> + * inaccessible after exec. Relies on having exclusive access to
> + * current->files (due to unshare_files above). */
> + if (close_on_exec(fd, current->files->fdt))
> + bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
> + bprm->filename = pathbuf;
> + }
> + bprm->interp = bprm->filename;
>
> retval = bprm_mm_init(bprm);
> if (retval)
> @@ -1532,6 +1566,7 @@ out_unmark:
>
> out_free:
> free_bprm(bprm);
> + kfree(pathbuf);
>
> out_files:
> if (displaced)
> @@ -1547,7 +1582,18 @@ int do_execve(struct filename *filename,
> {
> struct user_arg_ptr argv = { .ptr.native = __argv };
> struct user_arg_ptr envp = { .ptr.native = __envp };
> - return do_execve_common(filename, argv, envp);
> + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
> +}
> +
> +int do_execveat(int fd, struct filename *filename,
> + const char __user *const __user *__argv,
> + const char __user *const __user *__envp,
> + int flags)
> +{
> + struct user_arg_ptr argv = { .ptr.native = __argv };
> + struct user_arg_ptr envp = { .ptr.native = __envp };
> +
> + return do_execveat_common(fd, filename, argv, envp, flags);
> }
>
> #ifdef CONFIG_COMPAT
> @@ -1563,7 +1609,23 @@ static int compat_do_execve(struct filename *filename,
> .is_compat = true,
> .ptr.compat = __envp,
> };
> - return do_execve_common(filename, argv, envp);
> + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
> +}
> +
> +static int compat_do_execveat(int fd, struct filename *filename,
> + const compat_uptr_t __user *__argv,
> + const compat_uptr_t __user *__envp,
> + int flags)
> +{
> + struct user_arg_ptr argv = {
> + .is_compat = true,
> + .ptr.compat = __argv,
> + };
> + struct user_arg_ptr envp = {
> + .is_compat = true,
> + .ptr.compat = __envp,
> + };
> + return do_execveat_common(fd, filename, argv, envp, flags);
> }
> #endif
>
> @@ -1603,6 +1665,20 @@ SYSCALL_DEFINE3(execve,
> {
> return do_execve(getname(filename), argv, envp);
> }
> +
> +SYSCALL_DEFINE5(execveat,
> + int, fd, const char __user *, filename,
> + const char __user *const __user *, argv,
> + const char __user *const __user *, envp,
> + int, flags)
> +{
> + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
> +
> + return do_execveat(fd,
> + getname_flags(filename, lookup_flags, NULL),
> + argv, envp, flags);
> +}
> +
> #ifdef CONFIG_COMPAT
> COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
> const compat_uptr_t __user *, argv,
> @@ -1610,4 +1686,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
> {
> return compat_do_execve(getname(filename), argv, envp);
> }
> +
> +COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
> + const char __user *, filename,
> + const compat_uptr_t __user *, argv,
> + const compat_uptr_t __user *, envp,
> + int, flags)
> +{
> + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
> +
> + return compat_do_execveat(fd,
> + getname_flags(filename, lookup_flags, NULL),
> + argv, envp, flags);
> +}
> #endif
> diff --git a/fs/namei.c b/fs/namei.c
> index a7b05bf82d31..757df6777ae5 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -130,7 +130,7 @@ void final_putname(struct filename *name)
>
> #define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename))
>
> -static struct filename *
> +struct filename *
> getname_flags(const char __user *filename, int flags, int *empty)
> {
> struct filename *result, *err;
> @@ -1891,6 +1891,12 @@ static int path_init(int dfd, const char *name, unsigned int flags,
> fdput(f);
> return -ENOTDIR;
> }
> + } else if (flags & LOOKUP_EMPTY_NOPATH) {
> + /* When using the fd alone, disallow O_PATH files */
> + if (f.file->f_mode & FMODE_PATH) {
> + fdput(f);
> + return -EBADF;
> + }
> }
>
> nd->path = f.file->f_path;
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 61f29e5ea840..576e4639ca60 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -53,6 +53,10 @@ struct linux_binprm {
> #define BINPRM_FLAGS_EXECFD_BIT 1
> #define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)
>
> +/* filename of the binary will be inaccessible after exec */
> +#define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2
> +#define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT)
> +
> /* Function parameter for binfmt->coredump */
> struct coredump_params {
> const siginfo_t *siginfo;
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index e6494261eaff..7450ca2ac1fc 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);
>
> asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
> const compat_uptr_t __user *envp);
> +asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
> + const compat_uptr_t __user *argv,
> + const compat_uptr_t __user *envp, int flags);
>
> asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp,
> compat_ulong_t __user *outp, compat_ulong_t __user *exp,
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 94187721ad41..e9818574d738 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2060,6 +2060,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *,
> extern struct file * dentry_open(const struct path *, int, const struct cred *);
> extern int filp_close(struct file *, fl_owner_t id);
>
> +extern struct filename *getname_flags(const char __user *, int, int *);
> extern struct filename *getname(const char __user *);
> extern struct filename *getname_kernel(const char *);
>
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 492de72560fa..eaa25cc72213 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -55,6 +55,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
> #define LOOKUP_JUMPED 0x1000
> #define LOOKUP_ROOT 0x2000
> #define LOOKUP_EMPTY 0x4000
> +#define LOOKUP_EMPTY_NOPATH 0x8000
>
> extern int user_path_at(int, const char __user *, unsigned, struct path *);
> extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b867a4dab38a..33e056da7d33 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2430,6 +2430,10 @@ extern void do_group_exit(int);
> extern int do_execve(struct filename *,
> const char __user * const __user *,
> const char __user * const __user *);
> +extern int do_execveat(int, struct filename *,
> + const char __user * const __user *,
> + const char __user * const __user *,
> + int);
> extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
> struct task_struct *fork_idle(int);
> extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 0f86d85a9ce4..df5422294deb 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -876,4 +876,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
> asmlinkage long sys_getrandom(char __user *buf, size_t count,
> unsigned int flags);
>
> +asmlinkage long sys_execveat(int dfd, const char __user *filename,
> + const char __user *const __user *argv,
> + const char __user *const __user *envp, int flags);
> +
> #endif
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 11d11bc5c78f..feef07d29663 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp)
> __SYSCALL(__NR_getrandom, sys_getrandom)
> #define __NR_memfd_create 279
> __SYSCALL(__NR_memfd_create, sys_memfd_create)
> +#define __NR_execveat 280
> +__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
>
> #undef __NR_syscalls
> -#define __NR_syscalls 280
> +#define __NR_syscalls 281
>
> /*
> * All syscalls below here should go away really,
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 391d4ddb6f4b..efb06058ad3e 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -218,3 +218,6 @@ cond_syscall(sys_kcmp);
>
> /* operate on Secure Computing state */
> cond_syscall(sys_seccomp);
> +
> +/* execveat */
> +cond_syscall(sys_execveat);
> diff --git a/lib/audit.c b/lib/audit.c
> index 1d726a22565b..b8fb5ee81e26 100644
> --- a/lib/audit.c
> +++ b/lib/audit.c
> @@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall)
> case __NR_socketcall:
> return 4;
> #endif
> +#ifdef __NR_execveat
> + case __NR_execveat:
> +#endif
> case __NR_execve:
> return 5;
> default:
> --
> 2.1.0.rc2.206.gedb03e5
>

Thanks for working on this!

-Kees

--
Kees Cook
Chrome OS Security

2014-11-06 18:07:53

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCHv6 2/3] syscalls,x86: add selftest for execveat(2)

On Thu, Nov 6, 2014 at 8:07 AM, David Drysdale <[email protected]> wrote:
> Signed-off-by: David Drysdale <[email protected]>
> ---
> tools/testing/selftests/Makefile | 1 +
> tools/testing/selftests/exec/.gitignore | 7 +
> tools/testing/selftests/exec/Makefile | 25 +++
> tools/testing/selftests/exec/execveat.c | 321 ++++++++++++++++++++++++++++++++
> 4 files changed, 354 insertions(+)
> create mode 100644 tools/testing/selftests/exec/.gitignore
> create mode 100644 tools/testing/selftests/exec/Makefile
> create mode 100644 tools/testing/selftests/exec/execveat.c
>
> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
> index 36ff2e4c7b6f..210cf68b3511 100644
> --- a/tools/testing/selftests/Makefile
> +++ b/tools/testing/selftests/Makefile
> @@ -14,6 +14,7 @@ TARGETS += powerpc
> TARGETS += user
> TARGETS += sysctl
> TARGETS += firmware
> +TARGETS += exec
>
> TARGETS_HOTPLUG = cpu-hotplug
> TARGETS_HOTPLUG += memory-hotplug
> diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore
> new file mode 100644
> index 000000000000..778147d01af9
> --- /dev/null
> +++ b/tools/testing/selftests/exec/.gitignore
> @@ -0,0 +1,7 @@
> +subdir*
> +script*
> +execveat
> +execveat.symlink
> +execveat.moved
> +execveat.ephemeral
> +execveat.denatured
> \ No newline at end of file
> diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
> new file mode 100644
> index 000000000000..c97e0aaea02d
> --- /dev/null
> +++ b/tools/testing/selftests/exec/Makefile
> @@ -0,0 +1,25 @@
> +CC = $(CROSS_COMPILE)gcc
> +CFLAGS = -Wall
> +BINARIES = execveat
> +DEPS = execveat.symlink execveat.denatured script subdir
> +all: $(BINARIES) $(DEPS)
> +
> +subdir:
> + mkdir -p $@
> +script:
> + echo '#!/bin/sh' > $@
> + echo 'exit $$*' >> $@
> + chmod +x $@
> +execveat.symlink: execveat
> + ln -s -f $< $@
> +execveat.denatured: execveat
> + cp $< $@
> + chmod -x $@
> +%: %.c
> + $(CC) $(CFLAGS) -o $@ $^
> +
> +run_tests: all
> + ./execveat
> +
> +clean:
> + rm -rf $(BINARIES) $(DEPS) subdir.moved execveat.moved
> diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c
> new file mode 100644
> index 000000000000..f6ea48176393
> --- /dev/null
> +++ b/tools/testing/selftests/exec/execveat.c
> @@ -0,0 +1,321 @@
> +/*
> + * Copyright (c) 2014 Google, Inc.
> + *
> + * Licensed under the terms of the GNU GPL License version 2
> + *
> + * Selftests for execveat(2).
> + */
> +
> +#define _GNU_SOURCE /* to get O_PATH, AT_EMPTY_PATH */
> +#include <sys/sendfile.h>
> +#include <sys/stat.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <limits.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +
> +static char *envp[] = { "IN_TEST=yes", NULL, NULL };
> +static char *argv[] = { "execveat", "99", NULL };
> +
> +static int execveat_(int fd, const char *path, char **argv, char **envp,
> + int flags)
> +{
> +#ifdef __NR_execveat
> + return syscall(__NR_execveat, fd, path, argv, envp, flags);
> +#else
> + errno = -ENOSYS;
> + return -1;
> +#endif
> +}
> +
> +#define check_execveat_fail(fd, path, flags, errno) \
> + _check_execveat_fail(fd, path, flags, errno, #errno)
> +static int _check_execveat_fail(int fd, const char *path, int flags,
> + int expected_errno, const char *errno_str)
> +{
> + int rc;
> +
> + errno = 0;
> + printf("Check failure of execveat(%d, '%s', %d) with %s... ",
> + fd, path?:"(null)", flags, errno_str);
> + rc = execveat_(fd, path, argv, envp, flags);
> +
> + if (rc > 0) {
> + printf("[FAIL] (unexpected success from execveat(2))\n");
> + return 1;
> + }
> + if (errno != expected_errno) {
> + printf("[FAIL] (expected errno %d (%s) not %d (%s)\n",
> + expected_errno, strerror(expected_errno),
> + errno, strerror(errno));
> + return 1;
> + }
> + printf("[OK]\n");
> + return 0;
> +}
> +
> +static int check_execveat_invoked_rc(int fd, const char *path, int flags,
> + int expected_rc)
> +{
> + int status;
> + int rc;
> + pid_t child;
> +
> + printf("Check success of execveat(%d, '%s', %d)... ",
> + fd, path?:"(null)", flags);
> + child = fork();
> + if (child < 0) {
> + printf("[FAIL] (fork() failed)\n");
> + return 1;
> + }
> + if (child == 0) {
> + /* Child: do execveat(). */
> + rc = execveat_(fd, path, argv, envp, flags);
> + printf("[FAIL]: execveat() failed, rc=%d errno=%d (%s)\n",
> + rc, errno, strerror(errno));
> + exit(1); /* should not reach here */
> + }
> + /* Parent: wait for & check child's exit status. */
> + rc = waitpid(child, &status, 0);
> + if (rc != child) {
> + printf("[FAIL] (waitpid(%d,...) returned %d)\n", child, rc);
> + return 1;
> + }
> + if (!WIFEXITED(status)) {
> + printf("[FAIL] (child %d did not exit cleanly, status=%08x)\n",
> + child, status);
> + return 1;
> + }
> + if (WEXITSTATUS(status) != expected_rc) {
> + printf("[FAIL] (child %d exited with %d not %d)\n",
> + child, WEXITSTATUS(status), expected_rc);
> + return 1;
> + }
> + printf("[OK]\n");
> + return 0;
> +}
> +
> +static int check_execveat(int fd, const char *path, int flags)
> +{
> + return check_execveat_invoked_rc(fd, path, flags, 99);
> +}
> +
> +static char *concat(const char *left, const char *right)
> +{
> + char *result = malloc(strlen(left) + strlen(right) + 1);
> +
> + strcpy(result, left);
> + strcat(result, right);
> + return result;
> +}
> +
> +static int open_or_die(const char *filename, int flags)
> +{
> + int fd = open(filename, flags);
> +
> + if (fd < 0) {
> + printf("Failed to open '%s'; "
> + "check prerequisites are available\n", filename);
> + exit(1);
> + }
> + return fd;
> +}
> +
> +static int run_tests(void)
> +{
> + int fail = 0;
> + char *fullname = realpath("execveat", NULL);
> + char *fullname_script = realpath("script", NULL);
> + char *fullname_symlink = concat(fullname, ".symlink");
> + int subdir_dfd = open_or_die("subdir", O_DIRECTORY|O_RDONLY);
> + int subdir_dfd_ephemeral = open_or_die("subdir.ephemeral",
> + O_DIRECTORY|O_RDONLY);
> + int dot_dfd = open_or_die(".", O_DIRECTORY|O_RDONLY);
> + int dot_dfd_path = open_or_die(".", O_DIRECTORY|O_RDONLY|O_PATH);
> + int dot_dfd_cloexec = open_or_die(".", O_DIRECTORY|O_RDONLY|O_CLOEXEC);
> + int fd = open_or_die("execveat", O_RDONLY);
> + int fd_path = open_or_die("execveat", O_RDONLY|O_PATH);
> + int fd_symlink = open_or_die("execveat.symlink", O_RDONLY);
> + int fd_denatured = open_or_die("execveat.denatured", O_RDONLY);
> + int fd_script = open_or_die("script", O_RDONLY);
> + int fd_ephemeral = open_or_die("execveat.ephemeral", O_RDONLY);
> + int fd_script_ephemeral = open_or_die("script.ephemeral", O_RDONLY);
> + int fd_cloexec = open_or_die("execveat", O_RDONLY|O_CLOEXEC);
> + int fd_script_cloexec = open_or_die("script", O_RDONLY|O_CLOEXEC);
> +
> + /* Change file position to confirm it doesn't affect anything */
> + lseek(fd, 10, SEEK_SET);
> +
> + /* Normal executable file: */
> + /* dfd + path */
> + fail += check_execveat(subdir_dfd, "../execveat", 0);
> + fail += check_execveat(dot_dfd, "execveat", 0);
> + fail += check_execveat(dot_dfd_path, "execveat", 0);
> + /* absolute path */
> + fail += check_execveat(AT_FDCWD, fullname, 0);
> + /* absolute path with nonsense dfd */
> + fail += check_execveat(99, fullname, 0);
> + /* fd + no path */
> + fail += check_execveat(fd, "", AT_EMPTY_PATH);
> + /* O_CLOEXEC fd + no path */
> + fail += check_execveat(fd_cloexec, "", AT_EMPTY_PATH);
> +
> + /* Mess with executable file that's already open: */
> + /* fd + no path to a file that's been renamed */
> + rename("execveat.ephemeral", "execveat.moved");
> + fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH);
> + /* fd + no path to a file that's been deleted */
> + unlink("execveat.moved"); /* remove the file now fd open */
> + fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH);
> +
> + /* Invalid argument failures */
> + fail += check_execveat_fail(fd, "", 0, ENOENT);
> + fail += check_execveat_fail(fd, NULL, AT_EMPTY_PATH, EFAULT);
> +
> + /* Symlink to executable file: */
> + /* dfd + path */
> + fail += check_execveat(dot_dfd, "execveat.symlink", 0);
> + fail += check_execveat(dot_dfd_path, "execveat.symlink", 0);
> + /* absolute path */
> + fail += check_execveat(AT_FDCWD, fullname_symlink, 0);
> + /* fd + no path, even with AT_SYMLINK_NOFOLLOW (already followed) */
> + fail += check_execveat(fd_symlink, "", AT_EMPTY_PATH);
> + fail += check_execveat(fd_symlink, "",
> + AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW);
> +
> + /* Symlink fails when AT_SYMLINK_NOFOLLOW set: */
> + /* dfd + path */
> + fail += check_execveat_fail(dot_dfd, "execveat.symlink",
> + AT_SYMLINK_NOFOLLOW, ELOOP);
> + fail += check_execveat_fail(dot_dfd_path, "execveat.symlink",
> + AT_SYMLINK_NOFOLLOW, ELOOP);
> + /* absolute path */
> + fail += check_execveat_fail(AT_FDCWD, fullname_symlink,
> + AT_SYMLINK_NOFOLLOW, ELOOP);
> +
> + /* Shell script wrapping executable file: */
> + /* dfd + path */
> + fail += check_execveat(subdir_dfd, "../script", 0);
> + fail += check_execveat(dot_dfd, "script", 0);
> + fail += check_execveat(dot_dfd_path, "script", 0);
> + /* absolute path */
> + fail += check_execveat(AT_FDCWD, fullname_script, 0);
> + /* fd + no path */
> + fail += check_execveat(fd_script, "", AT_EMPTY_PATH);
> + fail += check_execveat(fd_script, "",
> + AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW);
> + /* O_CLOEXEC fd fails for a script (as script file inaccessible) */
> + fail += check_execveat_fail(fd_script_cloexec, "", AT_EMPTY_PATH,
> + ENOENT);
> + fail += check_execveat_fail(dot_dfd_cloexec, "script", 0, ENOENT);
> +
> + /* Mess with script file that's already open: */
> + /* fd + no path to a file that's been renamed */
> + rename("script.ephemeral", "script.moved");
> + fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH);
> + /* fd + no path to a file that's been deleted */
> + unlink("script.moved"); /* remove the file while fd open */
> + fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH);
> +
> + /* Rename a subdirectory in the path: */
> + rename("subdir.ephemeral", "subdir.moved");
> + fail += check_execveat(subdir_dfd_ephemeral, "../script", 0);
> + fail += check_execveat(subdir_dfd_ephemeral, "script", 0);
> + /* Remove the subdir and its contents */
> + unlink("subdir.moved/script");
> + unlink("subdir.moved");
> + /* Shell loads via deleted subdir OK because name starts with .. */
> + fail += check_execveat(subdir_dfd_ephemeral, "../script", 0);
> + fail += check_execveat_fail(subdir_dfd_ephemeral, "script", 0, ENOENT);
> +
> + /* Flag values other than AT_SYMLINK_NOFOLLOW => EINVAL */
> + fail += check_execveat_fail(dot_dfd, "execveat", 0xFFFF, EINVAL);
> + /* Invalid path => ENOENT */
> + fail += check_execveat_fail(dot_dfd, "no-such-file", 0, ENOENT);
> + fail += check_execveat_fail(dot_dfd_path, "no-such-file", 0, ENOENT);
> + fail += check_execveat_fail(AT_FDCWD, "no-such-file", 0, ENOENT);
> + /* Attempt to execute directory => EACCES */
> + fail += check_execveat_fail(dot_dfd, "", AT_EMPTY_PATH, EACCES);
> + /* Attempt to execute non-executable => EACCES */
> + fail += check_execveat_fail(dot_dfd, "Makefile", 0, EACCES);
> + fail += check_execveat_fail(fd_denatured, "", AT_EMPTY_PATH, EACCES);
> + /* Attempt to execute file opened with O_PATH => EBADF */
> + fail += check_execveat_fail(fd_path, "", AT_EMPTY_PATH, EBADF);
> + /* Attempt to execute nonsense FD => EBADF */
> + fail += check_execveat_fail(99, "", AT_EMPTY_PATH, EBADF);
> + fail += check_execveat_fail(99, "execveat", 0, EBADF);
> + /* Attempt to execute relative to non-directory => ENOTDIR */
> + fail += check_execveat_fail(fd, "execveat", 0, ENOTDIR);
> +

I'd add some tests that check PATH_MAX with the /dev/fd/n/filename
off-by-one I noticed. That could catch any regressions there.

-Kees

> + return fail;
> +}
> +
> +static void exe_cp(const char *src, const char *dest)
> +{
> + int in_fd = open_or_die(src, O_RDONLY);
> + int out_fd = open(dest, O_RDWR|O_CREAT|O_TRUNC, 0755);
> + struct stat info;
> +
> + fstat(in_fd, &info);
> + sendfile(out_fd, in_fd, NULL, info.st_size);
> + close(in_fd);
> + close(out_fd);
> +}
> +
> +static void prerequisites(void)
> +{
> + int fd;
> + const char *script = "#!/bin/sh\nexit $*\n";
> +
> + /* Create ephemeral copies of files */
> + exe_cp("execveat", "execveat.ephemeral");
> + exe_cp("script", "script.ephemeral");
> + mkdir("subdir.ephemeral", 0755);
> +
> + fd = open("subdir.ephemeral/script", O_RDWR|O_CREAT|O_TRUNC, 0755);
> + write(fd, script, strlen(script));
> + close(fd);
> +}
> +
> +int main(int argc, char **argv)
> +{
> + int ii;
> + int rc;
> + const char *verbose = getenv("VERBOSE");
> +
> + if (argc >= 2) {
> + /* If we are invoked with an argument, don't run tests. */
> + const char *in_test = getenv("IN_TEST");
> +
> + if (verbose) {
> + printf(" invoked with:");
> + for (ii = 0; ii < argc; ii++)
> + printf(" [%d]='%s'", ii, argv[ii]);
> + printf("\n");
> + }
> +
> + /* Check expected environment transferred. */
> + if (!in_test || strcmp(in_test, "yes") != 0) {
> + printf("[FAIL] (no IN_TEST=yes in env)\n");
> + return 1;
> + }
> +
> + /* Use the final argument as an exit code. */
> + rc = atoi(argv[argc - 1]);
> + fflush(stdout);
> + } else {
> + prerequisites();
> + if (verbose)
> + envp[1] = "VERBOSE=1";
> + rc = run_tests();
> + if (rc > 0)
> + printf("%d tests failed\n", rc);
> + }
> + return rc;
> +}
> --
> 2.1.0.rc2.206.gedb03e5
>



--
Kees Cook
Chrome OS Security

2014-11-07 13:20:19

by David Drysdale

[permalink] [raw]
Subject: Re: [PATCHv6 1/3] syscalls,x86: implement execveat() system call

On Thu, Nov 6, 2014 at 6:06 PM, Kees Cook <[email protected]> wrote:
> On Thu, Nov 6, 2014 at 8:07 AM, David Drysdale <[email protected]> wrote:
>> diff --git a/fs/exec.c b/fs/exec.c
>> index a2b42a98c743..800d232c17bb 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1422,10 +1431,12 @@ static int exec_binprm(struct linux_binprm *bprm)
>> /*
>> * sys_execve() executes a new program.
>> */
>> -static int do_execve_common(struct filename *filename,
>> - struct user_arg_ptr argv,
>> - struct user_arg_ptr envp)
>> +static int do_execveat_common(int fd, struct filename *filename,
>> + struct user_arg_ptr argv,
>> + struct user_arg_ptr envp,
>> + int flags)
>> {
>> + char *pathbuf = NULL;
>> struct linux_binprm *bprm;
>> struct file *file;
>> struct files_struct *displaced;
>> @@ -1466,7 +1477,7 @@ static int do_execve_common(struct filename *filename,
>> check_unsafe_exec(bprm);
>> current->in_execve = 1;
>>
>> - file = do_open_exec(filename);
>> + file = do_open_execat(fd, filename, flags);
>> retval = PTR_ERR(file);
>> if (IS_ERR(file))
>> goto out_unmark;
>> @@ -1474,7 +1485,30 @@ static int do_execve_common(struct filename *filename,
>> sched_exec();
>>
>> bprm->file = file;
>> - bprm->filename = bprm->interp = filename->name;
>> + if (fd == AT_FDCWD || filename->name[0] == '/') {
>> + bprm->filename = filename->name;
>> + } else {
>> + /* "/dev/fd/2147483647/" + filename->name */
>> + int maxlen = 19 + strlen(filename->name);
>
> This should be 20 + strlen (to include the trailing NULL). However, I
> think this whole bit of code could be replaced with kasprintf...
>
>> +
>> + pathbuf = kmalloc(maxlen, GFP_TEMPORARY);
>> + if (!pathbuf) {
>> + retval = -ENOMEM;
>> + goto out_unmark;
>> + }
>> + if (filename->name[0] == '\0')
>> + sprintf(pathbuf, "/dev/fd/%d", fd);
>> + else
>> + snprintf(pathbuf, maxlen,
>> + "/dev/fd/%d/%s", fd, filename->name);
>
> Maybe something like this? A bit messy, so maybe your original if/else
> would be more readable.
>
> } else {
> pathbuf = kasprintf("/dev/fd/%d%s%s", fd,
> filename->name[0] == '\0' ? "" : "/",
> filename->name[0] == '\0' ? ""
> : filename->name);
> if (!pathbuf) {
> retval = -ENOMEM;
> goto out_unmark;
> }
> }

Ah, I didn't know about kasprintf(), that will make things much
easier -- thanks!

2014-11-07 13:21:21

by David Drysdale

[permalink] [raw]
Subject: Re: [PATCHv6 RFC 0/3] syscalls,x86: Add execveat() system call

On Thu, Nov 6, 2014 at 4:55 PM, Andy Lutomirski <[email protected]> wrote:
> On Thu, Nov 6, 2014 at 8:07 AM, David Drysdale <[email protected]> wrote:
>> Here's another pass at this. Some things to discuss in particular:
>>
>> 1) The current approach for interpreted execs (i.e. mostly "#!" scripts)
>> gives them an argv[1] filename like "/dev/fd/<fd>/<path>". This
>> means that script execution in a /proc-less system isn't going to
>> work, at least until interpreters get smart enough to spot and
>> special-case the leading "/dev/fd/<fd>", or until there's something
>> to use in place of /dev/fd -> /proc/self/fd (e.g. Al's dupfs
>> suggestion, https://lkml.org/lkml/2014/10/19/141).
>>
>> So is an execveat(2) that (currently) only works for non-interpreted
>> programs still useful?
>
> I think it is. I would make sure to return a distinguishable error
> code in the event that the failure happens because of one of the
> unsupported cases.
>
>>
>> 2) I don't like having to add a new LOOKUP_EMPTY_NOPATH flag
>> just to prevent O_PATH fds from being fexecve()ed -- alternative
>> suggestions welcomed. (More generally, I don't have a great
>> feel for what O_PATH is for; how bad would it be to just allow
>> them to be fexecve()ed?)
>
> If you fexecve an O_PATH fd, does it at least check that you have
> execute permission on the inode? If so, it seems okay to allow it.

Yes, the same checks will happen for an O_PATH fd as for a normal fd.
I'll add an explicit test case for that too.

> --Andy
>
>>
>> .........
>>
>> This patch set adds execveat(2) for x86, and is derived from Meredydd
>> Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
>>
>> The primary aim of adding an execveat syscall is to allow an
>> implementation of fexecve(3) that does not rely on the /proc
>> filesystem, at least for executables (rather than scripts). The
>> current glibc version of fexecve(3) is implemented via /proc, which
>> causes problems in sandboxed or otherwise restricted environments.
>>
>> Given the desire for a /proc-free fexecve() implementation, HPA
>> suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2)
>> syscall would be an appropriate generalization.
>>
>> Also, having a new syscall means that it can take a flags argument
>> without back-compatibility concerns. The current implementation just
>> defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other
>> flags could be added in future -- for example, flags for new namespaces
>> (as suggested at https://lkml.org/lkml/2006/7/11/474).
>>
>> Related history:
>> - https://lkml.org/lkml/2006/12/27/123 is an example of someone
>> realizing that fexecve() is likely to fail in a chroot environment.
>> - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
>> documenting the /proc requirement of fexecve(3) in its manpage, to
>> "prevent other people from wasting their time".
>> - https://bugzilla.kernel.org/show_bug.cgi?id=74481 documented that
>> it's not possible to fexecve() a file descriptor for a script with
>> close-on-exec set (which is possible with the implementation here).
>
> Confused. How does it work for a close-on-exec script? I understand
> how it works for a close-on-exec ELF binary.

My bad, it doesn't -- I forgot to update that part of the commit
description, it's left over from the version that used d_path().

> --Andy
>
>> - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
>> problem where a process that did setuid() could not fexecve()
>> because it no longer had access to /proc/self/fd; this has since
>> been fixed.
>>
>>
>> Changes since v5:
>> - Set new flag in bprm->interp_flags for O_CLOEXEC fds, so that binfmts
>> that invoke an interpreter fail the exec (as they will not be able
>> to access the invoked file). [Andy Lutomirski]
>> - Don't truncate long paths. [Andy Lutomirski]
>> - Commonize code to open the executed file. [Eric W. Biederman]
>> - Mark O_PATH file descriptors so they cannot be fexecve()ed.
>> - Make self-test more helpful, and add additional cases:
>> - file offset non-zero
>> - binary file without execute bit
>> - O_CLOEXEC fds
>>
>> Changes since v4, suggested by Eric W. Biederman:
>> - Use empty filename with AT_EMPTY_PATH flag rather than NULL
>> pathname to request fexecve-like behaviour.
>> - Build pathname as "/dev/fd/<fd>/<filename>" (or "/dev/fd/<fd>")
>> rather than using d_path().
>> - Patch against v3.17 (bfe01a5ba249)
>>
>> Changes since Meredydd's v3 patch:
>> - Added a selftest.
>> - Added a man page.
>> - Left open_exec() signature untouched to reduce patch impact
>> elsewhere (as suggested by Al Viro).
>> - Filled in bprm->filename with d_path() into a buffer, to avoid use
>> of potentially-ephemeral dentry->d_name.
>> - Patch against v3.14 (455c6fdbd21916).
>>
>>
>> David Drysdale (2):
>> syscalls,x86: implement execveat() system call
>> syscalls,x86: add selftest for execveat(2)
>>
>> arch/x86/ia32/audit.c | 1 +
>> arch/x86/ia32/ia32entry.S | 1 +
>> arch/x86/kernel/audit_64.c | 1 +
>> arch/x86/kernel/entry_64.S | 28 +++
>> arch/x86/syscalls/syscall_32.tbl | 1 +
>> arch/x86/syscalls/syscall_64.tbl | 2 +
>> arch/x86/um/sys_call_table_64.c | 1 +
>> fs/binfmt_em86.c | 4 +
>> fs/binfmt_misc.c | 4 +
>> fs/binfmt_script.c | 10 +
>> fs/exec.c | 115 ++++++++++--
>> fs/namei.c | 8 +-
>> include/linux/binfmts.h | 4 +
>> include/linux/compat.h | 3 +
>> include/linux/fs.h | 1 +
>> include/linux/namei.h | 1 +
>> include/linux/sched.h | 4 +
>> include/linux/syscalls.h | 4 +
>> include/uapi/asm-generic/unistd.h | 4 +-
>> kernel/sys_ni.c | 3 +
>> lib/audit.c | 3 +
>> tools/testing/selftests/Makefile | 1 +
>> tools/testing/selftests/exec/.gitignore | 7 +
>> tools/testing/selftests/exec/Makefile | 25 +++
>> tools/testing/selftests/exec/execveat.c | 321 ++++++++++++++++++++++++++++++++
>> 25 files changed, 542 insertions(+), 15 deletions(-)
>> create mode 100644 tools/testing/selftests/exec/.gitignore
>> create mode 100644 tools/testing/selftests/exec/Makefile
>> create mode 100644 tools/testing/selftests/exec/execveat.c
>>
>> --
>> 2.1.0.rc2.206.gedb03e5
>
>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC