This patch set adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc
filesystem. The current glibc version of fexecve(3) is implemented
via /proc, which causes problems in sandboxed or otherwise restricted
environments.
Given the desire for a /proc-free fexecve() implementation, HPA
suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2)
syscall would be an appropriate generalization.
Also, having a new syscall means that it can take a flags argument
without back-compatibility concerns. The current implementation just
defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other
flags could be added in future -- for example, flags for new namespaces
(as suggested at https://lkml.org/lkml/2006/7/11/474).
Related history:
- https://lkml.org/lkml/2006/12/27/123 is an example of someone
realizing that fexecve() is likely to fail in a chroot environment.
- http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
documenting the /proc requirement of fexecve(3) in its manpage, to
"prevent other people from wasting their time".
- https://bugzilla.kernel.org/show_bug.cgi?id=74481 documented that
it's not possible to fexecve() a file descriptor for a script with
close-on-exec set (which is possible with the implementation here).
- https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
problem where a process that did setuid() could not fexecve()
because it no longer had access to /proc/self/fd; this has since
been fixed.
Changes since v4, suggested by Eric W. Biederman:
- Use empty filename with AT_EMPTY_PATH flag rather than NULL
pathname to request fexecve-like behaviour.
- Build pathname as "/dev/fd/<fd>/<filename>" (or "/dev/fd/<fd>")
rather than using d_path().
- Patch against v3.17 (bfe01a5ba249)
Changes since Meredydd's v3 patch:
- Added a selftest.
- Added a man page.
- Left open_exec() signature untouched to reduce patch impact
elsewhere (as suggested by Al Viro).
- Filled in bprm->filename with d_path() into a buffer, to avoid use
of potentially-ephemeral dentry->d_name.
- Patch against v3.14 (455c6fdbd21916).
David Drysdale (2):
syscalls,x86: implement execveat() system call
syscalls,x86: add selftest for execveat(2)
arch/x86/ia32/audit.c | 1 +
arch/x86/ia32/ia32entry.S | 1 +
arch/x86/kernel/audit_64.c | 1 +
arch/x86/kernel/entry_64.S | 28 +++
arch/x86/syscalls/syscall_32.tbl | 1 +
arch/x86/syscalls/syscall_64.tbl | 2 +
arch/x86/um/sys_call_table_64.c | 1 +
fs/exec.c | 130 ++++++++++++--
fs/namei.c | 2 +-
include/linux/compat.h | 3 +
include/linux/fs.h | 1 +
include/linux/sched.h | 4 +
include/linux/syscalls.h | 4 +
include/uapi/asm-generic/unistd.h | 4 +-
kernel/sys_ni.c | 3 +
lib/audit.c | 3 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/exec/.gitignore | 6 +
tools/testing/selftests/exec/Makefile | 22 +++
tools/testing/selftests/exec/execveat.c | 295 ++++++++++++++++++++++++++++++++
20 files changed, 497 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/exec/.gitignore
create mode 100644 tools/testing/selftests/exec/Makefile
create mode 100644 tools/testing/selftests/exec/execveat.c
--
2.1.0.rc2.206.gedb03e5
Signed-off-by: David Drysdale <[email protected]>
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/exec/.gitignore | 6 +
tools/testing/selftests/exec/Makefile | 22 +++
tools/testing/selftests/exec/execveat.c | 295 ++++++++++++++++++++++++++++++++
4 files changed, 324 insertions(+)
create mode 100644 tools/testing/selftests/exec/.gitignore
create mode 100644 tools/testing/selftests/exec/Makefile
create mode 100644 tools/testing/selftests/exec/execveat.c
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 36ff2e4c7b6f..210cf68b3511 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -14,6 +14,7 @@ TARGETS += powerpc
TARGETS += user
TARGETS += sysctl
TARGETS += firmware
+TARGETS += exec
TARGETS_HOTPLUG = cpu-hotplug
TARGETS_HOTPLUG += memory-hotplug
diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore
new file mode 100644
index 000000000000..349a786899a7
--- /dev/null
+++ b/tools/testing/selftests/exec/.gitignore
@@ -0,0 +1,6 @@
+subdir*
+script*
+execveat
+execveat.symlink
+execveat.moved
+execveat.ephemeral
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
new file mode 100644
index 000000000000..428cb1d2a06a
--- /dev/null
+++ b/tools/testing/selftests/exec/Makefile
@@ -0,0 +1,22 @@
+CC = $(CROSS_COMPILE)gcc
+CFLAGS = -Wall
+BINARIES = execveat
+DEPS = execveat.symlink script subdir
+all: $(BINARIES) $(DEPS)
+
+subdir:
+ mkdir -p $@
+script:
+ echo '#!/bin/sh' > $@
+ echo 'exit $$*' >> $@
+ chmod +x $@
+execveat.symlink: execveat
+ ln -s -f $< $@
+%: %.c
+ $(CC) $(CFLAGS) -o $@ $^
+
+run_tests: all
+ ./execveat
+
+clean:
+ rm -rf $(BINARIES) $(DEPS) subdir.moved execveat.moved
diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c
new file mode 100644
index 000000000000..1f2722ed3847
--- /dev/null
+++ b/tools/testing/selftests/exec/execveat.c
@@ -0,0 +1,295 @@
+/*
+ * Copyright (c) 2014 Google, Inc.
+ *
+ * Licensed under the terms of the GNU GPL License version 2
+ *
+ * Selftests for execveat(2).
+ */
+
+#define _GNU_SOURCE /* to get O_PATH, AT_EMPTY_PATH */
+#include <sys/sendfile.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+static char *envp[] = { "IN_TEST=yes", NULL };
+static char *argv[] = { "execveat", "99", NULL };
+
+static int execveat_(int fd, const char *path, char **argv, char **envp,
+ int flags)
+{
+#ifdef __NR_execveat
+ return syscall(__NR_execveat, fd, path, argv, envp, flags);
+#else
+ errno = -ENOSYS;
+ return -1;
+#endif
+}
+
+#define check_execveat_fail(fd, path, flags, errno) \
+ _check_execveat_fail(fd, path, flags, errno, #errno)
+static int _check_execveat_fail(int fd, const char *path, int flags,
+ int expected_errno, const char *errno_str)
+{
+ int rc;
+
+ errno = 0;
+ printf("Check failure of execveat(%d, '%s', %d) with %s... ",
+ fd, path?:"(null)", flags, errno_str);
+ rc = execveat_(fd, path, argv, envp, flags);
+
+ if (rc > 0) {
+ printf("[FAIL] (unexpected success from execveat(2))\n");
+ return 1;
+ }
+ if (errno != expected_errno) {
+ printf("[FAIL] (expected errno %d (%s) not %d (%s)\n",
+ expected_errno, strerror(expected_errno),
+ errno, strerror(errno));
+ return 1;
+ }
+ printf("[OK]\n");
+ return 0;
+}
+
+static int check_execveat_invoked_rc(int fd, const char *path, int flags,
+ int expected_rc)
+{
+ int status;
+ int rc;
+ pid_t child;
+
+ printf("Check success of execveat(%d, '%s', %d)... ",
+ fd, path?:"(null)", flags);
+ child = fork();
+ if (child < 0) {
+ printf("[FAIL] (fork() failed)\n");
+ return 1;
+ }
+ if (child == 0) {
+ /* Child: do execveat(). */
+ rc = execveat_(fd, path, argv, envp, flags);
+ printf("[FAIL]: execveat() failed, rc=%d errno=%d (%s)\n",
+ rc, errno, strerror(errno));
+ exit(1); /* should not reach here */
+ }
+ /* Parent: wait for & check child's exit status. */
+ rc = waitpid(child, &status, 0);
+ if (rc != child) {
+ printf("[FAIL] (waitpid(%d,...) returned %d)\n", child, rc);
+ return 1;
+ }
+ if (!WIFEXITED(status)) {
+ printf("[FAIL] (child %d did not exit cleanly, status=%08x)\n",
+ child, status);
+ return 1;
+ }
+ if (WEXITSTATUS(status) != expected_rc) {
+ printf("[FAIL] (child %d exited with %d not %d)\n",
+ child, WEXITSTATUS(status), expected_rc);
+ return 1;
+ }
+ printf("[OK]\n");
+ return 0;
+}
+
+static int check_execveat(int fd, const char *path, int flags)
+{
+ return check_execveat_invoked_rc(fd, path, flags, 99);
+}
+
+static char *concat(const char *left, const char *right)
+{
+ char *result = malloc(strlen(left) + strlen(right) + 1);
+
+ strcpy(result, left);
+ strcat(result, right);
+ return result;
+}
+
+static int open_or_die(const char *filename, int flags)
+{
+ int fd = open(filename, flags);
+
+ if (fd < 0) {
+ printf("Failed to open '%s'; "
+ "check prerequisites are available\n", filename);
+ exit(1);
+ }
+ return fd;
+}
+
+static int run_tests(void)
+{
+ int fail = 0;
+ char *fullname = realpath("execveat", NULL);
+ char *fullname_script = realpath("script", NULL);
+ char *fullname_symlink = concat(fullname, ".symlink");
+ int subdir_dfd = open_or_die("subdir", O_DIRECTORY|O_RDONLY);
+ int subdir_dfd_ephemeral = open_or_die("subdir.ephemeral",
+ O_DIRECTORY|O_RDONLY);
+ int dot_dfd = open_or_die(".", O_DIRECTORY|O_RDONLY);
+ int dot_dfd_path = open_or_die(".", O_DIRECTORY|O_RDONLY|O_PATH);
+ int fd = open_or_die("execveat", O_RDONLY);
+ int fd_path = open_or_die("execveat", O_RDONLY|O_PATH);
+ int fd_symlink = open_or_die("execveat.symlink", O_RDONLY);
+ int fd_script = open_or_die("script", O_RDONLY);
+ int fd_ephemeral = open_or_die("execveat.ephemeral", O_RDONLY);
+ int fd_script_ephemeral = open_or_die("script.ephemeral", O_RDONLY);
+
+ /* Normal executable file: */
+ /* dfd + path */
+ fail |= check_execveat(subdir_dfd, "../execveat", 0);
+ fail |= check_execveat(dot_dfd, "execveat", 0);
+ fail |= check_execveat(dot_dfd_path, "execveat", 0);
+ /* absolute path */
+ fail |= check_execveat(AT_FDCWD, fullname, 0);
+ /* absolute path with nonsense dfd */
+ fail |= check_execveat(99, fullname, 0);
+ /* fd + no path */
+ fail |= check_execveat(fd, "", AT_EMPTY_PATH);
+
+ /* Mess with executable file that's already open: */
+ /* fd + no path to a file that's been renamed */
+ rename("execveat.ephemeral", "execveat.moved");
+ fail |= check_execveat(fd_ephemeral, "", AT_EMPTY_PATH);
+ /* fd + no path to a file that's been deleted */
+ unlink("execveat.moved"); /* remove the file now fd open */
+ fail |= check_execveat(fd_ephemeral, "", AT_EMPTY_PATH);
+
+ /* Invalid argument failures */
+ fail |= check_execveat_fail(fd, "", 0, ENOENT);
+ fail |= check_execveat_fail(fd, NULL, AT_EMPTY_PATH, EFAULT);
+
+ /* Symlink to executable file: */
+ /* dfd + path */
+ fail |= check_execveat(dot_dfd, "execveat.symlink", 0);
+ fail |= check_execveat(dot_dfd_path, "execveat.symlink", 0);
+ /* absolute path */
+ fail |= check_execveat(AT_FDCWD, fullname_symlink, 0);
+ /* fd + no path, even with AT_SYMLINK_NOFOLLOW (already followed) */
+ fail |= check_execveat(fd_symlink, "", AT_EMPTY_PATH);
+ fail |= check_execveat(fd_symlink, "",
+ AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW);
+
+ /* Symlink fails when AT_SYMLINK_NOFOLLOW set: */
+ /* dfd + path */
+ fail |= check_execveat_fail(dot_dfd, "execveat.symlink",
+ AT_SYMLINK_NOFOLLOW, ELOOP);
+ fail |= check_execveat_fail(dot_dfd_path, "execveat.symlink",
+ AT_SYMLINK_NOFOLLOW, ELOOP);
+ /* absolute path */
+ fail |= check_execveat_fail(AT_FDCWD, fullname_symlink,
+ AT_SYMLINK_NOFOLLOW, ELOOP);
+
+ /* Shell script wrapping executable file: */
+ /* dfd + path */
+ fail |= check_execveat(subdir_dfd, "../script", 0);
+ fail |= check_execveat(dot_dfd, "script", 0);
+ fail |= check_execveat(dot_dfd_path, "script", 0);
+ /* absolute path */
+ fail |= check_execveat(AT_FDCWD, fullname_script, 0);
+ /* fd + no path */
+ fail |= check_execveat(fd_script, "", AT_EMPTY_PATH);
+ fail |= check_execveat(fd_script, "",
+ AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW);
+
+ /* Mess with script file that's already open: */
+ /* fd + no path to a file that's been renamed */
+ rename("script.ephemeral", "script.moved");
+ fail |= check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH);
+ /* fd + no path to a file that's been deleted */
+ unlink("script.moved"); /* remove the file while fd open */
+ fail |= check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH);
+
+ /* Rename a subdirectory in the path: */
+ rename("subdir.ephemeral", "subdir.moved");
+ fail |= check_execveat(subdir_dfd_ephemeral, "../script", 0);
+ fail |= check_execveat(subdir_dfd_ephemeral, "script", 0);
+ /* Remove the subdir and its contents */
+ unlink("subdir.moved/script");
+ unlink("subdir.moved");
+ /* Shell loads via deleted subdir OK because name starts with .. */
+ fail |= check_execveat(subdir_dfd_ephemeral, "../script", 0);
+ fail |= check_execveat_fail(subdir_dfd_ephemeral, "script", 0, ENOENT);
+
+ /* Flag values other than AT_SYMLINK_NOFOLLOW => EINVAL */
+ fail |= check_execveat_fail(dot_dfd, "execveat", 0xFFFF, EINVAL);
+ /* Invalid path => ENOENT */
+ fail |= check_execveat_fail(dot_dfd, "no-such-file", 0, ENOENT);
+ fail |= check_execveat_fail(dot_dfd_path, "no-such-file", 0, ENOENT);
+ fail |= check_execveat_fail(AT_FDCWD, "no-such-file", 0, ENOENT);
+ /* Attempt to execute directory => EACCES */
+ fail |= check_execveat_fail(dot_dfd, "", AT_EMPTY_PATH, EACCES);
+ /* Attempt to execute non-executable => EACCES */
+ fail |= check_execveat_fail(dot_dfd, "Makefile", 0, EACCES);
+ /* Attempt to execute file opened with O_PATH => EBADF */
+ fail |= check_execveat_fail(dot_dfd_path, "", AT_EMPTY_PATH, EBADF);
+ fail |= check_execveat_fail(fd_path, "", AT_EMPTY_PATH, EBADF);
+ /* Attempt to execute nonsense FD => EBADF */
+ fail |= check_execveat_fail(99, "", AT_EMPTY_PATH, EBADF);
+ fail |= check_execveat_fail(99, "execveat", 0, EBADF);
+ /* Attempt to execute relative to non-directory => ENOTDIR */
+ fail |= check_execveat_fail(fd, "execveat", 0, ENOTDIR);
+
+ return fail ? -1 : 0;
+}
+
+void exe_cp(const char *src, const char *dest)
+{
+ int in_fd = open_or_die(src, O_RDONLY);
+ int out_fd = open(dest, O_RDWR|O_CREAT|O_TRUNC, 0755);
+ struct stat info;
+
+ fstat(in_fd, &info);
+ sendfile(out_fd, in_fd, NULL, info.st_size);
+ close(in_fd);
+ close(out_fd);
+}
+
+void prerequisites(void)
+{
+ int fd;
+ const char *script = "#!/bin/sh\nexit $*\n";
+
+ /* Create ephemeral copies of files */
+ exe_cp("execveat", "execveat.ephemeral");
+ exe_cp("script", "script.ephemeral");
+ mkdir("subdir.ephemeral", 0755);
+
+ fd = open("subdir.ephemeral/script", O_RDWR|O_CREAT|O_TRUNC, 0755);
+ write(fd, script, strlen(script));
+ close(fd);
+}
+
+int main(int argc, char **argv)
+{
+ int rc;
+
+ if (argc >= 2) {
+ /* If we are invoked with an argument, exit immediately. */
+ /* Check expected environment transferred. */
+ const char *in_test = getenv("IN_TEST");
+
+ if (!in_test || strcmp(in_test, "yes") != 0) {
+ printf("[FAIL] (no IN_TEST=yes in env)\n");
+ return 1;
+ }
+
+ /* Use the final argument as an exit code. */
+ rc = atoi(argv[argc - 1]);
+ fflush(stdout);
+ } else {
+ prerequisites();
+ rc = run_tests();
+ }
+ return rc;
+}
--
2.1.0.rc2.206.gedb03e5
Add a new system execveat(2) syscall. execveat() is to execve() as
openat() is to open(): it takes a file descriptor that refers to a
directory, and resolves the filename relative to that.
In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers. This
replicates the functionality of fexecve(), which is a system call in
other UNIXen, but in Linux glibc it depends on opening
"/proc/self/fd/<fd>" (and so relies on /proc being mounted).
The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
(for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
reflecting how the executable was found. This does however mean that
execution of a script in a /proc-less environment won't work.
Only x86-64, i386 and x32 ABIs are supported in this patch.
Based on patches by Meredydd Luff <[email protected]>
Signed-off-by: David Drysdale <[email protected]>
---
arch/x86/ia32/audit.c | 1 +
arch/x86/ia32/ia32entry.S | 1 +
arch/x86/kernel/audit_64.c | 1 +
arch/x86/kernel/entry_64.S | 28 ++++++++
arch/x86/syscalls/syscall_32.tbl | 1 +
arch/x86/syscalls/syscall_64.tbl | 2 +
arch/x86/um/sys_call_table_64.c | 1 +
fs/exec.c | 130 ++++++++++++++++++++++++++++++++++----
fs/namei.c | 2 +-
include/linux/compat.h | 3 +
include/linux/fs.h | 1 +
include/linux/sched.h | 4 ++
include/linux/syscalls.h | 4 ++
include/uapi/asm-generic/unistd.h | 4 +-
kernel/sys_ni.c | 3 +
lib/audit.c | 3 +
16 files changed, 173 insertions(+), 16 deletions(-)
diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c
index 5d7b381da692..2eccc8932ae6 100644
--- a/arch/x86/ia32/audit.c
+++ b/arch/x86/ia32/audit.c
@@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall)
case __NR_socketcall:
return 4;
case __NR_execve:
+ case __NR_execveat:
return 5;
default:
return 1;
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb05023c..2516c09743e0 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -464,6 +464,7 @@ GLOBAL(\label)
PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn
PTREGSCALL stub32_sigreturn, sys32_sigreturn
PTREGSCALL stub32_execve, compat_sys_execve
+ PTREGSCALL stub32_execveat, compat_sys_execveat
PTREGSCALL stub32_fork, sys_fork
PTREGSCALL stub32_vfork, sys_vfork
diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c
index 06d3e5a14d9d..f3672508b249 100644
--- a/arch/x86/kernel/audit_64.c
+++ b/arch/x86/kernel/audit_64.c
@@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall)
case __NR_openat:
return 3;
case __NR_execve:
+ case __NR_execveat:
return 5;
default:
return 0;
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 2fac1343a90b..00c4526e6ffe 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -665,6 +665,20 @@ ENTRY(stub_execve)
CFI_ENDPROC
END(stub_execve)
+ENTRY(stub_execveat)
+ CFI_STARTPROC
+ addq $8, %rsp
+ PARTIAL_FRAME 0
+ SAVE_REST
+ FIXUP_TOP_OF_STACK %r11
+ call sys_execveat
+ RESTORE_TOP_OF_STACK %r11
+ movq %rax,RAX(%rsp)
+ RESTORE_REST
+ jmp int_ret_from_sys_call
+ CFI_ENDPROC
+END(stub_execveat)
+
/*
* sigreturn is special because it needs to restore all registers on return.
* This cannot be done with SYSRET, so use the IRET return path instead.
@@ -710,6 +724,20 @@ ENTRY(stub_x32_execve)
CFI_ENDPROC
END(stub_x32_execve)
+ENTRY(stub_x32_execveat)
+ CFI_STARTPROC
+ addq $8, %rsp
+ PARTIAL_FRAME 0
+ SAVE_REST
+ FIXUP_TOP_OF_STACK %r11
+ call compat_sys_execveat
+ RESTORE_TOP_OF_STACK %r11
+ movq %rax,RAX(%rsp)
+ RESTORE_REST
+ jmp int_ret_from_sys_call
+ CFI_ENDPROC
+END(stub_x32_execveat)
+
#endif
/*
diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 028b78168d85..2633e3195455 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -363,3 +363,4 @@
354 i386 seccomp sys_seccomp
355 i386 getrandom sys_getrandom
356 i386 memfd_create sys_memfd_create
+357 i386 execveat sys_execveat stub32_execveat
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 35dd922727b9..1af5badd159c 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -327,6 +327,7 @@
318 common getrandom sys_getrandom
319 common memfd_create sys_memfd_create
320 common kexec_file_load sys_kexec_file_load
+321 64 execveat stub_execveat
#
# x32-specific system call numbers start at 512 to avoid cache impact
@@ -365,3 +366,4 @@
542 x32 getsockopt compat_sys_getsockopt
543 x32 io_setup compat_sys_io_setup
544 x32 io_submit compat_sys_io_submit
+545 x32 execveat stub_x32_execveat
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index f2f0723070ca..20c3649d0691 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -31,6 +31,7 @@
#define stub_fork sys_fork
#define stub_vfork sys_vfork
#define stub_execve sys_execve
+#define stub_execveat sys_execveat
#define stub_rt_sigreturn sys_rt_sigreturn
#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
diff --git a/fs/exec.c b/fs/exec.c
index a2b42a98c743..92a6e14f096a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -747,7 +747,7 @@ EXPORT_SYMBOL(setup_arg_pages);
#endif /* CONFIG_MMU */
-static struct file *do_open_exec(struct filename *name)
+static struct file *do_open_execat(int fd, struct filename *name, int flags)
{
struct file *file;
int err;
@@ -757,10 +757,34 @@ static struct file *do_open_exec(struct filename *name)
.intent = LOOKUP_OPEN,
.lookup_flags = LOOKUP_FOLLOW,
};
+ static const struct open_flags open_exec_nofollow_flags = {
+ .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
+ .acc_mode = MAY_EXEC | MAY_OPEN,
+ .intent = LOOKUP_OPEN,
+ .lookup_flags = 0,
+ };
- file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
- if (IS_ERR(file))
- goto out;
+ if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
+ return ERR_PTR(-EINVAL);
+
+ if (name->name[0] != '\0') {
+ const struct open_flags *oflags = ((flags & AT_SYMLINK_NOFOLLOW)
+ ? &open_exec_nofollow_flags
+ : &open_exec_flags);
+
+ file = do_filp_open(fd, name, oflags);
+ if (IS_ERR(file))
+ goto out;
+ } else {
+ file = fget(fd);
+ if (!file)
+ return ERR_PTR(-EBADF);
+
+ err = inode_permission(file->f_path.dentry->d_inode,
+ open_exec_flags.acc_mode);
+ if (err)
+ goto exit;
+ }
err = -EACCES;
if (!S_ISREG(file_inode(file)->i_mode))
@@ -769,12 +793,13 @@ static struct file *do_open_exec(struct filename *name)
if (file->f_path.mnt->mnt_flags & MNT_NOEXEC)
goto exit;
- fsnotify_open(file);
-
err = deny_write_access(file);
if (err)
goto exit;
+ if (name->name[0] != '\0')
+ fsnotify_open(file);
+
out:
return file;
@@ -786,7 +811,7 @@ exit:
struct file *open_exec(const char *name)
{
struct filename tmp = { .name = name };
- return do_open_exec(&tmp);
+ return do_open_execat(AT_FDCWD, &tmp, 0);
}
EXPORT_SYMBOL(open_exec);
@@ -1422,10 +1447,12 @@ static int exec_binprm(struct linux_binprm *bprm)
/*
* sys_execve() executes a new program.
*/
-static int do_execve_common(struct filename *filename,
- struct user_arg_ptr argv,
- struct user_arg_ptr envp)
+static int do_execveat_common(int fd, struct filename *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp,
+ int flags)
{
+ char *pathbuf = NULL;
struct linux_binprm *bprm;
struct file *file;
struct files_struct *displaced;
@@ -1466,7 +1493,7 @@ static int do_execve_common(struct filename *filename,
check_unsafe_exec(bprm);
current->in_execve = 1;
- file = do_open_exec(filename);
+ file = do_open_execat(fd, filename, flags);
retval = PTR_ERR(file);
if (IS_ERR(file))
goto out_unmark;
@@ -1474,7 +1501,27 @@ static int do_execve_common(struct filename *filename,
sched_exec();
bprm->file = file;
- bprm->filename = bprm->interp = filename->name;
+ if (fd == AT_FDCWD || filename->name[0] == '/') {
+ bprm->filename = filename->name;
+ } else {
+ /*
+ * Build a pathname that reflects how we got to the file,
+ * either "/dev/fd/<fd>" (for an empty filename) or
+ * "/dev/fd/<fd>/<filename>".
+ */
+ pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY);
+ if (!pathbuf) {
+ retval = -ENOMEM;
+ goto out_unmark;
+ }
+ bprm->filename = pathbuf;
+ if (filename->name[0] == '\0')
+ sprintf(pathbuf, "/dev/fd/%d", fd);
+ else
+ snprintf(pathbuf, PATH_MAX,
+ "/dev/fd/%d/%s", fd, filename->name);
+ }
+ bprm->interp = bprm->filename;
retval = bprm_mm_init(bprm);
if (retval)
@@ -1532,6 +1579,7 @@ out_unmark:
out_free:
free_bprm(bprm);
+ kfree(pathbuf);
out_files:
if (displaced)
@@ -1547,7 +1595,18 @@ int do_execve(struct filename *filename,
{
struct user_arg_ptr argv = { .ptr.native = __argv };
struct user_arg_ptr envp = { .ptr.native = __envp };
- return do_execve_common(filename, argv, envp);
+ return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
+}
+
+int do_execveat(int fd, struct filename *filename,
+ const char __user *const __user *__argv,
+ const char __user *const __user *__envp,
+ int flags)
+{
+ struct user_arg_ptr argv = { .ptr.native = __argv };
+ struct user_arg_ptr envp = { .ptr.native = __envp };
+
+ return do_execveat_common(fd, filename, argv, envp, flags);
}
#ifdef CONFIG_COMPAT
@@ -1563,7 +1622,23 @@ static int compat_do_execve(struct filename *filename,
.is_compat = true,
.ptr.compat = __envp,
};
- return do_execve_common(filename, argv, envp);
+ return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
+}
+
+static int compat_do_execveat(int fd, struct filename *filename,
+ const compat_uptr_t __user *__argv,
+ const compat_uptr_t __user *__envp,
+ int flags)
+{
+ struct user_arg_ptr argv = {
+ .is_compat = true,
+ .ptr.compat = __argv,
+ };
+ struct user_arg_ptr envp = {
+ .is_compat = true,
+ .ptr.compat = __envp,
+ };
+ return do_execveat_common(fd, filename, argv, envp, flags);
}
#endif
@@ -1603,6 +1678,20 @@ SYSCALL_DEFINE3(execve,
{
return do_execve(getname(filename), argv, envp);
}
+
+SYSCALL_DEFINE5(execveat,
+ int, fd, const char __user *, filename,
+ const char __user *const __user *, argv,
+ const char __user *const __user *, envp,
+ int, flags)
+{
+ int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+ return do_execveat(fd,
+ getname_flags(filename, lookup_flags, NULL),
+ argv, envp, flags);
+}
+
#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
const compat_uptr_t __user *, argv,
@@ -1610,4 +1699,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
{
return compat_do_execve(getname(filename), argv, envp);
}
+
+COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
+ const char __user *, filename,
+ const compat_uptr_t __user *, argv,
+ const compat_uptr_t __user *, envp,
+ int, flags)
+{
+ int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+ return compat_do_execveat(fd,
+ getname_flags(filename, lookup_flags, NULL),
+ argv, envp, flags);
+}
#endif
diff --git a/fs/namei.c b/fs/namei.c
index a7b05bf82d31..553c84d3e0cc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -130,7 +130,7 @@ void final_putname(struct filename *name)
#define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename))
-static struct filename *
+struct filename *
getname_flags(const char __user *filename, int flags, int *empty)
{
struct filename *result, *err;
diff --git a/include/linux/compat.h b/include/linux/compat.h
index e6494261eaff..7450ca2ac1fc 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);
asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
const compat_uptr_t __user *envp);
+asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
+ const compat_uptr_t __user *argv,
+ const compat_uptr_t __user *envp, int flags);
asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp,
compat_ulong_t __user *outp, compat_ulong_t __user *exp,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 94187721ad41..e9818574d738 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2060,6 +2060,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *,
extern struct file * dentry_open(const struct path *, int, const struct cred *);
extern int filp_close(struct file *, fl_owner_t id);
+extern struct filename *getname_flags(const char __user *, int, int *);
extern struct filename *getname(const char __user *);
extern struct filename *getname_kernel(const char *);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b867a4dab38a..33e056da7d33 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2430,6 +2430,10 @@ extern void do_group_exit(int);
extern int do_execve(struct filename *,
const char __user * const __user *,
const char __user * const __user *);
+extern int do_execveat(int, struct filename *,
+ const char __user * const __user *,
+ const char __user * const __user *,
+ int);
extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
struct task_struct *fork_idle(int);
extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 0f86d85a9ce4..df5422294deb 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -876,4 +876,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
asmlinkage long sys_getrandom(char __user *buf, size_t count,
unsigned int flags);
+asmlinkage long sys_execveat(int dfd, const char __user *filename,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp, int flags);
+
#endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 11d11bc5c78f..feef07d29663 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp)
__SYSCALL(__NR_getrandom, sys_getrandom)
#define __NR_memfd_create 279
__SYSCALL(__NR_memfd_create, sys_memfd_create)
+#define __NR_execveat 280
+__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
#undef __NR_syscalls
-#define __NR_syscalls 280
+#define __NR_syscalls 281
/*
* All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 391d4ddb6f4b..efb06058ad3e 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -218,3 +218,6 @@ cond_syscall(sys_kcmp);
/* operate on Secure Computing state */
cond_syscall(sys_seccomp);
+
+/* execveat */
+cond_syscall(sys_execveat);
diff --git a/lib/audit.c b/lib/audit.c
index 1d726a22565b..b8fb5ee81e26 100644
--- a/lib/audit.c
+++ b/lib/audit.c
@@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall)
case __NR_socketcall:
return 4;
#endif
+#ifdef __NR_execveat
+ case __NR_execveat:
+#endif
case __NR_execve:
return 5;
default:
--
2.1.0.rc2.206.gedb03e5
Signed-off-by: David Drysdale <[email protected]>
---
man2/execveat.2 | 144 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 144 insertions(+)
create mode 100644 man2/execveat.2
diff --git a/man2/execveat.2 b/man2/execveat.2
new file mode 100644
index 000000000000..d19571a3eb9d
--- /dev/null
+++ b/man2/execveat.2
@@ -0,0 +1,144 @@
+.\" Copyright (c) 2014 Google, Inc.
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date. The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein. The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH EXECVEAT 2 2014-04-02 "Linux" "Linux Programmer's Manual"
+.SH NAME
+execveat \- execute program relative to a directory file descriptor
+.SH SYNOPSIS
+.B #include <unistd.h>
+.sp
+.BI "int execve(int " fd ", const char *" pathname ","
+.br
+.BI " char *const " argv "[], char *const " envp "[],"
+.br
+.BI " int " flags);
+.SH DESCRIPTION
+The
+.BR execveat ()
+system call executes the program pointed to by the combination of \fIfd\fP and \fIpathname\fP.
+The
+.BR execveat ()
+system call operates in exactly the same way as
+.BR execve (2),
+except for the differences described in this manual page.
+
+If the pathname given in
+.I pathname
+is relative, then it is interpreted relative to the directory
+referred to by the file descriptor
+.I fd
+(rather than relative to the current working directory of
+the calling process, as is done by
+.BR execve (2)
+for a relative pathname).
+
+If
+.I pathname
+is relative and
+.I fd
+is the special value
+.BR AT_FDCWD ,
+then
+.I pathname
+is interpreted relative to the current working
+directory of the calling process (like
+.BR execve (2)).
+
+If
+.I pathname
+is absolute, then
+.I fd
+is ignored.
+
+If
+.I pathname
+is an empty string and the
+.BR AT_EMPTY_PATH
+flag is specified, then the file descriptor
+.I fd
+specifies the file to be executed.
+
+.I flags
+can either be 0, or include the following flags:
+.TP
+.BR AT_EMPTY_PATH
+If
+.I pathname
+is an empty string, operate on the file referred to by
+.IR fd
+(which may have been obtained using the
+.BR open (2)
+.B O_PATH
+flag).
+.TP
+.B AT_SYMLINK_NOFOLLOW
+If the file identified by
+.I fd
+and a non-NULL
+.I pathname
+is a symbolic link, then the call fails with the error
+.BR EINVAL .
+.SH "RETURN VALUE"
+On success,
+.BR execveat ()
+does not return. On error \-1 is returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+The same errors that occur for
+.BR execve (2)
+can also occur for
+.BR execveat ().
+The following additional errors can occur for
+.BR execveat ():
+.TP
+.B EBADF
+.I fd
+is not a valid file descriptor.
+.TP
+.B EINVAL
+Invalid flag specified in
+.IR flags .
+.TP
+.B ENOTDIR
+.I pathname
+is relative and
+.I fd
+is a file descriptor referring to a file other than a directory.
+.SH VERSIONS
+.BR execveat ()
+was added to Linux in kernel 3.???.
+.SH NOTES
+In addition to the reasons explained in
+.BR openat (2),
+the
+.BR execveat ()
+system call is also needed to allow
+.BR fexecve (3)
+to be implemented on systems that do not have the
+.I /proc
+filesystem mounted.
+.SH SEE ALSO
+.BR execve (2),
+.BR fexecve (3)
--
2.1.0.rc2.206.gedb03e5
David Drysdale <[email protected]> writes:
> Signed-off-by: David Drysdale <[email protected]>
> ---
> man2/execveat.2 | 144 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 144 insertions(+)
> create mode 100644 man2/execveat.2
>
> diff --git a/man2/execveat.2 b/man2/execveat.2
> new file mode 100644
> index 000000000000..d19571a3eb9d
> --- /dev/null
> +++ b/man2/execveat.2
> @@ -0,0 +1,144 @@
> +.\" Copyright (c) 2014 Google, Inc.
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date. The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein. The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH EXECVEAT 2 2014-04-02 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +execveat \- execute program relative to a directory file descriptor
> +.SH SYNOPSIS
> +.B #include <unistd.h>
> +.sp
> +.BI "int execve(int " fd ", const char *" pathname ","
^^^^^^ That needs to be execveat
> +.br
> +.BI " char *const " argv "[], char *const " envp "[],"
> +.br
> +.BI " int " flags);
> +.SH DESCRIPTION
> +The
> +.BR execveat ()
> +system call executes the program pointed to by the combination of \fIfd\fP and \fIpathname\fP.
> +The
> +.BR execveat ()
> +system call operates in exactly the same way as
> +.BR execve (2),
> +except for the differences described in this manual page.
> +
> +If the pathname given in
> +.I pathname
> +is relative, then it is interpreted relative to the directory
> +referred to by the file descriptor
> +.I fd
> +(rather than relative to the current working directory of
> +the calling process, as is done by
> +.BR execve (2)
> +for a relative pathname).
> +
> +If
> +.I pathname
> +is relative and
> +.I fd
> +is the special value
> +.BR AT_FDCWD ,
> +then
> +.I pathname
> +is interpreted relative to the current working
> +directory of the calling process (like
> +.BR execve (2)).
> +
> +If
> +.I pathname
> +is absolute, then
> +.I fd
> +is ignored.
> +
> +If
> +.I pathname
> +is an empty string and the
> +.BR AT_EMPTY_PATH
> +flag is specified, then the file descriptor
> +.I fd
> +specifies the file to be executed.
> +
> +.I flags
> +can either be 0, or include the following flags:
> +.TP
> +.BR AT_EMPTY_PATH
> +If
> +.I pathname
> +is an empty string, operate on the file referred to by
> +.IR fd
> +(which may have been obtained using the
> +.BR open (2)
> +.B O_PATH
> +flag).
> +.TP
> +.B AT_SYMLINK_NOFOLLOW
> +If the file identified by
> +.I fd
> +and a non-NULL
> +.I pathname
> +is a symbolic link, then the call fails with the error
> +.BR EINVAL .
> +.SH "RETURN VALUE"
> +On success,
> +.BR execveat ()
> +does not return. On error \-1 is returned, and
> +.I errno
> +is set appropriately.
> +.SH ERRORS
> +The same errors that occur for
> +.BR execve (2)
> +can also occur for
> +.BR execveat ().
> +The following additional errors can occur for
> +.BR execveat ():
> +.TP
> +.B EBADF
> +.I fd
> +is not a valid file descriptor.
> +.TP
> +.B EINVAL
> +Invalid flag specified in
> +.IR flags .
> +.TP
> +.B ENOTDIR
> +.I pathname
> +is relative and
> +.I fd
> +is a file descriptor referring to a file other than a directory.
> +.SH VERSIONS
> +.BR execveat ()
> +was added to Linux in kernel 3.???.
> +.SH NOTES
> +In addition to the reasons explained in
> +.BR openat (2),
> +the
> +.BR execveat ()
> +system call is also needed to allow
> +.BR fexecve (3)
> +to be implemented on systems that do not have the
> +.I /proc
> +filesystem mounted.
> +.SH SEE ALSO
> +.BR execve (2),
> +.BR fexecve (3)
David Drysdale <[email protected]> writes:
> Add a new system execveat(2) syscall. execveat() is to execve() as
> openat() is to open(): it takes a file descriptor that refers to a
> directory, and resolves the filename relative to that.
>
> In addition, if the filename is empty and AT_EMPTY_PATH is specified,
> execveat() executes the file to which the file descriptor refers. This
> replicates the functionality of fexecve(), which is a system call in
> other UNIXen, but in Linux glibc it depends on opening
> "/proc/self/fd/<fd>" (and so relies on /proc being mounted).
>
> The filename fed to the executed program as argv[0] (or the name of the
> script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
> (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
> reflecting how the executable was found. This does however mean that
> execution of a script in a /proc-less environment won't work.
>
> Only x86-64, i386 and x32 ABIs are supported in this patch.
>
> Based on patches by Meredydd Luff <[email protected]>
>
> Signed-off-by: David Drysdale <[email protected]>
> ---
> arch/x86/ia32/audit.c | 1 +
> arch/x86/ia32/ia32entry.S | 1 +
> arch/x86/kernel/audit_64.c | 1 +
> arch/x86/kernel/entry_64.S | 28 ++++++++
> arch/x86/syscalls/syscall_32.tbl | 1 +
> arch/x86/syscalls/syscall_64.tbl | 2 +
> arch/x86/um/sys_call_table_64.c | 1 +
> fs/exec.c | 130 ++++++++++++++++++++++++++++++++++----
> fs/namei.c | 2 +-
> include/linux/compat.h | 3 +
> include/linux/fs.h | 1 +
> include/linux/sched.h | 4 ++
> include/linux/syscalls.h | 4 ++
> include/uapi/asm-generic/unistd.h | 4 +-
> kernel/sys_ni.c | 3 +
> lib/audit.c | 3 +
> 16 files changed, 173 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c
> index 5d7b381da692..2eccc8932ae6 100644
> --- a/arch/x86/ia32/audit.c
> +++ b/arch/x86/ia32/audit.c
> @@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall)
> case __NR_socketcall:
> return 4;
> case __NR_execve:
> + case __NR_execveat:
> return 5;
> default:
> return 1;
> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> index 4299eb05023c..2516c09743e0 100644
> --- a/arch/x86/ia32/ia32entry.S
> +++ b/arch/x86/ia32/ia32entry.S
> @@ -464,6 +464,7 @@ GLOBAL(\label)
> PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn
> PTREGSCALL stub32_sigreturn, sys32_sigreturn
> PTREGSCALL stub32_execve, compat_sys_execve
> + PTREGSCALL stub32_execveat, compat_sys_execveat
> PTREGSCALL stub32_fork, sys_fork
> PTREGSCALL stub32_vfork, sys_vfork
>
> diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c
> index 06d3e5a14d9d..f3672508b249 100644
> --- a/arch/x86/kernel/audit_64.c
> +++ b/arch/x86/kernel/audit_64.c
> @@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall)
> case __NR_openat:
> return 3;
> case __NR_execve:
> + case __NR_execveat:
> return 5;
> default:
> return 0;
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 2fac1343a90b..00c4526e6ffe 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -665,6 +665,20 @@ ENTRY(stub_execve)
> CFI_ENDPROC
> END(stub_execve)
>
> +ENTRY(stub_execveat)
> + CFI_STARTPROC
> + addq $8, %rsp
> + PARTIAL_FRAME 0
> + SAVE_REST
> + FIXUP_TOP_OF_STACK %r11
> + call sys_execveat
> + RESTORE_TOP_OF_STACK %r11
> + movq %rax,RAX(%rsp)
> + RESTORE_REST
> + jmp int_ret_from_sys_call
> + CFI_ENDPROC
> +END(stub_execveat)
> +
> /*
> * sigreturn is special because it needs to restore all registers on return.
> * This cannot be done with SYSRET, so use the IRET return path instead.
> @@ -710,6 +724,20 @@ ENTRY(stub_x32_execve)
> CFI_ENDPROC
> END(stub_x32_execve)
>
> +ENTRY(stub_x32_execveat)
> + CFI_STARTPROC
> + addq $8, %rsp
> + PARTIAL_FRAME 0
> + SAVE_REST
> + FIXUP_TOP_OF_STACK %r11
> + call compat_sys_execveat
> + RESTORE_TOP_OF_STACK %r11
> + movq %rax,RAX(%rsp)
> + RESTORE_REST
> + jmp int_ret_from_sys_call
> + CFI_ENDPROC
> +END(stub_x32_execveat)
> +
> #endif
>
> /*
> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
> index 028b78168d85..2633e3195455 100644
> --- a/arch/x86/syscalls/syscall_32.tbl
> +++ b/arch/x86/syscalls/syscall_32.tbl
> @@ -363,3 +363,4 @@
> 354 i386 seccomp sys_seccomp
> 355 i386 getrandom sys_getrandom
> 356 i386 memfd_create sys_memfd_create
> +357 i386 execveat sys_execveat stub32_execveat
> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
> index 35dd922727b9..1af5badd159c 100644
> --- a/arch/x86/syscalls/syscall_64.tbl
> +++ b/arch/x86/syscalls/syscall_64.tbl
> @@ -327,6 +327,7 @@
> 318 common getrandom sys_getrandom
> 319 common memfd_create sys_memfd_create
> 320 common kexec_file_load sys_kexec_file_load
> +321 64 execveat stub_execveat
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> @@ -365,3 +366,4 @@
> 542 x32 getsockopt compat_sys_getsockopt
> 543 x32 io_setup compat_sys_io_setup
> 544 x32 io_submit compat_sys_io_submit
> +545 x32 execveat stub_x32_execveat
> diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
> index f2f0723070ca..20c3649d0691 100644
> --- a/arch/x86/um/sys_call_table_64.c
> +++ b/arch/x86/um/sys_call_table_64.c
> @@ -31,6 +31,7 @@
> #define stub_fork sys_fork
> #define stub_vfork sys_vfork
> #define stub_execve sys_execve
> +#define stub_execveat sys_execveat
> #define stub_rt_sigreturn sys_rt_sigreturn
>
> #define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
> diff --git a/fs/exec.c b/fs/exec.c
> index a2b42a98c743..92a6e14f096a 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -747,7 +747,7 @@ EXPORT_SYMBOL(setup_arg_pages);
>
> #endif /* CONFIG_MMU */
>
> -static struct file *do_open_exec(struct filename *name)
> +static struct file *do_open_execat(int fd, struct filename *name, int flags)
> {
> struct file *file;
> int err;
> @@ -757,10 +757,34 @@ static struct file *do_open_exec(struct filename *name)
> .intent = LOOKUP_OPEN,
> .lookup_flags = LOOKUP_FOLLOW,
> };
> + static const struct open_flags open_exec_nofollow_flags = {
> + .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
> + .acc_mode = MAY_EXEC | MAY_OPEN,
> + .intent = LOOKUP_OPEN,
> + .lookup_flags = 0,
> + };
>
> - file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
> - if (IS_ERR(file))
> - goto out;
> + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> + return ERR_PTR(-EINVAL);
> +
> + if (name->name[0] != '\0') {
Is it really necessary to special case AT_EMPTY_PATH here. I would
have thought the existing logic in namei.c would have been fine
assuning we passed LOOKUP_EMPTY.
> + const struct open_flags *oflags = ((flags & AT_SYMLINK_NOFOLLOW)
> + ? &open_exec_nofollow_flags
> + : &open_exec_flags);
> +
> + file = do_filp_open(fd, name, oflags);
> + if (IS_ERR(file))
> + goto out;
> + } else {
> + file = fget(fd);
> + if (!file)
> + return ERR_PTR(-EBADF);
> +
> + err = inode_permission(file->f_path.dentry->d_inode,
> + open_exec_flags.acc_mode);
> + if (err)
> + goto exit;
> + }
>
> err = -EACCES;
> if (!S_ISREG(file_inode(file)->i_mode))
> @@ -769,12 +793,13 @@ static struct file *do_open_exec(struct filename *name)
> if (file->f_path.mnt->mnt_flags & MNT_NOEXEC)
> goto exit;
>
> - fsnotify_open(file);
> -
> err = deny_write_access(file);
> if (err)
> goto exit;
>
> + if (name->name[0] != '\0')
> + fsnotify_open(file);
> +
> out:
> return file;
>
> @@ -786,7 +811,7 @@ exit:
> struct file *open_exec(const char *name)
> {
> struct filename tmp = { .name = name };
> - return do_open_exec(&tmp);
> + return do_open_execat(AT_FDCWD, &tmp, 0);
> }
> EXPORT_SYMBOL(open_exec);
>
> @@ -1422,10 +1447,12 @@ static int exec_binprm(struct linux_binprm *bprm)
> /*
> * sys_execve() executes a new program.
> */
> -static int do_execve_common(struct filename *filename,
> - struct user_arg_ptr argv,
> - struct user_arg_ptr envp)
> +static int do_execveat_common(int fd, struct filename *filename,
> + struct user_arg_ptr argv,
> + struct user_arg_ptr envp,
> + int flags)
> {
> + char *pathbuf = NULL;
> struct linux_binprm *bprm;
> struct file *file;
> struct files_struct *displaced;
> @@ -1466,7 +1493,7 @@ static int do_execve_common(struct filename *filename,
> check_unsafe_exec(bprm);
> current->in_execve = 1;
>
> - file = do_open_exec(filename);
> + file = do_open_execat(fd, filename, flags);
> retval = PTR_ERR(file);
> if (IS_ERR(file))
> goto out_unmark;
> @@ -1474,7 +1501,27 @@ static int do_execve_common(struct filename *filename,
> sched_exec();
>
> bprm->file = file;
> - bprm->filename = bprm->interp = filename->name;
> + if (fd == AT_FDCWD || filename->name[0] == '/') {
> + bprm->filename = filename->name;
> + } else {
> + /*
> + * Build a pathname that reflects how we got to the file,
> + * either "/dev/fd/<fd>" (for an empty filename) or
> + * "/dev/fd/<fd>/<filename>".
> + */
> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY);
> + if (!pathbuf) {
> + retval = -ENOMEM;
> + goto out_unmark;
> + }
> + bprm->filename = pathbuf;
> + if (filename->name[0] == '\0')
> + sprintf(pathbuf, "/dev/fd/%d", fd);
> + else
> + snprintf(pathbuf, PATH_MAX,
> + "/dev/fd/%d/%s", fd, filename->name);
> + }
> + bprm->interp = bprm->filename;
>
> retval = bprm_mm_init(bprm);
> if (retval)
> @@ -1532,6 +1579,7 @@ out_unmark:
>
> out_free:
> free_bprm(bprm);
> + kfree(pathbuf);
>
> out_files:
> if (displaced)
> @@ -1547,7 +1595,18 @@ int do_execve(struct filename *filename,
> {
> struct user_arg_ptr argv = { .ptr.native = __argv };
> struct user_arg_ptr envp = { .ptr.native = __envp };
> - return do_execve_common(filename, argv, envp);
> + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
> +}
> +
> +int do_execveat(int fd, struct filename *filename,
> + const char __user *const __user *__argv,
> + const char __user *const __user *__envp,
> + int flags)
> +{
> + struct user_arg_ptr argv = { .ptr.native = __argv };
> + struct user_arg_ptr envp = { .ptr.native = __envp };
> +
> + return do_execveat_common(fd, filename, argv, envp, flags);
> }
>
> #ifdef CONFIG_COMPAT
> @@ -1563,7 +1622,23 @@ static int compat_do_execve(struct filename *filename,
> .is_compat = true,
> .ptr.compat = __envp,
> };
> - return do_execve_common(filename, argv, envp);
> + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
> +}
> +
> +static int compat_do_execveat(int fd, struct filename *filename,
> + const compat_uptr_t __user *__argv,
> + const compat_uptr_t __user *__envp,
> + int flags)
> +{
> + struct user_arg_ptr argv = {
> + .is_compat = true,
> + .ptr.compat = __argv,
> + };
> + struct user_arg_ptr envp = {
> + .is_compat = true,
> + .ptr.compat = __envp,
> + };
> + return do_execveat_common(fd, filename, argv, envp, flags);
> }
> #endif
>
> @@ -1603,6 +1678,20 @@ SYSCALL_DEFINE3(execve,
> {
> return do_execve(getname(filename), argv, envp);
> }
> +
> +SYSCALL_DEFINE5(execveat,
> + int, fd, const char __user *, filename,
> + const char __user *const __user *, argv,
> + const char __user *const __user *, envp,
> + int, flags)
> +{
> + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
> +
> + return do_execveat(fd,
> + getname_flags(filename, lookup_flags, NULL),
> + argv, envp, flags);
> +}
> +
> #ifdef CONFIG_COMPAT
> COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
> const compat_uptr_t __user *, argv,
> @@ -1610,4 +1699,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
> {
> return compat_do_execve(getname(filename), argv, envp);
> }
> +
> +COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
> + const char __user *, filename,
> + const compat_uptr_t __user *, argv,
> + const compat_uptr_t __user *, envp,
> + int, flags)
> +{
> + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
> +
> + return compat_do_execveat(fd,
> + getname_flags(filename, lookup_flags, NULL),
> + argv, envp, flags);
> +}
> #endif
> diff --git a/fs/namei.c b/fs/namei.c
> index a7b05bf82d31..553c84d3e0cc 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -130,7 +130,7 @@ void final_putname(struct filename *name)
>
> #define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename))
>
> -static struct filename *
> +struct filename *
> getname_flags(const char __user *filename, int flags, int *empty)
> {
> struct filename *result, *err;
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index e6494261eaff..7450ca2ac1fc 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);
>
> asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
> const compat_uptr_t __user *envp);
> +asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
> + const compat_uptr_t __user *argv,
> + const compat_uptr_t __user *envp, int flags);
>
> asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp,
> compat_ulong_t __user *outp, compat_ulong_t __user *exp,
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 94187721ad41..e9818574d738 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2060,6 +2060,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *,
> extern struct file * dentry_open(const struct path *, int, const struct cred *);
> extern int filp_close(struct file *, fl_owner_t id);
>
> +extern struct filename *getname_flags(const char __user *, int, int *);
> extern struct filename *getname(const char __user *);
> extern struct filename *getname_kernel(const char *);
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b867a4dab38a..33e056da7d33 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2430,6 +2430,10 @@ extern void do_group_exit(int);
> extern int do_execve(struct filename *,
> const char __user * const __user *,
> const char __user * const __user *);
> +extern int do_execveat(int, struct filename *,
> + const char __user * const __user *,
> + const char __user * const __user *,
> + int);
> extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
> struct task_struct *fork_idle(int);
> extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 0f86d85a9ce4..df5422294deb 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -876,4 +876,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
> asmlinkage long sys_getrandom(char __user *buf, size_t count,
> unsigned int flags);
>
> +asmlinkage long sys_execveat(int dfd, const char __user *filename,
> + const char __user *const __user *argv,
> + const char __user *const __user *envp, int flags);
> +
> #endif
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 11d11bc5c78f..feef07d29663 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp)
> __SYSCALL(__NR_getrandom, sys_getrandom)
> #define __NR_memfd_create 279
> __SYSCALL(__NR_memfd_create, sys_memfd_create)
> +#define __NR_execveat 280
> +__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
>
> #undef __NR_syscalls
> -#define __NR_syscalls 280
> +#define __NR_syscalls 281
>
> /*
> * All syscalls below here should go away really,
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 391d4ddb6f4b..efb06058ad3e 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -218,3 +218,6 @@ cond_syscall(sys_kcmp);
>
> /* operate on Secure Computing state */
> cond_syscall(sys_seccomp);
> +
> +/* execveat */
> +cond_syscall(sys_execveat);
> diff --git a/lib/audit.c b/lib/audit.c
> index 1d726a22565b..b8fb5ee81e26 100644
> --- a/lib/audit.c
> +++ b/lib/audit.c
> @@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall)
> case __NR_socketcall:
> return 4;
> #endif
> +#ifdef __NR_execveat
> + case __NR_execveat:
> +#endif
> case __NR_execve:
> return 5;
> default:
On Wed, Oct 22, 2014 at 4:44 AM, David Drysdale <[email protected]> wrote:
> Add a new system execveat(2) syscall. execveat() is to execve() as
> openat() is to open(): it takes a file descriptor that refers to a
> directory, and resolves the filename relative to that.
>
> bprm->file = file;
> - bprm->filename = bprm->interp = filename->name;
> + if (fd == AT_FDCWD || filename->name[0] == '/') {
> + bprm->filename = filename->name;
> + } else {
> + /*
> + * Build a pathname that reflects how we got to the file,
> + * either "/dev/fd/<fd>" (for an empty filename) or
> + * "/dev/fd/<fd>/<filename>".
> + */
> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY);
> + if (!pathbuf) {
> + retval = -ENOMEM;
> + goto out_unmark;
> + }
> + bprm->filename = pathbuf;
> + if (filename->name[0] == '\0')
> + sprintf(pathbuf, "/dev/fd/%d", fd);
If the fd is O_CLOEXEC, then this will result in a confused child
process. Should we fail exec attempts like that for non-static
programs? (E.g. set filename to "" or something and fix up the binfmt
drivers to handle that?)
> + else
> + snprintf(pathbuf, PATH_MAX,
> + "/dev/fd/%d/%s", fd, filename->name);
Does this need to handle the case where the result exceeds PATH_MAX?
--Andy
On Wed, Oct 22, 2014 at 6:55 PM, Eric W. Biederman
<[email protected]> wrote:
> David Drysdale <[email protected]> writes:
>> +.TH EXECVEAT 2 2014-04-02 "Linux" "Linux Programmer's Manual"
>> +.SH NAME
>> +execveat \- execute program relative to a directory file descriptor
>> +.SH SYNOPSIS
>> +.B #include <unistd.h>
>> +.sp
>> +.BI "int execve(int " fd ", const char *" pathname ","
> ^^^^^^ That needs to be execveat
Ah yes, so it does -- thanks for spotting.
On Wed, Oct 22, 2014 at 7:07 PM, Eric W. Biederman
<[email protected]> wrote:
> David Drysdale <[email protected]> writes:
>
>> Add a new system execveat(2) syscall. execveat() is to execve() as
>> openat() is to open(): it takes a file descriptor that refers to a
>> directory, and resolves the filename relative to that.
>>
>> In addition, if the filename is empty and AT_EMPTY_PATH is specified,
>> execveat() executes the file to which the file descriptor refers. This
>> replicates the functionality of fexecve(), which is a system call in
>> other UNIXen, but in Linux glibc it depends on opening
>> "/proc/self/fd/<fd>" (and so relies on /proc being mounted).
>>
>> The filename fed to the executed program as argv[0] (or the name of the
>> script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
>> (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
>> reflecting how the executable was found. This does however mean that
>> execution of a script in a /proc-less environment won't work.
>>
>> Only x86-64, i386 and x32 ABIs are supported in this patch.
>>
>> Based on patches by Meredydd Luff <[email protected]>
>>
>> Signed-off-by: David Drysdale <[email protected]>
>> ---
>> arch/x86/ia32/audit.c | 1 +
>> arch/x86/ia32/ia32entry.S | 1 +
>> arch/x86/kernel/audit_64.c | 1 +
>> arch/x86/kernel/entry_64.S | 28 ++++++++
>> arch/x86/syscalls/syscall_32.tbl | 1 +
>> arch/x86/syscalls/syscall_64.tbl | 2 +
>> arch/x86/um/sys_call_table_64.c | 1 +
>> fs/exec.c | 130 ++++++++++++++++++++++++++++++++++----
>> fs/namei.c | 2 +-
>> include/linux/compat.h | 3 +
>> include/linux/fs.h | 1 +
>> include/linux/sched.h | 4 ++
>> include/linux/syscalls.h | 4 ++
>> include/uapi/asm-generic/unistd.h | 4 +-
>> kernel/sys_ni.c | 3 +
>> lib/audit.c | 3 +
>> 16 files changed, 173 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c
>> index 5d7b381da692..2eccc8932ae6 100644
>> --- a/arch/x86/ia32/audit.c
>> +++ b/arch/x86/ia32/audit.c
>> @@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall)
>> case __NR_socketcall:
>> return 4;
>> case __NR_execve:
>> + case __NR_execveat:
>> return 5;
>> default:
>> return 1;
>> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
>> index 4299eb05023c..2516c09743e0 100644
>> --- a/arch/x86/ia32/ia32entry.S
>> +++ b/arch/x86/ia32/ia32entry.S
>> @@ -464,6 +464,7 @@ GLOBAL(\label)
>> PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn
>> PTREGSCALL stub32_sigreturn, sys32_sigreturn
>> PTREGSCALL stub32_execve, compat_sys_execve
>> + PTREGSCALL stub32_execveat, compat_sys_execveat
>> PTREGSCALL stub32_fork, sys_fork
>> PTREGSCALL stub32_vfork, sys_vfork
>>
>> diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c
>> index 06d3e5a14d9d..f3672508b249 100644
>> --- a/arch/x86/kernel/audit_64.c
>> +++ b/arch/x86/kernel/audit_64.c
>> @@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall)
>> case __NR_openat:
>> return 3;
>> case __NR_execve:
>> + case __NR_execveat:
>> return 5;
>> default:
>> return 0;
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index 2fac1343a90b..00c4526e6ffe 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -665,6 +665,20 @@ ENTRY(stub_execve)
>> CFI_ENDPROC
>> END(stub_execve)
>>
>> +ENTRY(stub_execveat)
>> + CFI_STARTPROC
>> + addq $8, %rsp
>> + PARTIAL_FRAME 0
>> + SAVE_REST
>> + FIXUP_TOP_OF_STACK %r11
>> + call sys_execveat
>> + RESTORE_TOP_OF_STACK %r11
>> + movq %rax,RAX(%rsp)
>> + RESTORE_REST
>> + jmp int_ret_from_sys_call
>> + CFI_ENDPROC
>> +END(stub_execveat)
>> +
>> /*
>> * sigreturn is special because it needs to restore all registers on return.
>> * This cannot be done with SYSRET, so use the IRET return path instead.
>> @@ -710,6 +724,20 @@ ENTRY(stub_x32_execve)
>> CFI_ENDPROC
>> END(stub_x32_execve)
>>
>> +ENTRY(stub_x32_execveat)
>> + CFI_STARTPROC
>> + addq $8, %rsp
>> + PARTIAL_FRAME 0
>> + SAVE_REST
>> + FIXUP_TOP_OF_STACK %r11
>> + call compat_sys_execveat
>> + RESTORE_TOP_OF_STACK %r11
>> + movq %rax,RAX(%rsp)
>> + RESTORE_REST
>> + jmp int_ret_from_sys_call
>> + CFI_ENDPROC
>> +END(stub_x32_execveat)
>> +
>> #endif
>>
>> /*
>> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
>> index 028b78168d85..2633e3195455 100644
>> --- a/arch/x86/syscalls/syscall_32.tbl
>> +++ b/arch/x86/syscalls/syscall_32.tbl
>> @@ -363,3 +363,4 @@
>> 354 i386 seccomp sys_seccomp
>> 355 i386 getrandom sys_getrandom
>> 356 i386 memfd_create sys_memfd_create
>> +357 i386 execveat sys_execveat stub32_execveat
>> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
>> index 35dd922727b9..1af5badd159c 100644
>> --- a/arch/x86/syscalls/syscall_64.tbl
>> +++ b/arch/x86/syscalls/syscall_64.tbl
>> @@ -327,6 +327,7 @@
>> 318 common getrandom sys_getrandom
>> 319 common memfd_create sys_memfd_create
>> 320 common kexec_file_load sys_kexec_file_load
>> +321 64 execveat stub_execveat
>>
>> #
>> # x32-specific system call numbers start at 512 to avoid cache impact
>> @@ -365,3 +366,4 @@
>> 542 x32 getsockopt compat_sys_getsockopt
>> 543 x32 io_setup compat_sys_io_setup
>> 544 x32 io_submit compat_sys_io_submit
>> +545 x32 execveat stub_x32_execveat
>> diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
>> index f2f0723070ca..20c3649d0691 100644
>> --- a/arch/x86/um/sys_call_table_64.c
>> +++ b/arch/x86/um/sys_call_table_64.c
>> @@ -31,6 +31,7 @@
>> #define stub_fork sys_fork
>> #define stub_vfork sys_vfork
>> #define stub_execve sys_execve
>> +#define stub_execveat sys_execveat
>> #define stub_rt_sigreturn sys_rt_sigreturn
>>
>> #define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
>> diff --git a/fs/exec.c b/fs/exec.c
>> index a2b42a98c743..92a6e14f096a 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -747,7 +747,7 @@ EXPORT_SYMBOL(setup_arg_pages);
>>
>> #endif /* CONFIG_MMU */
>>
>> -static struct file *do_open_exec(struct filename *name)
>> +static struct file *do_open_execat(int fd, struct filename *name, int flags)
>> {
>> struct file *file;
>> int err;
>> @@ -757,10 +757,34 @@ static struct file *do_open_exec(struct filename *name)
>> .intent = LOOKUP_OPEN,
>> .lookup_flags = LOOKUP_FOLLOW,
>> };
>> + static const struct open_flags open_exec_nofollow_flags = {
>> + .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
>> + .acc_mode = MAY_EXEC | MAY_OPEN,
>> + .intent = LOOKUP_OPEN,
>> + .lookup_flags = 0,
>> + };
>>
>> - file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
>> - if (IS_ERR(file))
>> - goto out;
>> + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
>> + return ERR_PTR(-EINVAL);
>> +
>> + if (name->name[0] != '\0') {
>
> Is it really necessary to special case AT_EMPTY_PATH here. I would
> have thought the existing logic in namei.c would have been fine
> assuning we passed LOOKUP_EMPTY.
Just using do_filp_open() throughout looks mostly plausible on a quick
experiment, but my initial version appears to make O_PATH fds unexpectedly
fexecve()-able (I'm glad I had a test case for that).
I'll look for a way around that, hopefully without an explicit special case.
>> + const struct open_flags *oflags = ((flags & AT_SYMLINK_NOFOLLOW)
>> + ? &open_exec_nofollow_flags
>> + : &open_exec_flags);
>> +
>> + file = do_filp_open(fd, name, oflags);
>> + if (IS_ERR(file))
>> + goto out;
>> + } else {
>> + file = fget(fd);
>> + if (!file)
>> + return ERR_PTR(-EBADF);
>> +
>> + err = inode_permission(file->f_path.dentry->d_inode,
>> + open_exec_flags.acc_mode);
>> + if (err)
>> + goto exit;
>> + }
>>
>> err = -EACCES;
>> if (!S_ISREG(file_inode(file)->i_mode))
>> @@ -769,12 +793,13 @@ static struct file *do_open_exec(struct filename *name)
>> if (file->f_path.mnt->mnt_flags & MNT_NOEXEC)
>> goto exit;
>>
>> - fsnotify_open(file);
>> -
>> err = deny_write_access(file);
>> if (err)
>> goto exit;
>>
>> + if (name->name[0] != '\0')
>> + fsnotify_open(file);
>> +
>> out:
>> return file;
>>
>> @@ -786,7 +811,7 @@ exit:
>> struct file *open_exec(const char *name)
>> {
>> struct filename tmp = { .name = name };
>> - return do_open_exec(&tmp);
>> + return do_open_execat(AT_FDCWD, &tmp, 0);
>> }
>> EXPORT_SYMBOL(open_exec);
>>
>> @@ -1422,10 +1447,12 @@ static int exec_binprm(struct linux_binprm *bprm)
>> /*
>> * sys_execve() executes a new program.
>> */
>> -static int do_execve_common(struct filename *filename,
>> - struct user_arg_ptr argv,
>> - struct user_arg_ptr envp)
>> +static int do_execveat_common(int fd, struct filename *filename,
>> + struct user_arg_ptr argv,
>> + struct user_arg_ptr envp,
>> + int flags)
>> {
>> + char *pathbuf = NULL;
>> struct linux_binprm *bprm;
>> struct file *file;
>> struct files_struct *displaced;
>> @@ -1466,7 +1493,7 @@ static int do_execve_common(struct filename *filename,
>> check_unsafe_exec(bprm);
>> current->in_execve = 1;
>>
>> - file = do_open_exec(filename);
>> + file = do_open_execat(fd, filename, flags);
>> retval = PTR_ERR(file);
>> if (IS_ERR(file))
>> goto out_unmark;
>> @@ -1474,7 +1501,27 @@ static int do_execve_common(struct filename *filename,
>> sched_exec();
>>
>> bprm->file = file;
>> - bprm->filename = bprm->interp = filename->name;
>> + if (fd == AT_FDCWD || filename->name[0] == '/') {
>> + bprm->filename = filename->name;
>> + } else {
>> + /*
>> + * Build a pathname that reflects how we got to the file,
>> + * either "/dev/fd/<fd>" (for an empty filename) or
>> + * "/dev/fd/<fd>/<filename>".
>> + */
>> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY);
>> + if (!pathbuf) {
>> + retval = -ENOMEM;
>> + goto out_unmark;
>> + }
>> + bprm->filename = pathbuf;
>> + if (filename->name[0] == '\0')
>> + sprintf(pathbuf, "/dev/fd/%d", fd);
>> + else
>> + snprintf(pathbuf, PATH_MAX,
>> + "/dev/fd/%d/%s", fd, filename->name);
>> + }
>> + bprm->interp = bprm->filename;
>>
>> retval = bprm_mm_init(bprm);
>> if (retval)
>> @@ -1532,6 +1579,7 @@ out_unmark:
>>
>> out_free:
>> free_bprm(bprm);
>> + kfree(pathbuf);
>>
>> out_files:
>> if (displaced)
>> @@ -1547,7 +1595,18 @@ int do_execve(struct filename *filename,
>> {
>> struct user_arg_ptr argv = { .ptr.native = __argv };
>> struct user_arg_ptr envp = { .ptr.native = __envp };
>> - return do_execve_common(filename, argv, envp);
>> + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
>> +}
>> +
>> +int do_execveat(int fd, struct filename *filename,
>> + const char __user *const __user *__argv,
>> + const char __user *const __user *__envp,
>> + int flags)
>> +{
>> + struct user_arg_ptr argv = { .ptr.native = __argv };
>> + struct user_arg_ptr envp = { .ptr.native = __envp };
>> +
>> + return do_execveat_common(fd, filename, argv, envp, flags);
>> }
>>
>> #ifdef CONFIG_COMPAT
>> @@ -1563,7 +1622,23 @@ static int compat_do_execve(struct filename *filename,
>> .is_compat = true,
>> .ptr.compat = __envp,
>> };
>> - return do_execve_common(filename, argv, envp);
>> + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
>> +}
>> +
>> +static int compat_do_execveat(int fd, struct filename *filename,
>> + const compat_uptr_t __user *__argv,
>> + const compat_uptr_t __user *__envp,
>> + int flags)
>> +{
>> + struct user_arg_ptr argv = {
>> + .is_compat = true,
>> + .ptr.compat = __argv,
>> + };
>> + struct user_arg_ptr envp = {
>> + .is_compat = true,
>> + .ptr.compat = __envp,
>> + };
>> + return do_execveat_common(fd, filename, argv, envp, flags);
>> }
>> #endif
>>
>> @@ -1603,6 +1678,20 @@ SYSCALL_DEFINE3(execve,
>> {
>> return do_execve(getname(filename), argv, envp);
>> }
>> +
>> +SYSCALL_DEFINE5(execveat,
>> + int, fd, const char __user *, filename,
>> + const char __user *const __user *, argv,
>> + const char __user *const __user *, envp,
>> + int, flags)
>> +{
>> + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
>> +
>> + return do_execveat(fd,
>> + getname_flags(filename, lookup_flags, NULL),
>> + argv, envp, flags);
>> +}
>> +
>> #ifdef CONFIG_COMPAT
>> COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
>> const compat_uptr_t __user *, argv,
>> @@ -1610,4 +1699,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
>> {
>> return compat_do_execve(getname(filename), argv, envp);
>> }
>> +
>> +COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
>> + const char __user *, filename,
>> + const compat_uptr_t __user *, argv,
>> + const compat_uptr_t __user *, envp,
>> + int, flags)
>> +{
>> + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
>> +
>> + return compat_do_execveat(fd,
>> + getname_flags(filename, lookup_flags, NULL),
>> + argv, envp, flags);
>> +}
>> #endif
>> diff --git a/fs/namei.c b/fs/namei.c
>> index a7b05bf82d31..553c84d3e0cc 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -130,7 +130,7 @@ void final_putname(struct filename *name)
>>
>> #define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename))
>>
>> -static struct filename *
>> +struct filename *
>> getname_flags(const char __user *filename, int flags, int *empty)
>> {
>> struct filename *result, *err;
>> diff --git a/include/linux/compat.h b/include/linux/compat.h
>> index e6494261eaff..7450ca2ac1fc 100644
>> --- a/include/linux/compat.h
>> +++ b/include/linux/compat.h
>> @@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);
>>
>> asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
>> const compat_uptr_t __user *envp);
>> +asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
>> + const compat_uptr_t __user *argv,
>> + const compat_uptr_t __user *envp, int flags);
>>
>> asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp,
>> compat_ulong_t __user *outp, compat_ulong_t __user *exp,
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 94187721ad41..e9818574d738 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -2060,6 +2060,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *,
>> extern struct file * dentry_open(const struct path *, int, const struct cred *);
>> extern int filp_close(struct file *, fl_owner_t id);
>>
>> +extern struct filename *getname_flags(const char __user *, int, int *);
>> extern struct filename *getname(const char __user *);
>> extern struct filename *getname_kernel(const char *);
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index b867a4dab38a..33e056da7d33 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -2430,6 +2430,10 @@ extern void do_group_exit(int);
>> extern int do_execve(struct filename *,
>> const char __user * const __user *,
>> const char __user * const __user *);
>> +extern int do_execveat(int, struct filename *,
>> + const char __user * const __user *,
>> + const char __user * const __user *,
>> + int);
>> extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
>> struct task_struct *fork_idle(int);
>> extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>> index 0f86d85a9ce4..df5422294deb 100644
>> --- a/include/linux/syscalls.h
>> +++ b/include/linux/syscalls.h
>> @@ -876,4 +876,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
>> asmlinkage long sys_getrandom(char __user *buf, size_t count,
>> unsigned int flags);
>>
>> +asmlinkage long sys_execveat(int dfd, const char __user *filename,
>> + const char __user *const __user *argv,
>> + const char __user *const __user *envp, int flags);
>> +
>> #endif
>> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
>> index 11d11bc5c78f..feef07d29663 100644
>> --- a/include/uapi/asm-generic/unistd.h
>> +++ b/include/uapi/asm-generic/unistd.h
>> @@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp)
>> __SYSCALL(__NR_getrandom, sys_getrandom)
>> #define __NR_memfd_create 279
>> __SYSCALL(__NR_memfd_create, sys_memfd_create)
>> +#define __NR_execveat 280
>> +__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
>>
>> #undef __NR_syscalls
>> -#define __NR_syscalls 280
>> +#define __NR_syscalls 281
>>
>> /*
>> * All syscalls below here should go away really,
>> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
>> index 391d4ddb6f4b..efb06058ad3e 100644
>> --- a/kernel/sys_ni.c
>> +++ b/kernel/sys_ni.c
>> @@ -218,3 +218,6 @@ cond_syscall(sys_kcmp);
>>
>> /* operate on Secure Computing state */
>> cond_syscall(sys_seccomp);
>> +
>> +/* execveat */
>> +cond_syscall(sys_execveat);
>> diff --git a/lib/audit.c b/lib/audit.c
>> index 1d726a22565b..b8fb5ee81e26 100644
>> --- a/lib/audit.c
>> +++ b/lib/audit.c
>> @@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall)
>> case __NR_socketcall:
>> return 4;
>> #endif
>> +#ifdef __NR_execveat
>> + case __NR_execveat:
>> +#endif
>> case __NR_execve:
>> return 5;
>> default:
On Wed, Oct 22, 2014 at 7:44 PM, Andy Lutomirski <[email protected]> wrote:
> On Wed, Oct 22, 2014 at 4:44 AM, David Drysdale <[email protected]> wrote:
>> Add a new system execveat(2) syscall. execveat() is to execve() as
>> openat() is to open(): it takes a file descriptor that refers to a
>> directory, and resolves the filename relative to that.
>>
>
>> bprm->file = file;
>> - bprm->filename = bprm->interp = filename->name;
>> + if (fd == AT_FDCWD || filename->name[0] == '/') {
>> + bprm->filename = filename->name;
>> + } else {
>> + /*
>> + * Build a pathname that reflects how we got to the file,
>> + * either "/dev/fd/<fd>" (for an empty filename) or
>> + * "/dev/fd/<fd>/<filename>".
>> + */
>> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY);
>> + if (!pathbuf) {
>> + retval = -ENOMEM;
>> + goto out_unmark;
>> + }
>> + bprm->filename = pathbuf;
>> + if (filename->name[0] == '\0')
>> + sprintf(pathbuf, "/dev/fd/%d", fd);
>
> If the fd is O_CLOEXEC, then this will result in a confused child
> process. Should we fail exec attempts like that for non-static
> programs? (E.g. set filename to "" or something and fix up the binfmt
> drivers to handle that?)
Isn't it just scripts that get confused here (as normal executables don't
get to see brpm->filename)?
Given that we don't know which we have at this point, I'd suggest
carrying on regardless. Or we could fall back to use the previous
best-effort d_path() code for O_CLOEXEC fds. Thoughts?
>> + else
>> + snprintf(pathbuf, PATH_MAX,
>> + "/dev/fd/%d/%s", fd, filename->name);
>
> Does this need to handle the case where the result exceeds PATH_MAX?
I guess we could kmalloc(strlen(filename->name) + 19) to avoid the
possibility of failure, but that just defers the inevitable -- the interpreter
won't be able to open the script file anyway. But it would at least then
generate the appropriate error (ENAMETOOLONG rather than ENOENT).
On Mon, Oct 27, 2014 at 11:03 AM, David Drysdale <[email protected]> wrote:
> On Wed, Oct 22, 2014 at 7:44 PM, Andy Lutomirski <[email protected]> wrote:
>> On Wed, Oct 22, 2014 at 4:44 AM, David Drysdale <[email protected]> wrote:
>>> Add a new system execveat(2) syscall. execveat() is to execve() as
>>> openat() is to open(): it takes a file descriptor that refers to a
>>> directory, and resolves the filename relative to that.
>>>
>>
>>> bprm->file = file;
>>> - bprm->filename = bprm->interp = filename->name;
>>> + if (fd == AT_FDCWD || filename->name[0] == '/') {
>>> + bprm->filename = filename->name;
>>> + } else {
>>> + /*
>>> + * Build a pathname that reflects how we got to the file,
>>> + * either "/dev/fd/<fd>" (for an empty filename) or
>>> + * "/dev/fd/<fd>/<filename>".
>>> + */
>>> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY);
>>> + if (!pathbuf) {
>>> + retval = -ENOMEM;
>>> + goto out_unmark;
>>> + }
>>> + bprm->filename = pathbuf;
>>> + if (filename->name[0] == '\0')
>>> + sprintf(pathbuf, "/dev/fd/%d", fd);
>>
>> If the fd is O_CLOEXEC, then this will result in a confused child
>> process. Should we fail exec attempts like that for non-static
>> programs? (E.g. set filename to "" or something and fix up the binfmt
>> drivers to handle that?)
>
> Isn't it just scripts that get confused here (as normal executables don't
> get to see brpm->filename)?
>
> Given that we don't know which we have at this point, I'd suggest
> carrying on regardless. Or we could fall back to use the previous
> best-effort d_path() code for O_CLOEXEC fds. Thoughts?
How hard would it be to mark the bprm as not having a path for the
binary? Then we could fail later on if and when we actually need the
path.
I don't really have a strong opinion here, though. I do prefer
actually failing the execveat call over succeeding but invoking a
script interpreter than can't possibly work.
>
>>> + else
>>> + snprintf(pathbuf, PATH_MAX,
>>> + "/dev/fd/%d/%s", fd, filename->name);
>>
>> Does this need to handle the case where the result exceeds PATH_MAX?
>
> I guess we could kmalloc(strlen(filename->name) + 19) to avoid the
> possibility of failure, but that just defers the inevitable -- the interpreter
> won't be able to open the script file anyway. But it would at least then
> generate the appropriate error (ENAMETOOLONG rather than ENOENT).
Depends whether anyone cares about bprm->filename. But I think the
code should either return an error or allocate enough space.
--
Andy Lutomirski
AMA Capital Management, LLC
[Oops, re-send remembering to turn on plaintext mode -- sorry]
On Mon, Oct 27, 2014 at 6:47 PM, Andy Lutomirski <[email protected]> wrote:
> On Mon, Oct 27, 2014 at 11:03 AM, David Drysdale <[email protected]> wrote:
>> On Wed, Oct 22, 2014 at 7:44 PM, Andy Lutomirski <[email protected]> wrote:
>>> On Wed, Oct 22, 2014 at 4:44 AM, David Drysdale <[email protected]> wrote:
>>>> Add a new system execveat(2) syscall. execveat() is to execve() as
>>>> openat() is to open(): it takes a file descriptor that refers to a
>>>> directory, and resolves the filename relative to that.
>>>>
>>>
>>>> bprm->file = file;
>>>> - bprm->filename = bprm->interp = filename->name;
>>>> + if (fd == AT_FDCWD || filename->name[0] == '/') {
>>>> + bprm->filename = filename->name;
>>>> + } else {
>>>> + /*
>>>> + * Build a pathname that reflects how we got to the file,
>>>> + * either "/dev/fd/<fd>" (for an empty filename) or
>>>> + * "/dev/fd/<fd>/<filename>".
>>>> + */
>>>> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY);
>>>> + if (!pathbuf) {
>>>> + retval = -ENOMEM;
>>>> + goto out_unmark;
>>>> + }
>>>> + bprm->filename = pathbuf;
>>>> + if (filename->name[0] == '\0')
>>>> + sprintf(pathbuf, "/dev/fd/%d", fd);
>>>
>>> If the fd is O_CLOEXEC, then this will result in a confused child
>>> process. Should we fail exec attempts like that for non-static
>>> programs? (E.g. set filename to "" or something and fix up the binfmt
>>> drivers to handle that?)
>>
>> Isn't it just scripts that get confused here (as normal executables don't
>> get to see brpm->filename)?
>>
>> Given that we don't know which we have at this point, I'd suggest
>> carrying on regardless. Or we could fall back to use the previous
>> best-effort d_path() code for O_CLOEXEC fds. Thoughts?
>
> How hard would it be to mark the bprm as not having a path for the
> binary? Then we could fail later on if and when we actually need the
> path.
Adding a flag to bprm->interp_flags to indicate that the bprm->filename
will be inaccessible after exec is straightforward. But I'm not sure who
should/could make use of the flag...
> I don't really have a strong opinion here, though. I do prefer
> actually failing the execveat call over succeeding but invoking a
> script interpreter than can't possibly work.
Yeah, but that involves the kernel code (e.g. fs/binfmt_script.c) making
an assumption about what the interpreter is going to do -- specifically
that it's going to try to open its argv[1]. Admittedly, that's a very likely
assumption, but I'm not sure it's one the kernel should make -- a script
like "#!/bin/echo" wouldn't be very useful, but fexecve()ing it would still
work OK on a name like "/dev/fd/7" after fd 7 is closed.
(Also, we need some kind of non-empty name in bprm->filename,
even if it's going to be inaccessible later, so that any LSM processing
off of the bprm_set_creds()/bprm_check_security() hooks has something
to work with; those hooks are pre-exec so the "/dev/fd/<fd>" part should
still be OK at that point.)
So I guess I lean towards keeping "/dev/fd/<fd>/<path>" regardless.
>>
>>>> + else
>>>> + snprintf(pathbuf, PATH_MAX,
>>>> + "/dev/fd/%d/%s", fd, filename->name);
>>>
>>> Does this need to handle the case where the result exceeds PATH_MAX?
>>
>> I guess we could kmalloc(strlen(filename->name) + 19) to avoid the
>> possibility of failure, but that just defers the inevitable -- the interpreter
>> won't be able to open the script file anyway. But it would at least then
>> generate the appropriate error (ENAMETOOLONG rather than ENOENT).
>
> Depends whether anyone cares about bprm->filename. But I think the
> code should either return an error or allocate enough space.
I'll allocate enough space.
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC
On Tue, Oct 28, 2014 at 10:30 AM, David Drysdale <[email protected]> wrote:
> [Oops, re-send remembering to turn on plaintext mode -- sorry]
>
> On Mon, Oct 27, 2014 at 6:47 PM, Andy Lutomirski <[email protected]> wrote:
>> On Mon, Oct 27, 2014 at 11:03 AM, David Drysdale <[email protected]> wrote:
>>> On Wed, Oct 22, 2014 at 7:44 PM, Andy Lutomirski <[email protected]> wrote:
>>>> On Wed, Oct 22, 2014 at 4:44 AM, David Drysdale <[email protected]> wrote:
>>>>> Add a new system execveat(2) syscall. execveat() is to execve() as
>>>>> openat() is to open(): it takes a file descriptor that refers to a
>>>>> directory, and resolves the filename relative to that.
>>>>>
>>>>
>>>>> bprm->file = file;
>>>>> - bprm->filename = bprm->interp = filename->name;
>>>>> + if (fd == AT_FDCWD || filename->name[0] == '/') {
>>>>> + bprm->filename = filename->name;
>>>>> + } else {
>>>>> + /*
>>>>> + * Build a pathname that reflects how we got to the file,
>>>>> + * either "/dev/fd/<fd>" (for an empty filename) or
>>>>> + * "/dev/fd/<fd>/<filename>".
>>>>> + */
>>>>> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY);
>>>>> + if (!pathbuf) {
>>>>> + retval = -ENOMEM;
>>>>> + goto out_unmark;
>>>>> + }
>>>>> + bprm->filename = pathbuf;
>>>>> + if (filename->name[0] == '\0')
>>>>> + sprintf(pathbuf, "/dev/fd/%d", fd);
>>>>
>>>> If the fd is O_CLOEXEC, then this will result in a confused child
>>>> process. Should we fail exec attempts like that for non-static
>>>> programs? (E.g. set filename to "" or something and fix up the binfmt
>>>> drivers to handle that?)
>>>
>>> Isn't it just scripts that get confused here (as normal executables don't
>>> get to see brpm->filename)?
>>>
>>> Given that we don't know which we have at this point, I'd suggest
>>> carrying on regardless. Or we could fall back to use the previous
>>> best-effort d_path() code for O_CLOEXEC fds. Thoughts?
>>
>> How hard would it be to mark the bprm as not having a path for the
>> binary? Then we could fail later on if and when we actually need the
>> path.
>
> Adding a flag to bprm->interp_flags to indicate that the bprm->filename
> will be inaccessible after exec is straightforward. But I'm not sure who
> should/could make use of the flag...
>
>> I don't really have a strong opinion here, though. I do prefer
>> actually failing the execveat call over succeeding but invoking a
>> script interpreter than can't possibly work.
>
> Yeah, but that involves the kernel code (e.g. fs/binfmt_script.c) making
> an assumption about what the interpreter is going to do -- specifically
> that it's going to try to open its argv[1]. Admittedly, that's a very likely
> assumption, but I'm not sure it's one the kernel should make -- a script
> like "#!/bin/echo" wouldn't be very useful, but fexecve()ing it would still
> work OK on a name like "/dev/fd/7" after fd 7 is closed.
Hmm. I'm unconvinced. If an important part of executing the script
is passing it an argv[0] that can be opened, then I think we shouldn't
allow a known-bad argv[0].
>
> (Also, we need some kind of non-empty name in bprm->filename,
> even if it's going to be inaccessible later, so that any LSM processing
> off of the bprm_set_creds()/bprm_check_security() hooks has something
> to work with; those hooks are pre-exec so the "/dev/fd/<fd>" part should
> still be OK at that point.)
>
> So I guess I lean towards keeping "/dev/fd/<fd>/<path>" regardless.
>
>>>
>>>>> + else
>>>>> + snprintf(pathbuf, PATH_MAX,
>>>>> + "/dev/fd/%d/%s", fd, filename->name);
>>>>
>>>> Does this need to handle the case where the result exceeds PATH_MAX?
>>>
>>> I guess we could kmalloc(strlen(filename->name) + 19) to avoid the
>>> possibility of failure, but that just defers the inevitable -- the interpreter
>>> won't be able to open the script file anyway. But it would at least then
>>> generate the appropriate error (ENAMETOOLONG rather than ENOENT).
>>
>> Depends whether anyone cares about bprm->filename. But I think the
>> code should either return an error or allocate enough space.
>
> I'll allocate enough space.
>
>>
>> --
>> Andy Lutomirski
>> AMA Capital Management, LLC
--
Andy Lutomirski
AMA Capital Management, LLC