2014-11-24 11:54:16

by David Drysdale

[permalink] [raw]
Subject: [PATCHv10 0/5] syscalls,x86,sparc: Add execveat() system call

This patch set adds execveat(2) for x86 and sparc, and is derived from
Meredydd Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc
filesystem, at least for executables (rather than scripts). The
current glibc version of fexecve(3) is implemented via /proc, which
causes problems in sandboxed or otherwise restricted environments.

Given the desire for a /proc-free fexecve() implementation, HPA
suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2)
syscall would be an appropriate generalization.

Also, having a new syscall means that it can take a flags argument
without back-compatibility concerns. The current implementation just
defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other
flags could be added in future -- for example, flags for new namespaces
(as suggested at https://lkml.org/lkml/2006/7/11/474).

Related history:
- https://lkml.org/lkml/2006/12/27/123 is an example of someone
realizing that fexecve() is likely to fail in a chroot environment.
- http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
documenting the /proc requirement of fexecve(3) in its manpage, to
"prevent other people from wasting their time".
- https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
problem where a process that did setuid() could not fexecve()
because it no longer had access to /proc/self/fd; this has since
been fixed.


Changes since v9:
- Add sparc syscall wrappers and use correct sparc 32b compatibility
function [Stephen Rothwell, David S. Miller]

Changes since v8:
- Split core/fs changes from x86 changes [Thomas Gleixner]

Changes since v7:
- Speculatively wire up sparc version of syscall (untested)
- Fix leak of pathbuf in mainline arm [Oleg Nesterov]
- Add rcu_dereference_raw() on fdt access [sparse kbuild robot]
- Realigned comment [Andrew Morton]
- Merged up to v3.18-rc4

Changes since v6:
- Remove special case for O_PATH file descriptors [Andy Lutomirski]
- Use kasprintf rather than error-prone arithmetic [Kees Cook]
- Add test for long name [Kees Cook]
- Add test for non-executable O_PATH fd [Andy Lutomirski]

Changes since v5:
- Set new flag in bprm->interp_flags for O_CLOEXEC fds, so that binfmts
that invoke an interpreter fail the exec (as they will not be able
to access the invoked file). [Andy Lutomirski]
- Don't truncate long paths. [Andy Lutomirski]
- Commonize code to open the executed file. [Eric W. Biederman]
- Mark O_PATH file descriptors so they cannot be fexecve()ed.
- Make self-test more helpful, and add additional cases:
- file offset non-zero
- binary file without execute bit
- O_CLOEXEC fds

Changes since v4, suggested by Eric W. Biederman:
- Use empty filename with AT_EMPTY_PATH flag rather than NULL
pathname to request fexecve-like behaviour.
- Build pathname as "/dev/fd/<fd>/<filename>" (or "/dev/fd/<fd>")
rather than using d_path().
- Patch against v3.17 (bfe01a5ba249)

Changes since Meredydd's v3 patch:
- Added a selftest.
- Added a man page.
- Left open_exec() signature untouched to reduce patch impact
elsewhere (as suggested by Al Viro).
- Filled in bprm->filename with d_path() into a buffer, to avoid use
of potentially-ephemeral dentry->d_name.
- Patch against v3.14 (455c6fdbd21916).


David Drysdale (4):
syscalls: implement execveat() system call
x86: Hook up execveat system call.
syscalls: add selftest for execveat(2)
sparc: Hook up execveat system call.

arch/sparc/include/uapi/asm/unistd.h | 3 +-
arch/sparc/kernel/systbls_32.S | 1 +
arch/sparc/kernel/systbls_64.S | 2 +
arch/x86/ia32/audit.c | 1 +
arch/x86/ia32/ia32entry.S | 1 +
arch/x86/kernel/audit_64.c | 1 +
arch/x86/kernel/entry_64.S | 28 +++
arch/x86/syscalls/syscall_32.tbl | 1 +
arch/x86/syscalls/syscall_64.tbl | 2 +
arch/x86/um/sys_call_table_64.c | 1 +
fs/binfmt_em86.c | 4 +
fs/binfmt_misc.c | 4 +
fs/binfmt_script.c | 10 +
fs/exec.c | 113 +++++++--
fs/namei.c | 2 +-
include/linux/binfmts.h | 4 +
include/linux/compat.h | 3 +
include/linux/fs.h | 1 +
include/linux/sched.h | 4 +
include/linux/syscalls.h | 5 +
include/uapi/asm-generic/unistd.h | 4 +-
kernel/sys_ni.c | 3 +
lib/audit.c | 3 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/exec/.gitignore | 9 +
tools/testing/selftests/exec/Makefile | 25 ++
tools/testing/selftests/exec/execveat.c | 397 ++++++++++++++++++++++++++++++++
27 files changed, 617 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/exec/.gitignore
create mode 100644 tools/testing/selftests/exec/Makefile
create mode 100644 tools/testing/selftests/exec/execveat.c

--
2.1.0.rc2.206.gedb03e5


2014-11-24 12:04:21

by David Drysdale

[permalink] [raw]
Subject: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)

Signed-off-by: David Drysdale <[email protected]>
---
man2/execveat.2 | 153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 153 insertions(+)
create mode 100644 man2/execveat.2

diff --git a/man2/execveat.2 b/man2/execveat.2
new file mode 100644
index 000000000000..937d79e4c4f0
--- /dev/null
+++ b/man2/execveat.2
@@ -0,0 +1,153 @@
+.\" Copyright (c) 2014 Google, Inc.
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date. The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein. The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH EXECVEAT 2 2014-04-02 "Linux" "Linux Programmer's Manual"
+.SH NAME
+execveat \- execute program relative to a directory file descriptor
+.SH SYNOPSIS
+.B #include <unistd.h>
+.sp
+.BI "int execveat(int " fd ", const char *" pathname ","
+.br
+.BI " char *const " argv "[], char *const " envp "[],"
+.br
+.BI " int " flags);
+.SH DESCRIPTION
+The
+.BR execveat ()
+system call executes the program pointed to by the combination of \fIfd\fP and \fIpathname\fP.
+The
+.BR execveat ()
+system call operates in exactly the same way as
+.BR execve (2),
+except for the differences described in this manual page.
+
+If the pathname given in
+.I pathname
+is relative, then it is interpreted relative to the directory
+referred to by the file descriptor
+.I fd
+(rather than relative to the current working directory of
+the calling process, as is done by
+.BR execve (2)
+for a relative pathname).
+
+If
+.I pathname
+is relative and
+.I fd
+is the special value
+.BR AT_FDCWD ,
+then
+.I pathname
+is interpreted relative to the current working
+directory of the calling process (like
+.BR execve (2)).
+
+If
+.I pathname
+is absolute, then
+.I fd
+is ignored.
+
+If
+.I pathname
+is an empty string and the
+.BR AT_EMPTY_PATH
+flag is specified, then the file descriptor
+.I fd
+specifies the file to be executed.
+
+.I flags
+can either be 0, or include the following flags:
+.TP
+.BR AT_EMPTY_PATH
+If
+.I pathname
+is an empty string, operate on the file referred to by
+.IR fd
+(which may have been obtained using the
+.BR open (2)
+.B O_PATH
+flag).
+.TP
+.B AT_SYMLINK_NOFOLLOW
+If the file identified by
+.I fd
+and a non-NULL
+.I pathname
+is a symbolic link, then the call fails with the error
+.BR EINVAL .
+.SH "RETURN VALUE"
+On success,
+.BR execveat ()
+does not return. On error \-1 is returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+The same errors that occur for
+.BR execve (2)
+can also occur for
+.BR execveat ().
+The following additional errors can occur for
+.BR execveat ():
+.TP
+.B EBADF
+.I fd
+is not a valid file descriptor.
+.TP
+.B ENOENT
+The program identified by \fIfd\fP and \fIpathname\fP requires the
+use of an interpreter program (such as a script starting with
+"#!") but the file descriptor
+.I fd
+was opened with the
+.B O_CLOEXEC
+flag and so the program file is inaccessible to the launched interpreter.
+.TP
+.B EINVAL
+Invalid flag specified in
+.IR flags .
+.TP
+.B ENOTDIR
+.I pathname
+is relative and
+.I fd
+is a file descriptor referring to a file other than a directory.
+.SH VERSIONS
+.BR execveat ()
+was added to Linux in kernel 3.???.
+.SH NOTES
+In addition to the reasons explained in
+.BR openat (2),
+the
+.BR execveat ()
+system call is also needed to allow
+.BR fexecve (3)
+to be implemented on systems that do not have the
+.I /proc
+filesystem mounted.
+.SH SEE ALSO
+.BR execve (2),
+.BR fexecve (3)
--
2.1.0.rc2.206.gedb03e5

2014-11-24 12:04:50

by David Drysdale

[permalink] [raw]
Subject: [PATCHv10 1/5] syscalls: implement execveat() system call

Add a new execveat(2) system call. execveat() is to execve() as
openat() is to open(): it takes a file descriptor that refers to a
directory, and resolves the filename relative to that.

In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers. This
replicates the functionality of fexecve(), which is a system call in
other UNIXen, but in Linux glibc it depends on opening
"/proc/self/fd/<fd>" (and so relies on /proc being mounted).

The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
(for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
reflecting how the executable was found. This does however mean that
execution of a script in a /proc-less environment won't work; also,
script execution via an O_CLOEXEC file descriptor fails (as the file
will not be accessible after exec).

Based on patches by Meredydd Luff <[email protected]>

Signed-off-by: David Drysdale <[email protected]>
---
fs/binfmt_em86.c | 4 ++
fs/binfmt_misc.c | 4 ++
fs/binfmt_script.c | 10 ++++
fs/exec.c | 113 +++++++++++++++++++++++++++++++++-----
fs/namei.c | 2 +-
include/linux/binfmts.h | 4 ++
include/linux/compat.h | 3 +
include/linux/fs.h | 1 +
include/linux/sched.h | 4 ++
include/linux/syscalls.h | 5 ++
include/uapi/asm-generic/unistd.h | 4 +-
kernel/sys_ni.c | 3 +
lib/audit.c | 3 +
13 files changed, 145 insertions(+), 15 deletions(-)

diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index f37b08cea1f7..490538536cb4 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -42,6 +42,10 @@ static int load_em86(struct linux_binprm *bprm)
return -ENOEXEC;
}

+ /* Need to be able to load the file after exec */
+ if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+ return -ENOENT;
+
allow_write_access(bprm->file);
fput(bprm->file);
bprm->file = NULL;
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index fd8beb9657a2..85acb8c83a9a 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -142,6 +142,10 @@ static int load_misc_binary(struct linux_binprm *bprm)
if (!fmt)
goto _ret;

+ /* Need to be able to load the file after exec */
+ if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+ return -ENOENT;
+
if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
retval = remove_arg_zero(bprm);
if (retval)
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 5027a3e14922..afdf4e3cafc2 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -24,6 +24,16 @@ static int load_script(struct linux_binprm *bprm)

if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
return -ENOEXEC;
+
+ /*
+ * If the script filename will be inaccessible after exec, typically
+ * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
+ * up now (on the assumption that the interpreter will want to load
+ * this file).
+ */
+ if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+ return -ENOENT;
+
/*
* This section does the #! interpretation.
* Sorta complicated, but hopefully it will work. -TYT
diff --git a/fs/exec.c b/fs/exec.c
index 7302b75a9820..6ce5cc47a201 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -747,18 +747,25 @@ EXPORT_SYMBOL(setup_arg_pages);

#endif /* CONFIG_MMU */

-static struct file *do_open_exec(struct filename *name)
+static struct file *do_open_execat(int fd, struct filename *name, int flags)
{
struct file *file;
int err;
- static const struct open_flags open_exec_flags = {
+ struct open_flags open_exec_flags = {
.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
.acc_mode = MAY_EXEC | MAY_OPEN,
.intent = LOOKUP_OPEN,
.lookup_flags = LOOKUP_FOLLOW,
};

- file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
+ if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
+ return ERR_PTR(-EINVAL);
+ if (flags & AT_SYMLINK_NOFOLLOW)
+ open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
+ if (flags & AT_EMPTY_PATH)
+ open_exec_flags.lookup_flags |= LOOKUP_EMPTY;
+
+ file = do_filp_open(fd, name, &open_exec_flags);
if (IS_ERR(file))
goto out;

@@ -769,12 +776,13 @@ static struct file *do_open_exec(struct filename *name)
if (file->f_path.mnt->mnt_flags & MNT_NOEXEC)
goto exit;

- fsnotify_open(file);
-
err = deny_write_access(file);
if (err)
goto exit;

+ if (name->name[0] != '\0')
+ fsnotify_open(file);
+
out:
return file;

@@ -786,7 +794,7 @@ exit:
struct file *open_exec(const char *name)
{
struct filename tmp = { .name = name };
- return do_open_exec(&tmp);
+ return do_open_execat(AT_FDCWD, &tmp, 0);
}
EXPORT_SYMBOL(open_exec);

@@ -1427,10 +1435,12 @@ static int exec_binprm(struct linux_binprm *bprm)
/*
* sys_execve() executes a new program.
*/
-static int do_execve_common(struct filename *filename,
- struct user_arg_ptr argv,
- struct user_arg_ptr envp)
+static int do_execveat_common(int fd, struct filename *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp,
+ int flags)
{
+ char *pathbuf = NULL;
struct linux_binprm *bprm;
struct file *file;
struct files_struct *displaced;
@@ -1471,7 +1481,7 @@ static int do_execve_common(struct filename *filename,
check_unsafe_exec(bprm);
current->in_execve = 1;

- file = do_open_exec(filename);
+ file = do_open_execat(fd, filename, flags);
retval = PTR_ERR(file);
if (IS_ERR(file))
goto out_unmark;
@@ -1479,7 +1489,28 @@ static int do_execve_common(struct filename *filename,
sched_exec();

bprm->file = file;
- bprm->filename = bprm->interp = filename->name;
+ if (fd == AT_FDCWD || filename->name[0] == '/') {
+ bprm->filename = filename->name;
+ } else {
+ if (filename->name[0] == '\0')
+ pathbuf = kasprintf(GFP_TEMPORARY, "/dev/fd/%d", fd);
+ else
+ pathbuf = kasprintf(GFP_TEMPORARY, "/dev/fd/%d/%s",
+ fd, filename->name);
+ if (!pathbuf) {
+ retval = -ENOMEM;
+ goto out_unmark;
+ }
+ /*
+ * Record that a name derived from an O_CLOEXEC fd will be
+ * inaccessible after exec. Relies on having exclusive access to
+ * current->files (due to unshare_files above).
+ */
+ if (close_on_exec(fd, rcu_dereference_raw(current->files->fdt)))
+ bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
+ bprm->filename = pathbuf;
+ }
+ bprm->interp = bprm->filename;

retval = bprm_mm_init(bprm);
if (retval)
@@ -1520,6 +1551,7 @@ static int do_execve_common(struct filename *filename,
acct_update_integrals(current);
task_numa_free(current);
free_bprm(bprm);
+ kfree(pathbuf);
putname(filename);
if (displaced)
put_files_struct(displaced);
@@ -1537,6 +1569,7 @@ out_unmark:

out_free:
free_bprm(bprm);
+ kfree(pathbuf);

out_files:
if (displaced)
@@ -1552,7 +1585,18 @@ int do_execve(struct filename *filename,
{
struct user_arg_ptr argv = { .ptr.native = __argv };
struct user_arg_ptr envp = { .ptr.native = __envp };
- return do_execve_common(filename, argv, envp);
+ return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
+}
+
+int do_execveat(int fd, struct filename *filename,
+ const char __user *const __user *__argv,
+ const char __user *const __user *__envp,
+ int flags)
+{
+ struct user_arg_ptr argv = { .ptr.native = __argv };
+ struct user_arg_ptr envp = { .ptr.native = __envp };
+
+ return do_execveat_common(fd, filename, argv, envp, flags);
}

#ifdef CONFIG_COMPAT
@@ -1568,7 +1612,23 @@ static int compat_do_execve(struct filename *filename,
.is_compat = true,
.ptr.compat = __envp,
};
- return do_execve_common(filename, argv, envp);
+ return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
+}
+
+static int compat_do_execveat(int fd, struct filename *filename,
+ const compat_uptr_t __user *__argv,
+ const compat_uptr_t __user *__envp,
+ int flags)
+{
+ struct user_arg_ptr argv = {
+ .is_compat = true,
+ .ptr.compat = __argv,
+ };
+ struct user_arg_ptr envp = {
+ .is_compat = true,
+ .ptr.compat = __envp,
+ };
+ return do_execveat_common(fd, filename, argv, envp, flags);
}
#endif

@@ -1608,6 +1668,20 @@ SYSCALL_DEFINE3(execve,
{
return do_execve(getname(filename), argv, envp);
}
+
+SYSCALL_DEFINE5(execveat,
+ int, fd, const char __user *, filename,
+ const char __user *const __user *, argv,
+ const char __user *const __user *, envp,
+ int, flags)
+{
+ int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+ return do_execveat(fd,
+ getname_flags(filename, lookup_flags, NULL),
+ argv, envp, flags);
+}
+
#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
const compat_uptr_t __user *, argv,
@@ -1615,4 +1689,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
{
return compat_do_execve(getname(filename), argv, envp);
}
+
+COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
+ const char __user *, filename,
+ const compat_uptr_t __user *, argv,
+ const compat_uptr_t __user *, envp,
+ int, flags)
+{
+ int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+ return compat_do_execveat(fd,
+ getname_flags(filename, lookup_flags, NULL),
+ argv, envp, flags);
+}
#endif
diff --git a/fs/namei.c b/fs/namei.c
index db5fe86319e6..ca814165d84c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -130,7 +130,7 @@ void final_putname(struct filename *name)

#define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename))

-static struct filename *
+struct filename *
getname_flags(const char __user *filename, int flags, int *empty)
{
struct filename *result, *err;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 61f29e5ea840..576e4639ca60 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -53,6 +53,10 @@ struct linux_binprm {
#define BINPRM_FLAGS_EXECFD_BIT 1
#define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)

+/* filename of the binary will be inaccessible after exec */
+#define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2
+#define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT)
+
/* Function parameter for binfmt->coredump */
struct coredump_params {
const siginfo_t *siginfo;
diff --git a/include/linux/compat.h b/include/linux/compat.h
index e6494261eaff..7450ca2ac1fc 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);

asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
const compat_uptr_t __user *envp);
+asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
+ const compat_uptr_t __user *argv,
+ const compat_uptr_t __user *envp, int flags);

asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp,
compat_ulong_t __user *outp, compat_ulong_t __user *exp,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ab779e8a63c..133b60b1d4d0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2072,6 +2072,7 @@ extern int vfs_open(const struct path *, struct file *, const struct cred *);
extern struct file * dentry_open(const struct path *, int, const struct cred *);
extern int filp_close(struct file *, fl_owner_t id);

+extern struct filename *getname_flags(const char __user *, int, int *);
extern struct filename *getname(const char __user *);
extern struct filename *getname_kernel(const char *);

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5e344bbe63ec..344163d09efb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2441,6 +2441,10 @@ extern void do_group_exit(int);
extern int do_execve(struct filename *,
const char __user * const __user *,
const char __user * const __user *);
+extern int do_execveat(int, struct filename *,
+ const char __user * const __user *,
+ const char __user * const __user *,
+ int);
extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
struct task_struct *fork_idle(int);
extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index bda9b81357cc..1ff5a4d09693 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -877,4 +877,9 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
asmlinkage long sys_getrandom(char __user *buf, size_t count,
unsigned int flags);
asmlinkage long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size);
+
+asmlinkage long sys_execveat(int dfd, const char __user *filename,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp, int flags);
+
#endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 22749c134117..e016bd9b1a04 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -707,9 +707,11 @@ __SYSCALL(__NR_getrandom, sys_getrandom)
__SYSCALL(__NR_memfd_create, sys_memfd_create)
#define __NR_bpf 280
__SYSCALL(__NR_bpf, sys_bpf)
+#define __NR_execveat 281
+__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)

#undef __NR_syscalls
-#define __NR_syscalls 281
+#define __NR_syscalls 282

/*
* All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 02aa4185b17e..832fba6e2eb1 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -224,3 +224,6 @@ cond_syscall(sys_seccomp);

/* access BPF programs and maps */
cond_syscall(sys_bpf);
+
+/* execveat */
+cond_syscall(sys_execveat);
diff --git a/lib/audit.c b/lib/audit.c
index 1d726a22565b..b8fb5ee81e26 100644
--- a/lib/audit.c
+++ b/lib/audit.c
@@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall)
case __NR_socketcall:
return 4;
#endif
+#ifdef __NR_execveat
+ case __NR_execveat:
+#endif
case __NR_execve:
return 5;
default:
--
2.1.0.rc2.206.gedb03e5

2014-11-24 12:04:47

by David Drysdale

[permalink] [raw]
Subject: [PATCHv10 3/5] syscalls: add selftest for execveat(2)

Signed-off-by: David Drysdale <[email protected]>
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/exec/.gitignore | 9 +
tools/testing/selftests/exec/Makefile | 25 ++
tools/testing/selftests/exec/execveat.c | 397 ++++++++++++++++++++++++++++++++
4 files changed, 432 insertions(+)
create mode 100644 tools/testing/selftests/exec/.gitignore
create mode 100644 tools/testing/selftests/exec/Makefile
create mode 100644 tools/testing/selftests/exec/execveat.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 45f145c6f843..c14893b501a9 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -15,6 +15,7 @@ TARGETS += user
TARGETS += sysctl
TARGETS += firmware
TARGETS += ftrace
+TARGETS += exec

TARGETS_HOTPLUG = cpu-hotplug
TARGETS_HOTPLUG += memory-hotplug
diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore
new file mode 100644
index 000000000000..64073e050c6a
--- /dev/null
+++ b/tools/testing/selftests/exec/.gitignore
@@ -0,0 +1,9 @@
+subdir*
+script*
+execveat
+execveat.symlink
+execveat.moved
+execveat.path.ephemeral
+execveat.ephemeral
+execveat.denatured
+xxxxxxxx*
\ No newline at end of file
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
new file mode 100644
index 000000000000..66dfc2ce1788
--- /dev/null
+++ b/tools/testing/selftests/exec/Makefile
@@ -0,0 +1,25 @@
+CC = $(CROSS_COMPILE)gcc
+CFLAGS = -Wall
+BINARIES = execveat
+DEPS = execveat.symlink execveat.denatured script subdir
+all: $(BINARIES) $(DEPS)
+
+subdir:
+ mkdir -p $@
+script:
+ echo '#!/bin/sh' > $@
+ echo 'exit $$*' >> $@
+ chmod +x $@
+execveat.symlink: execveat
+ ln -s -f $< $@
+execveat.denatured: execveat
+ cp $< $@
+ chmod -x $@
+%: %.c
+ $(CC) $(CFLAGS) -o $@ $^
+
+run_tests: all
+ ./execveat
+
+clean:
+ rm -rf $(BINARIES) $(DEPS) subdir.moved execveat.moved xxxxx*
diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c
new file mode 100644
index 000000000000..33a5c06d95ca
--- /dev/null
+++ b/tools/testing/selftests/exec/execveat.c
@@ -0,0 +1,397 @@
+/*
+ * Copyright (c) 2014 Google, Inc.
+ *
+ * Licensed under the terms of the GNU GPL License version 2
+ *
+ * Selftests for execveat(2).
+ */
+
+#define _GNU_SOURCE /* to get O_PATH, AT_EMPTY_PATH */
+#include <sys/sendfile.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+static char longpath[2 * PATH_MAX] = "";
+static char *envp[] = { "IN_TEST=yes", NULL, NULL };
+static char *argv[] = { "execveat", "99", NULL };
+
+static int execveat_(int fd, const char *path, char **argv, char **envp,
+ int flags)
+{
+#ifdef __NR_execveat
+ return syscall(__NR_execveat, fd, path, argv, envp, flags);
+#else
+ errno = -ENOSYS;
+ return -1;
+#endif
+}
+
+#define check_execveat_fail(fd, path, flags, errno) \
+ _check_execveat_fail(fd, path, flags, errno, #errno)
+static int _check_execveat_fail(int fd, const char *path, int flags,
+ int expected_errno, const char *errno_str)
+{
+ int rc;
+
+ errno = 0;
+ printf("Check failure of execveat(%d, '%s', %d) with %s... ",
+ fd, path?:"(null)", flags, errno_str);
+ rc = execveat_(fd, path, argv, envp, flags);
+
+ if (rc > 0) {
+ printf("[FAIL] (unexpected success from execveat(2))\n");
+ return 1;
+ }
+ if (errno != expected_errno) {
+ printf("[FAIL] (expected errno %d (%s) not %d (%s)\n",
+ expected_errno, strerror(expected_errno),
+ errno, strerror(errno));
+ return 1;
+ }
+ printf("[OK]\n");
+ return 0;
+}
+
+static int check_execveat_invoked_rc(int fd, const char *path, int flags,
+ int expected_rc)
+{
+ int status;
+ int rc;
+ pid_t child;
+ int pathlen = path ? strlen(path) : 0;
+
+ if (pathlen > 40)
+ printf("Check success of execveat(%d, '%.20s...%s', %d)... ",
+ fd, path, (path + pathlen - 20), flags);
+ else
+ printf("Check success of execveat(%d, '%s', %d)... ",
+ fd, path?:"(null)", flags);
+ child = fork();
+ if (child < 0) {
+ printf("[FAIL] (fork() failed)\n");
+ return 1;
+ }
+ if (child == 0) {
+ /* Child: do execveat(). */
+ rc = execveat_(fd, path, argv, envp, flags);
+ printf("[FAIL]: execveat() failed, rc=%d errno=%d (%s)\n",
+ rc, errno, strerror(errno));
+ exit(1); /* should not reach here */
+ }
+ /* Parent: wait for & check child's exit status. */
+ rc = waitpid(child, &status, 0);
+ if (rc != child) {
+ printf("[FAIL] (waitpid(%d,...) returned %d)\n", child, rc);
+ return 1;
+ }
+ if (!WIFEXITED(status)) {
+ printf("[FAIL] (child %d did not exit cleanly, status=%08x)\n",
+ child, status);
+ return 1;
+ }
+ if (WEXITSTATUS(status) != expected_rc) {
+ printf("[FAIL] (child %d exited with %d not %d)\n",
+ child, WEXITSTATUS(status), expected_rc);
+ return 1;
+ }
+ printf("[OK]\n");
+ return 0;
+}
+
+static int check_execveat(int fd, const char *path, int flags)
+{
+ return check_execveat_invoked_rc(fd, path, flags, 99);
+}
+
+static char *concat(const char *left, const char *right)
+{
+ char *result = malloc(strlen(left) + strlen(right) + 1);
+
+ strcpy(result, left);
+ strcat(result, right);
+ return result;
+}
+
+static int open_or_die(const char *filename, int flags)
+{
+ int fd = open(filename, flags);
+
+ if (fd < 0) {
+ printf("Failed to open '%s'; "
+ "check prerequisites are available\n", filename);
+ exit(1);
+ }
+ return fd;
+}
+
+static void exe_cp(const char *src, const char *dest)
+{
+ int in_fd = open_or_die(src, O_RDONLY);
+ int out_fd = open(dest, O_RDWR|O_CREAT|O_TRUNC, 0755);
+ struct stat info;
+
+ fstat(in_fd, &info);
+ sendfile(out_fd, in_fd, NULL, info.st_size);
+ close(in_fd);
+ close(out_fd);
+}
+
+#define XX_DIR_LEN 200
+static int check_execveat_pathmax(int dot_dfd, const char *src, int is_script)
+{
+ int fail = 0;
+ int ii, count, len;
+ char longname[XX_DIR_LEN + 1];
+ int fd;
+
+ if (*longpath == '\0') {
+ /* Create a filename close to PATH_MAX in length */
+ memset(longname, 'x', XX_DIR_LEN - 1);
+ longname[XX_DIR_LEN - 1] = '/';
+ longname[XX_DIR_LEN] = '\0';
+ count = (PATH_MAX - 3) / XX_DIR_LEN;
+ for (ii = 0; ii < count; ii++) {
+ strcat(longpath, longname);
+ mkdir(longpath, 0755);
+ }
+ len = (PATH_MAX - 3) - (count * XX_DIR_LEN);
+ if (len <= 0)
+ len = 1;
+ memset(longname, 'y', len);
+ longname[len] = '\0';
+ strcat(longpath, longname);
+ }
+ exe_cp(src, longpath);
+
+ /*
+ * Execute as a pre-opened file descriptor, which works whether this is
+ * a script or not (because the interpreter sees a filename like
+ * "/dev/fd/20").
+ */
+ fd = open(longpath, O_RDONLY);
+ if (fd > 0) {
+ printf("Invoke copy of '%s' via filename of length %lu:\n",
+ src, strlen(longpath));
+ fail += check_execveat(fd, "", AT_EMPTY_PATH);
+ } else {
+ printf("Failed to open length %lu filename, errno=%d (%s)\n",
+ strlen(longpath), errno, strerror(errno));
+ fail++;
+ }
+
+ /*
+ * Execute as a long pathname relative to ".". If this is a script,
+ * the interpreter will launch but fail to open the script because its
+ * name ("/dev/fd/5/xxx....") is bigger than PATH_MAX.
+ */
+ if (is_script)
+ fail += check_execveat_invoked_rc(dot_dfd, longpath, 0, 127);
+ else
+ fail += check_execveat(dot_dfd, longpath, 0);
+
+ return fail;
+}
+
+static int run_tests(void)
+{
+ int fail = 0;
+ char *fullname = realpath("execveat", NULL);
+ char *fullname_script = realpath("script", NULL);
+ char *fullname_symlink = concat(fullname, ".symlink");
+ int subdir_dfd = open_or_die("subdir", O_DIRECTORY|O_RDONLY);
+ int subdir_dfd_ephemeral = open_or_die("subdir.ephemeral",
+ O_DIRECTORY|O_RDONLY);
+ int dot_dfd = open_or_die(".", O_DIRECTORY|O_RDONLY);
+ int dot_dfd_path = open_or_die(".", O_DIRECTORY|O_RDONLY|O_PATH);
+ int dot_dfd_cloexec = open_or_die(".", O_DIRECTORY|O_RDONLY|O_CLOEXEC);
+ int fd = open_or_die("execveat", O_RDONLY);
+ int fd_path = open_or_die("execveat", O_RDONLY|O_PATH);
+ int fd_symlink = open_or_die("execveat.symlink", O_RDONLY);
+ int fd_denatured = open_or_die("execveat.denatured", O_RDONLY);
+ int fd_denatured_path = open_or_die("execveat.denatured",
+ O_RDONLY|O_PATH);
+ int fd_script = open_or_die("script", O_RDONLY);
+ int fd_ephemeral = open_or_die("execveat.ephemeral", O_RDONLY);
+ int fd_ephemeral_path = open_or_die("execveat.path.ephemeral",
+ O_RDONLY|O_PATH);
+ int fd_script_ephemeral = open_or_die("script.ephemeral", O_RDONLY);
+ int fd_cloexec = open_or_die("execveat", O_RDONLY|O_CLOEXEC);
+ int fd_script_cloexec = open_or_die("script", O_RDONLY|O_CLOEXEC);
+
+ /* Change file position to confirm it doesn't affect anything */
+ lseek(fd, 10, SEEK_SET);
+
+ /* Normal executable file: */
+ /* dfd + path */
+ fail += check_execveat(subdir_dfd, "../execveat", 0);
+ fail += check_execveat(dot_dfd, "execveat", 0);
+ fail += check_execveat(dot_dfd_path, "execveat", 0);
+ /* absolute path */
+ fail += check_execveat(AT_FDCWD, fullname, 0);
+ /* absolute path with nonsense dfd */
+ fail += check_execveat(99, fullname, 0);
+ /* fd + no path */
+ fail += check_execveat(fd, "", AT_EMPTY_PATH);
+ /* O_CLOEXEC fd + no path */
+ fail += check_execveat(fd_cloexec, "", AT_EMPTY_PATH);
+ /* O_PATH fd */
+ fail += check_execveat(fd_path, "", AT_EMPTY_PATH);
+
+ /* Mess with executable file that's already open: */
+ /* fd + no path to a file that's been renamed */
+ rename("execveat.ephemeral", "execveat.moved");
+ fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH);
+ /* fd + no path to a file that's been deleted */
+ unlink("execveat.moved"); /* remove the file now fd open */
+ fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH);
+
+ /* Mess with executable file that's already open with O_PATH */
+ /* fd + no path to a file that's been deleted */
+ unlink("execveat.path.ephemeral");
+ fail += check_execveat(fd_ephemeral_path, "", AT_EMPTY_PATH);
+
+ /* Invalid argument failures */
+ fail += check_execveat_fail(fd, "", 0, ENOENT);
+ fail += check_execveat_fail(fd, NULL, AT_EMPTY_PATH, EFAULT);
+
+ /* Symlink to executable file: */
+ /* dfd + path */
+ fail += check_execveat(dot_dfd, "execveat.symlink", 0);
+ fail += check_execveat(dot_dfd_path, "execveat.symlink", 0);
+ /* absolute path */
+ fail += check_execveat(AT_FDCWD, fullname_symlink, 0);
+ /* fd + no path, even with AT_SYMLINK_NOFOLLOW (already followed) */
+ fail += check_execveat(fd_symlink, "", AT_EMPTY_PATH);
+ fail += check_execveat(fd_symlink, "",
+ AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW);
+
+ /* Symlink fails when AT_SYMLINK_NOFOLLOW set: */
+ /* dfd + path */
+ fail += check_execveat_fail(dot_dfd, "execveat.symlink",
+ AT_SYMLINK_NOFOLLOW, ELOOP);
+ fail += check_execveat_fail(dot_dfd_path, "execveat.symlink",
+ AT_SYMLINK_NOFOLLOW, ELOOP);
+ /* absolute path */
+ fail += check_execveat_fail(AT_FDCWD, fullname_symlink,
+ AT_SYMLINK_NOFOLLOW, ELOOP);
+
+ /* Shell script wrapping executable file: */
+ /* dfd + path */
+ fail += check_execveat(subdir_dfd, "../script", 0);
+ fail += check_execveat(dot_dfd, "script", 0);
+ fail += check_execveat(dot_dfd_path, "script", 0);
+ /* absolute path */
+ fail += check_execveat(AT_FDCWD, fullname_script, 0);
+ /* fd + no path */
+ fail += check_execveat(fd_script, "", AT_EMPTY_PATH);
+ fail += check_execveat(fd_script, "",
+ AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW);
+ /* O_CLOEXEC fd fails for a script (as script file inaccessible) */
+ fail += check_execveat_fail(fd_script_cloexec, "", AT_EMPTY_PATH,
+ ENOENT);
+ fail += check_execveat_fail(dot_dfd_cloexec, "script", 0, ENOENT);
+
+ /* Mess with script file that's already open: */
+ /* fd + no path to a file that's been renamed */
+ rename("script.ephemeral", "script.moved");
+ fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH);
+ /* fd + no path to a file that's been deleted */
+ unlink("script.moved"); /* remove the file while fd open */
+ fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH);
+
+ /* Rename a subdirectory in the path: */
+ rename("subdir.ephemeral", "subdir.moved");
+ fail += check_execveat(subdir_dfd_ephemeral, "../script", 0);
+ fail += check_execveat(subdir_dfd_ephemeral, "script", 0);
+ /* Remove the subdir and its contents */
+ unlink("subdir.moved/script");
+ unlink("subdir.moved");
+ /* Shell loads via deleted subdir OK because name starts with .. */
+ fail += check_execveat(subdir_dfd_ephemeral, "../script", 0);
+ fail += check_execveat_fail(subdir_dfd_ephemeral, "script", 0, ENOENT);
+
+ /* Flag values other than AT_SYMLINK_NOFOLLOW => EINVAL */
+ fail += check_execveat_fail(dot_dfd, "execveat", 0xFFFF, EINVAL);
+ /* Invalid path => ENOENT */
+ fail += check_execveat_fail(dot_dfd, "no-such-file", 0, ENOENT);
+ fail += check_execveat_fail(dot_dfd_path, "no-such-file", 0, ENOENT);
+ fail += check_execveat_fail(AT_FDCWD, "no-such-file", 0, ENOENT);
+ /* Attempt to execute directory => EACCES */
+ fail += check_execveat_fail(dot_dfd, "", AT_EMPTY_PATH, EACCES);
+ /* Attempt to execute non-executable => EACCES */
+ fail += check_execveat_fail(dot_dfd, "Makefile", 0, EACCES);
+ fail += check_execveat_fail(fd_denatured, "", AT_EMPTY_PATH, EACCES);
+ fail += check_execveat_fail(fd_denatured_path, "", AT_EMPTY_PATH,
+ EACCES);
+ /* Attempt to execute nonsense FD => EBADF */
+ fail += check_execveat_fail(99, "", AT_EMPTY_PATH, EBADF);
+ fail += check_execveat_fail(99, "execveat", 0, EBADF);
+ /* Attempt to execute relative to non-directory => ENOTDIR */
+ fail += check_execveat_fail(fd, "execveat", 0, ENOTDIR);
+
+ fail += check_execveat_pathmax(dot_dfd, "execveat", 0);
+ fail += check_execveat_pathmax(dot_dfd, "script", 1);
+ return fail;
+}
+
+static void prerequisites(void)
+{
+ int fd;
+ const char *script = "#!/bin/sh\nexit $*\n";
+
+ /* Create ephemeral copies of files */
+ exe_cp("execveat", "execveat.ephemeral");
+ exe_cp("execveat", "execveat.path.ephemeral");
+ exe_cp("script", "script.ephemeral");
+ mkdir("subdir.ephemeral", 0755);
+
+ fd = open("subdir.ephemeral/script", O_RDWR|O_CREAT|O_TRUNC, 0755);
+ write(fd, script, strlen(script));
+ close(fd);
+}
+
+int main(int argc, char **argv)
+{
+ int ii;
+ int rc;
+ const char *verbose = getenv("VERBOSE");
+
+ if (argc >= 2) {
+ /* If we are invoked with an argument, don't run tests. */
+ const char *in_test = getenv("IN_TEST");
+
+ if (verbose) {
+ printf(" invoked with:");
+ for (ii = 0; ii < argc; ii++)
+ printf(" [%d]='%s'", ii, argv[ii]);
+ printf("\n");
+ }
+
+ /* Check expected environment transferred. */
+ if (!in_test || strcmp(in_test, "yes") != 0) {
+ printf("[FAIL] (no IN_TEST=yes in env)\n");
+ return 1;
+ }
+
+ /* Use the final argument as an exit code. */
+ rc = atoi(argv[argc - 1]);
+ fflush(stdout);
+ } else {
+ prerequisites();
+ if (verbose)
+ envp[1] = "VERBOSE=1";
+ rc = run_tests();
+ if (rc > 0)
+ printf("%d tests failed\n", rc);
+ }
+ return rc;
+}
--
2.1.0.rc2.206.gedb03e5

2014-11-24 12:04:45

by David Drysdale

[permalink] [raw]
Subject: [PATCHv10 2/5] x86: Hook up execveat system call.

Hook up x86-64, i386 and x32 ABIs.

Signed-off-by: David Drysdale <[email protected]>
---
arch/x86/ia32/audit.c | 1 +
arch/x86/ia32/ia32entry.S | 1 +
arch/x86/kernel/audit_64.c | 1 +
arch/x86/kernel/entry_64.S | 28 ++++++++++++++++++++++++++++
arch/x86/syscalls/syscall_32.tbl | 1 +
arch/x86/syscalls/syscall_64.tbl | 2 ++
arch/x86/um/sys_call_table_64.c | 1 +
7 files changed, 35 insertions(+)

diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c
index 5d7b381da692..2eccc8932ae6 100644
--- a/arch/x86/ia32/audit.c
+++ b/arch/x86/ia32/audit.c
@@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall)
case __NR_socketcall:
return 4;
case __NR_execve:
+ case __NR_execveat:
return 5;
default:
return 1;
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index ffe71228fc10..82e8a1d44658 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -480,6 +480,7 @@ GLOBAL(\label)
PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn
PTREGSCALL stub32_sigreturn, sys32_sigreturn
PTREGSCALL stub32_execve, compat_sys_execve
+ PTREGSCALL stub32_execveat, compat_sys_execveat
PTREGSCALL stub32_fork, sys_fork
PTREGSCALL stub32_vfork, sys_vfork

diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c
index 06d3e5a14d9d..f3672508b249 100644
--- a/arch/x86/kernel/audit_64.c
+++ b/arch/x86/kernel/audit_64.c
@@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall)
case __NR_openat:
return 3;
case __NR_execve:
+ case __NR_execveat:
return 5;
default:
return 0;
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index df088bb03fb3..40d893c60fcc 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -652,6 +652,20 @@ ENTRY(stub_execve)
CFI_ENDPROC
END(stub_execve)

+ENTRY(stub_execveat)
+ CFI_STARTPROC
+ addq $8, %rsp
+ PARTIAL_FRAME 0
+ SAVE_REST
+ FIXUP_TOP_OF_STACK %r11
+ call sys_execveat
+ RESTORE_TOP_OF_STACK %r11
+ movq %rax,RAX(%rsp)
+ RESTORE_REST
+ jmp int_ret_from_sys_call
+ CFI_ENDPROC
+END(stub_execveat)
+
/*
* sigreturn is special because it needs to restore all registers on return.
* This cannot be done with SYSRET, so use the IRET return path instead.
@@ -697,6 +711,20 @@ ENTRY(stub_x32_execve)
CFI_ENDPROC
END(stub_x32_execve)

+ENTRY(stub_x32_execveat)
+ CFI_STARTPROC
+ addq $8, %rsp
+ PARTIAL_FRAME 0
+ SAVE_REST
+ FIXUP_TOP_OF_STACK %r11
+ call compat_sys_execveat
+ RESTORE_TOP_OF_STACK %r11
+ movq %rax,RAX(%rsp)
+ RESTORE_REST
+ jmp int_ret_from_sys_call
+ CFI_ENDPROC
+END(stub_x32_execveat)
+
#endif

/*
diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 9fe1b5d002f0..b3560ece1c9f 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -364,3 +364,4 @@
355 i386 getrandom sys_getrandom
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
+358 i386 execveat sys_execveat stub32_execveat
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 281150b539a2..8d656fbb57aa 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -328,6 +328,7 @@
319 common memfd_create sys_memfd_create
320 common kexec_file_load sys_kexec_file_load
321 common bpf sys_bpf
+322 64 execveat stub_execveat

#
# x32-specific system call numbers start at 512 to avoid cache impact
@@ -366,3 +367,4 @@
542 x32 getsockopt compat_sys_getsockopt
543 x32 io_setup compat_sys_io_setup
544 x32 io_submit compat_sys_io_submit
+545 x32 execveat stub_x32_execveat
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index f2f0723070ca..20c3649d0691 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -31,6 +31,7 @@
#define stub_fork sys_fork
#define stub_vfork sys_vfork
#define stub_execve sys_execve
+#define stub_execveat sys_execveat
#define stub_rt_sigreturn sys_rt_sigreturn

#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
--
2.1.0.rc2.206.gedb03e5

2014-11-24 12:04:41

by David Drysdale

[permalink] [raw]
Subject: [PATCHv10 4/5] sparc: Hook up execveat system call.

Signed-off-by: David Drysdale <[email protected]>
---
arch/sparc/include/uapi/asm/unistd.h | 3 ++-
arch/sparc/kernel/syscalls.S | 10 ++++++++++
arch/sparc/kernel/systbls_32.S | 1 +
arch/sparc/kernel/systbls_64.S | 2 ++
4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/include/uapi/asm/unistd.h b/arch/sparc/include/uapi/asm/unistd.h
index 46d83842eddc..6f35f4df17f2 100644
--- a/arch/sparc/include/uapi/asm/unistd.h
+++ b/arch/sparc/include/uapi/asm/unistd.h
@@ -415,8 +415,9 @@
#define __NR_getrandom 347
#define __NR_memfd_create 348
#define __NR_bpf 349
+#define __NR_execveat 350

-#define NR_syscalls 350
+#define NR_syscalls 351

/* Bitmask values returned from kern_features system call. */
#define KERN_FEATURE_MIXED_MODE_STACK 0x00000001
diff --git a/arch/sparc/kernel/syscalls.S b/arch/sparc/kernel/syscalls.S
index 33a17e7b3ccd..bb0008927598 100644
--- a/arch/sparc/kernel/syscalls.S
+++ b/arch/sparc/kernel/syscalls.S
@@ -6,6 +6,11 @@ sys64_execve:
jmpl %g1, %g0
flushw

+sys64_execveat:
+ set sys_execveat, %g1
+ jmpl %g1, %g0
+ flushw
+
#ifdef CONFIG_COMPAT
sunos_execv:
mov %g0, %o2
@@ -13,6 +18,11 @@ sys32_execve:
set compat_sys_execve, %g1
jmpl %g1, %g0
flushw
+
+sys32_execveat:
+ set compat_sys_execveat, %g1
+ jmpl %g1, %g0
+ flushw
#endif

.align 32
diff --git a/arch/sparc/kernel/systbls_32.S b/arch/sparc/kernel/systbls_32.S
index ad0cdf497b78..e31a9056a303 100644
--- a/arch/sparc/kernel/systbls_32.S
+++ b/arch/sparc/kernel/systbls_32.S
@@ -87,3 +87,4 @@ sys_call_table:
/*335*/ .long sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev
/*340*/ .long sys_ni_syscall, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
/*345*/ .long sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
+/*350*/ .long sys_execveat
diff --git a/arch/sparc/kernel/systbls_64.S b/arch/sparc/kernel/systbls_64.S
index 580cde9370c9..d72f76ae70eb 100644
--- a/arch/sparc/kernel/systbls_64.S
+++ b/arch/sparc/kernel/systbls_64.S
@@ -88,6 +88,7 @@ sys_call_table32:
.word sys_syncfs, compat_sys_sendmmsg, sys_setns, compat_sys_process_vm_readv, compat_sys_process_vm_writev
/*340*/ .word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
.word sys32_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
+/*350*/ .word sys32_execveat

#endif /* CONFIG_COMPAT */

@@ -167,3 +168,4 @@ sys_call_table:
.word sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev
/*340*/ .word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
.word sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
+/*350*/ .word sys64_execveat
--
2.1.0.rc2.206.gedb03e5

2014-11-24 12:46:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCHv10 2/5] x86: Hook up execveat system call.

On Mon, 24 Nov 2014, David Drysdale wrote:

> Hook up x86-64, i386 and x32 ABIs.
>
> Signed-off-by: David Drysdale <[email protected]>

Reviewed-by: Thomas Gleixner <[email protected]>

2014-11-24 17:07:48

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCHv10 2/5] x86: Hook up execveat system call.

On Mon, Nov 24, 2014 at 11:53:56AM +0000, David Drysdale wrote:
> Hook up x86-64, i386 and x32 ABIs.
>
> Signed-off-by: David Drysdale <[email protected]>

This one has been breaking my linux-next build for the past week. I'm
not sure what's going on. I build with a script:

make allmodconfig

cat << EOF >> .config
CONFIG_DYNAMIC_DEBUG=n
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=n
CONFIG_DYNAMIC_DEBUG=y
EOF

make oldconfig

Here are the errors:

CHK include/generated/compile.h
CHECK arch/x86/ia32/audit.c
CC arch/x86/ia32/audit.o
arch/x86/ia32/audit.c: In function ‘ia32_classify_syscall’:
arch/x86/ia32/audit.c:38:7: error: ‘__NR_execveat’ undeclared (first use in this function)
arch/x86/ia32/audit.c:38:7: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [arch/x86/ia32/audit.o] Error 1
make[2]: Target `__build' not remade because of errors.
make[1]: *** [arch/x86/ia32] Error 2
CHECK arch/x86/kernel/audit_64.c
CHK kernel/config_data.h
CC arch/x86/kernel/audit_64.o
arch/x86/kernel/audit_64.c: In function ‘audit_classify_syscall’:
arch/x86/kernel/audit_64.c:53:7: error: ‘__NR_execveat’ undeclared (first use in this function)
arch/x86/kernel/audit_64.c:53:7: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [arch/x86/kernel/audit_64.o] Error 1
make[2]: Target `__build' not remade because of errors.
make[1]: *** [arch/x86/kernel] Error 2
make[1]: Target `__build' not remade because of errors.

regards,
dan carpenter

2014-11-24 18:26:48

by David Drysdale

[permalink] [raw]
Subject: Re: [PATCHv10 2/5] x86: Hook up execveat system call.

On Mon, Nov 24, 2014 at 5:06 PM, Dan Carpenter <[email protected]> wrote:
> On Mon, Nov 24, 2014 at 11:53:56AM +0000, David Drysdale wrote:
>> Hook up x86-64, i386 and x32 ABIs.
>>
>> Signed-off-by: David Drysdale <[email protected]>
>
> This one has been breaking my linux-next build for the past week. I'm
> not sure what's going on.

Hi Dan,

Sorry if this has been causing you problems -- I've not had any
errors from the kbuild robots or my local builds.

> I build with a script:
>
> make allmodconfig
>
> cat << EOF >> .config
> CONFIG_DYNAMIC_DEBUG=n
> CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=n
> CONFIG_DYNAMIC_DEBUG=y
> EOF
>
> make oldconfig
>
> Here are the errors:
>
> CHK include/generated/compile.h
> CHECK arch/x86/ia32/audit.c
> CC arch/x86/ia32/audit.o
> arch/x86/ia32/audit.c: In function ‘ia32_classify_syscall’:
> arch/x86/ia32/audit.c:38:7: error: ‘__NR_execveat’ undeclared (first use in this function)
> arch/x86/ia32/audit.c:38:7: note: each undeclared identifier is reported only once for each function it appears in
> make[2]: *** [arch/x86/ia32/audit.o] Error 1
> make[2]: Target `__build' not remade because of errors.
> make[1]: *** [arch/x86/ia32] Error 2
> CHECK arch/x86/kernel/audit_64.c
> CHK kernel/config_data.h
> CC arch/x86/kernel/audit_64.o
> arch/x86/kernel/audit_64.c: In function ‘audit_classify_syscall’:
> arch/x86/kernel/audit_64.c:53:7: error: ‘__NR_execveat’ undeclared (first use in this function)
> arch/x86/kernel/audit_64.c:53:7: note: each undeclared identifier is reported only once for each function it appears in
> make[2]: *** [arch/x86/kernel/audit_64.o] Error 1
> make[2]: Target `__build' not remade because of errors.
> make[1]: *** [arch/x86/kernel] Error 2
> make[1]: Target `__build' not remade because of errors.

That seems odd -- the generic definition of __NR_execveat is in the first
patch in the series, and the various x86-specific definitions should get
generated from the table entries in the second patch in the series (at
least since the v9 set I sent on 19 Nov, which split out the x86 wiring
from the general implementation).

Are the syscall table generation steps happening in your build? And does
__NR_execveat appear in the various generated x86 unistd*.h headers?

As an aside, I've just built next-20141124 (a4cfa44aa26a) fine from
scratch with your config steps. The build output included the header
generation steps:

SYSTBL arch/x86/syscalls/../include/generated/asm/syscalls_32.h
SYSHDR arch/x86/syscalls/../include/generated/asm/unistd_32_ia32.h
SYSHDR arch/x86/syscalls/../include/generated/asm/unistd_64_x32.h
SYSTBL arch/x86/syscalls/../include/generated/asm/syscalls_64.h
SYSHDR arch/x86/syscalls/../include/generated/uapi/asm/unistd_32.h
HOSTCC scripts/basic/bin2c
SYSHDR arch/x86/syscalls/../include/generated/uapi/asm/unistd_64.h
SYSHDR arch/x86/syscalls/../include/generated/uapi/asm/unistd_x32.h

and the resulting files did include the __NR_execveat constant:

% grep execveat usr/include/asm*/*.h arch/x86/include/generated/uapi/asm/*.h
usr/include/asm-generic/unistd.h:#define __NR_execveat 281
usr/include/asm-generic/unistd.h:__SC_COMP(__NR_execveat,
sys_execveat, compat_sys_execveat)
usr/include/asm/unistd_32.h:#define __NR_execveat 358
usr/include/asm/unistd_64.h:#define __NR_execveat 322
usr/include/asm/unistd_x32.h:#define __NR_execveat (__X32_SYSCALL_BIT + 545)
arch/x86/include/generated/uapi/asm/unistd_32.h:#define __NR_execveat 358
arch/x86/include/generated/uapi/asm/unistd_64.h:#define __NR_execveat 322
arch/x86/include/generated/uapi/asm/unistd_x32.h:#define
__NR_execveat (__X32_SYSCALL_BIT + 545)

So I can't (yet) reproduce your problem I'm afraid...

> regards,
> dan carpenter

2014-11-24 18:36:33

by David Miller

[permalink] [raw]
Subject: Re: [PATCHv10 4/5] sparc: Hook up execveat system call.

From: David Drysdale <[email protected]>
Date: Mon, 24 Nov 2014 11:53:58 +0000

> Signed-off-by: David Drysdale <[email protected]>

Acked-by: David S. Miller <[email protected]>>

2014-11-24 18:56:54

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCHv10 2/5] x86: Hook up execveat system call.

On Mon, 24 Nov 2014, Dan Carpenter wrote:

> On Mon, Nov 24, 2014 at 11:53:56AM +0000, David Drysdale wrote:
> > Hook up x86-64, i386 and x32 ABIs.
> >
> > Signed-off-by: David Drysdale <[email protected]>
>
> This one has been breaking my linux-next build for the past week. I'm
> not sure what's going on. I build with a script:
>
> make allmodconfig
>
> cat << EOF >> .config
> CONFIG_DYNAMIC_DEBUG=n
> CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=n
> CONFIG_DYNAMIC_DEBUG=y
> EOF
>
> make oldconfig
>
> Here are the errors:
>
> CHK include/generated/compile.h
> CHECK arch/x86/ia32/audit.c
> CC arch/x86/ia32/audit.o
> arch/x86/ia32/audit.c: In function ‘ia32_classify_syscall’:
> arch/x86/ia32/audit.c:38:7: error: ‘__NR_execveat’ undeclared (first use in this function)
> arch/x86/ia32/audit.c:38:7: note: each undeclared identifier is reported only once for each function it appears in
> make[2]: *** [arch/x86/ia32/audit.o] Error 1
> make[2]: Target `__build' not remade because of errors.
> make[1]: *** [arch/x86/ia32] Error 2
> CHECK arch/x86/kernel/audit_64.c
> CHK kernel/config_data.h
> CC arch/x86/kernel/audit_64.o
> arch/x86/kernel/audit_64.c: In function ‘audit_classify_syscall’:
> arch/x86/kernel/audit_64.c:53:7: error: ‘__NR_execveat’ undeclared (first use in this function)
> arch/x86/kernel/audit_64.c:53:7: note: each undeclared identifier is reported only once for each function it appears in
> make[2]: *** [arch/x86/kernel/audit_64.o] Error 1
> make[2]: Target `__build' not remade because of errors.
> make[1]: *** [arch/x86/kernel] Error 2
> make[1]: Target `__build' not remade because of errors.

I don't know what you're doing.

Tried the above and it rebuilds the relevant unistd*.h files and
compiles happily.

Thanks,

tglx

2014-11-25 12:17:08

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCHv10 2/5] x86: Hook up execveat system call.

On Mon, Nov 24, 2014 at 06:26:24PM +0000, David Drysdale wrote:
> On Mon, Nov 24, 2014 at 5:06 PM, Dan Carpenter <[email protected]> wrote:
> > On Mon, Nov 24, 2014 at 11:53:56AM +0000, David Drysdale wrote:
> >> Hook up x86-64, i386 and x32 ABIs.
> >>
> >> Signed-off-by: David Drysdale <[email protected]>
> >
> > This one has been breaking my linux-next build for the past week. I'm
> > not sure what's going on.
>
> Hi Dan,
>
> Sorry if this has been causing you problems -- I've not had any
> errors from the kbuild robots or my local builds.
>

For some reason I had a stale copy of
arch/x86/include/generated/asm/unistd_32.h and it was using that in
preference to the arch/x86/include/generated/uapi/asm/unistd_32.h file.
Once I did ran:

arch/x86/include/generated/ -rf

Then it builds now.

I'm not sure what that's all about but it's fixed now.

regards,
dan carpenter