Date: Wed, 22 Oct 2014 04:54:05 -0700
From: Christoph Hellwig
To: Andy Lutomirski
Cc: David Drysdale, "Eric W. Biederman", Alexander Viro, Meredydd Luff,
    "linux-kernel@vger.kernel.org", Thomas Gleixner, Ingo Molnar,
    "H. Peter Anvin", Andrew Morton, Kees Cook, Arnd Bergmann, X86 ML,
    linux-arch, Linux API, Rich Felker
Subject: Re: [PATCHv4 RESEND 0/3] syscalls,x86: Add execveat() system call
Message-ID: <20141022115405.GA8593@infradead.org>
References: <1401975635-6162-1-git-send-email-drysdale@google.com>

[adding Rich Felker to the Cc list, who has been very interested in a
O_SEARCH implementation for which this would be an important building
block]

On Fri, Oct 17, 2014 at 02:45:03PM -0700, Andy Lutomirski wrote:
> [Added Eric Biederman, since I think your tree might be a reasonable
> route forward for these patches.]
>
> On Thu, Jun 5, 2014 at 6:40 AM, David Drysdale wrote:
> > Resending, adding cc:linux-api.
> >
> > Also, it may help to add a little more background -- this patch is
> > needed as a (small) part of implementing Capsicum in the Linux kernel.
> >
> > Capsicum is a security framework that has been present in FreeBSD since
> > version 9.0 (Jan 2012), and is based on concepts from object-capability
> > security [1].
> >
> > One of the features of Capsicum is capability mode, which locks down
> > access to global namespaces such as the filesystem hierarchy.  In
> > capability mode, /proc is thus inaccessible and so fexecve(3) doesn't
> > work -- hence the need for a kernel-space
>
> I just found myself wanting this syscall for another reason: injecting
> programs into sandboxes or otherwise heavily locked-down namespaces.
>
> For example, I want to be able to reliably do something like nsenter
> --namespace-flags-here toybox sh.  Toybox's shell is unusual in that
> it is more or less fully functional, so this should Just Work (tm),
> except that the toybox binary might not exist in the namespace being
> entered.  If execveat were available, I could rig nsenter or a similar
> tool to open it with O_CLOEXEC, enter the namespace, and then call
> execveat.
>
> Is there any reason that these patches can't be merged more or less as
> is for 3.19?
>
> --Andy
>
> >
> > [1] http://www.cl.cam.ac.uk/research/security/capsicum/papers/2010usenix-security-capsicum-website.pdf
> >
> > ------
> >
> > This patch set adds execveat(2) for x86, and is derived from Meredydd
> > Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
> >
> > The primary aim of adding an execveat syscall is to allow an
> > implementation of fexecve(3) that does not rely on the /proc
> > filesystem.  The current glibc version of fexecve(3) is implemented
> > via /proc, which causes problems in sandboxed or otherwise restricted
> > environments.
> >
> > Given the desire for a /proc-free fexecve() implementation, HPA
> > suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2)
> > syscall would be an appropriate generalization.
> >
> > Also, having a new syscall means that it can take a flags argument
> > without back-compatibility concerns.
> > The current implementation just
> > defines the AT_SYMLINK_NOFOLLOW flag, but other flags could be added
> > in future -- for example, flags for new namespaces (as suggested at
> > https://lkml.org/lkml/2006/7/11/474).
> >
> > Related history:
> >  - https://lkml.org/lkml/2006/12/27/123 is an example of someone
> >    realizing that fexecve() is likely to fail in a chroot environment.
> >  - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
> >    documenting the /proc requirement of fexecve(3) in its manpage, to
> >    "prevent other people from wasting their time".
> >  - https://bugzilla.kernel.org/show_bug.cgi?id=74481 documented that
> >    it's not possible to fexecve() a file descriptor for a script with
> >    close-on-exec set (which is possible with the implementation here).
> >  - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
> >    problem where a process that did setuid() could not fexecve()
> >    because it no longer had access to /proc/self/fd; this has since
> >    been fixed.
> >
> >
> > Changes since Meredydd's v3 patch:
> >  - Added a selftest.
> >  - Added a man page.
> >  - Left open_exec() signature untouched to reduce patch impact
> >    elsewhere (as suggested by Al Viro).
> >  - Filled in bprm->filename with d_path() into a buffer, to avoid use
> >    of potentially-ephemeral dentry->d_name.
> >  - Patch against v3.14 (455c6fdbd21916).
> >
> >
> > David Drysdale (2):
> >   syscalls,x86: implement execveat() system call
> >   syscalls,x86: add selftest for execveat(2)
> >
> >  arch/x86/ia32/audit.c                   |   1 +
> >  arch/x86/ia32/ia32entry.S               |   1 +
> >  arch/x86/kernel/audit_64.c              |   1 +
> >  arch/x86/kernel/entry_64.S              |  28 ++++
> >  arch/x86/syscalls/syscall_32.tbl        |   1 +
> >  arch/x86/syscalls/syscall_64.tbl        |   2 +
> >  arch/x86/um/sys_call_table_64.c         |   1 +
> >  fs/exec.c                               | 153 ++++++++++++++++---
> >  include/linux/compat.h                  |   3 +
> >  include/linux/sched.h                   |   4 +
> >  include/linux/syscalls.h                |   4 +
> >  include/uapi/asm-generic/unistd.h       |   4 +-
> >  kernel/sys_ni.c                         |   3 +
> >  lib/audit.c                             |   3 +
> >  tools/testing/selftests/Makefile        |   1 +
> >  tools/testing/selftests/exec/.gitignore |   6 +
> >  tools/testing/selftests/exec/Makefile   |  32 ++++
> >  tools/testing/selftests/exec/execveat.c | 251 ++++++++++++++++++++++++++++++++
> >  18 files changed, 476 insertions(+), 23 deletions(-)
> >  create mode 100644 tools/testing/selftests/exec/.gitignore
> >  create mode 100644 tools/testing/selftests/exec/Makefile
> >  create mode 100644 tools/testing/selftests/exec/execveat.c
> >
> > --
> > 1.9.1.423.g4596e3a
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-api" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC
---end quoted text---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
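
For illustration, the use case described in the quoted text (open the
binary with O_CLOEXEC, enter the namespace, then call execveat) could look
roughly like the C sketch below. It assumes the syscall interface as
documented in the execveat(2) man page, including the AT_EMPTY_PATH flag,
which the cover letter above does not mention (it names only
AT_SYMLINK_NOFOLLOW). The helper names exec_by_fd and run_by_fd are
hypothetical, the raw syscall(2) interface is used so that no libc wrapper
is assumed, and the namespace-entry step is left as a comment.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000 /* value from the Linux uapi headers */
#endif

extern char **environ;

/* Hypothetical helper: execute the program that fd refers to, without
 * naming a path and without touching /proc. Returns only on failure
 * (-1 with errno set). */
static int exec_by_fd(int fd, char *const argv[], char *const envp[])
{
    return (int)syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
}

/* Hypothetical helper: open path, fork, and run it in the child through
 * its file descriptor. Returns the child's exit status, or -1 on error. */
static int run_by_fd(const char *path, char *const argv[])
{
    /* O_CLOEXEC keeps the descriptor from leaking into the new program. */
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* A real tool would enter the target namespace here, for
         * example with setns(2); the already-open fd stays valid even
         * if the binary is not visible inside that namespace. */
        exec_by_fd(fd, argv, environ);
        _exit(127); /* exec failed */
    }

    close(fd);
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller might use it as run_by_fd("/bin/sh", argv) with argv set to
{"sh", "-c", "exit 7", NULL}. The sketch assumes the target is a binary
rather than an interpreted script: per the execveat(2) man page for the
merged syscall, a script referred to by a close-on-exec descriptor fails
with ENOENT.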