Received: by 10.223.185.116 with SMTP id b49csp234997wrg; Thu, 8 Mar 2018 16:27:04 -0800 (PST) X-Google-Smtp-Source: AG47ELsDtt9aWhxIxvIitWe/Od510YpAwOVOtkc57oqug46aj07Pi1f2RogY5D4B9bPq5dTEcjsE X-Received: by 10.98.252.22 with SMTP id e22mr28125676pfh.235.1520555224702; Thu, 08 Mar 2018 16:27:04 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1520555224; cv=none; d=google.com; s=arc-20160816; b=u84wFf+VLvYXw2kGa2GQ0vkAbldeC+MgGswnoAt6uX++VDc5ai0KVsT9jP0KBRV89F eH+lGSShxuu9QPZk08yP/gcPLAW+zH8p0Th0q8X/SbKHtIEJFk9EiHNfgtumHYsy+ltr UcZmARUCPNSCHH/UVKCNIh1kQdRrwBXmCQXDgXvdTkYEoTJRSvsOFYZC32PD4y5GIiHO 6r9IQDw1GYujjjfG06gH4qxZ28xIDHNCAZwY1KlkgERGKioe89c1gjdg9pmjbZE7Ihaj i1xl5q0o5IysL4zoleTkOv8LUFploKS7Z67G25GnDeXDwDKDaLVBzvboNA5tA/vHrWXz 1BMg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature:dkim-signature :arc-authentication-results; bh=dBkgloX7YfvIN3b96cL/wFypruhdxCY+64W4AsCcMIU=; b=HlOb1P8V8aYmpzeuRZNAPDmSXH/eauVxPK9EBmcoQC/4JX2DzIkZ8oT4na0cUvEXBe fK9JhA8tQjPTeDj5+TxIJyjfynYX+9e7dvfWcvIePmMzyn11AXNiIJjFmnm1G9CBjZj1 K0xOy5cEVD+AGW9YBOhzN0Jxe/LJDibpxVrWEPCNSWY6Aqkbd8VQFncfDDgR7OU4qALl 9kN8r/VZXQpmI6XPPDz7xWPzvXPIE/DbTpsjhUK0ZYVkexsXXrFX41zMxV2p0qOqHJXX Aq1RG6z3D7ubVulGfkgtqgERWUiuUzE2e3i7bps4A9qoCbr6gyvqvhWrRnItyuEFg9iv mHig== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@google.com header.s=20161025 header.b=qlcZQ98C; dkim=fail header.i=@chromium.org header.s=google header.b=T80vAwsG; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p8si13843911pgr.674.2018.03.08.16.26.49; Thu, 08 Mar 2018 16:27:04 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@google.com header.s=20161025 header.b=qlcZQ98C; dkim=fail header.i=@chromium.org header.s=google header.b=T80vAwsG; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751434AbeCIAYi (ORCPT + 99 others); Thu, 8 Mar 2018 19:24:38 -0500 Received: from mail-vk0-f49.google.com ([209.85.213.49]:36831 "EHLO mail-vk0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751188AbeCIAYf (ORCPT ); Thu, 8 Mar 2018 19:24:35 -0500 Received: by mail-vk0-f49.google.com with SMTP id q7so1107226vkh.3 for ; Thu, 08 Mar 2018 16:24:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=dBkgloX7YfvIN3b96cL/wFypruhdxCY+64W4AsCcMIU=; b=qlcZQ98CJiNtRIkSAP9pUk8dSlJKAx0Q3TP22PhC6+dU8jqTksoUwt1mpCjcagPy9x a6VeXSkxrK32WBRcoPsuOaLtt5CkPA2/mpjUylMQYdMkTZOaMl2aXerO5jVtJTz6gooA I4DyMsVEDZMx2lOEM50guNcQd+B80xJlA3fd+mxE2uVnWX34nrvBzsC9zbM0TGMEyq5l Fr2bK9PtR3lF75ivzPeUldhInSc2gMe9NQuYa/XGDa61LuLW0WsrMapT/tajuSKDZUkB DnSOum/wrQGUqMTSW4+UP0kko2PtFAZR5sK/WEX4zZSJW0pGhJ2kB5NUBAOVxo2gA3hA Muxw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=dBkgloX7YfvIN3b96cL/wFypruhdxCY+64W4AsCcMIU=; b=T80vAwsGMQRHVaG4mlt//0DrNFcVrRBSWtFyJhbxSAaD+WlQX5HIK3tcQlbXw1NXe5 c9QmShh1eNAqersikF/R5Jf8m4azP981LLrji4hlxH35I10+5Ii6C++yyRDgeE6wptMh B4lDvv+562SX/gmPb4189q8eZmwB/YREI1Do8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=dBkgloX7YfvIN3b96cL/wFypruhdxCY+64W4AsCcMIU=; b=JtHiG34pXGOVkL/OxkOfkKBrBPBAkyMXdN1aagpen6/2/IySQiuv8l9TXhbcFCGajO 8jUnm8lp7MiWxdynDKFQGZ3ouwDOXNAKI8QsU3vTy7D2X04FU73wcBt6ZGhUik5qw7z+ cmRuL+oWCM9Mb9bCUk1YAFTlZ6BhwJWrxNsS6M/AWwvtETNkg2jqPxiFhWQGCtNeKzgZ IShPbeKJdwEzeO0ygPVnFbWAeScxxqs49tYTkSDXGlvp+ckH4EZoOe8VNLoGBMItYFtR VTJzReB1T4gFegm0hgjVhTSqttXFNZ/g6j2CanGt3R8+5yi8QguO7aAZX+O6l7aiegc4 U6lg== X-Gm-Message-State: APf1xPC6g0grO267Adc7TKGKWJJIGXIT69WcfsvcwXCWHy4T2g4cnaJN XuKby2geyaRQe+0TXsh1VjjJ4EsteXaIVCM+5E7zPQ== X-Received: by 10.31.230.132 with SMTP id d126mr20345626vkh.123.1520555074297; Thu, 08 Mar 2018 16:24:34 -0800 (PST) MIME-Version: 1.0 Received: by 10.31.242.140 with HTTP; Thu, 8 Mar 2018 16:24:33 -0800 (PST) In-Reply-To: <20180306013457.1955486-1-ast@kernel.org> References: <20180306013457.1955486-1-ast@kernel.org> From: Kees Cook Date: Thu, 8 Mar 2018 16:24:33 -0800 X-Google-Sender-Auth: aIOr8Qz1UaZpjfYrP1eNJiouIHY Message-ID: Subject: Re: [PATCH net-next] modules: allow modprobe load regular elf binaries To: Alexei Starovoitov , Djalal Harouni , Al Viro Cc: "David S. Miller" , Daniel Borkmann , Linus Torvalds , Greg KH , "Luis R. Rodriguez" , Network Development , LKML , kernel-team@fb.com, Linux API Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org How is this not marked [RFC]? :) On Mon, Mar 5, 2018 at 5:34 PM, Alexei Starovoitov wrote: > As the first step in development of bpfilter project [1] the request_module() > code is extended to allow user mode helpers to be invoked. Idea is that > user mode helpers are built as part of the kernel build and installed as > traditional kernel modules with .ko file extension into distro specified > location, such that from a distribution point of view, they are no different > than regular kernel modules. Thus, allow request_module() logic to load such > user mode helper (umh) modules via: > > request_module("foo") -> > call_umh("modprobe foo") -> > sys_finit_module(FD of /lib/modules/.../foo.ko) -> > call_umh(struct file) Yikes. This means autoloaded artbitrary exec() now, instead of autoloading just loading arbitrary modules? That seems extremely bad. This just extends all the problems we've had with defining security boundaries with modules out to umh too. I would need some major convincing that this can be made safe. It's easy for container to trigger arbitrary module loading, so if it can find/use a similar one of the bugs we've seen in the past to redirect the module loading we don't just get new module-created attack surface added to the kernel, but we again get arbitrary ELF running in the root ns (not in the container). And in the insane case of a container with CAP_SYS_MODULE, this is a trivial escape without even needing to build a special kernel module: the root ns, running with all privileges runs an arbitrary ELF. As it stands, I have to NAK this. At the very least, you need to solve the execution environment problems here: the ELF should run with no greater privileges than what loaded the module, and very importantly, must not be allowed to bypass these checks through autoloading. What _triggered_ the autoload must be the environment, not the "modprobe", since that's running with full privileges. Basically, we need to fix module autoloading first, or at least a significant subset of the problems: https://lkml.org/lkml/2017/11/27/754 > Such approach enables kernel to delegate functionality traditionally done > by kernel modules into user space processes (either root or !root) and > reduces security attack surface of such new code, meaning in case of > potential bugs only the umh would crash but not the kernel. Another > advantage coming with that would be that bpfilter.ko can be debugged and > tested out of user space as well (e.g. opening the possibility to run > all clang sanitizers, fuzzers or test suites for checking translation). > Also, such architecture makes the kernel/user boundary very precise: > control plane is done by the user space while data plane stays in the kernel. > > It's easy to distinguish "umh module" from traditional kernel module: > > $ readelf -h bld/net/bpfilter/bpfilter.ko|grep Type > Type: EXEC (Executable file) As Andy asked earlier, why not DYN too to catch PIE executables? Seems like forcing the userspace helper to be non-PIE would defeat some of the userspace defenses in use in most distros. > $ readelf -h bld/net/ipv4/netfilter/iptable_filter.ko |grep Type > Type: REL (Relocatable file) > > Since umh can crash, can be oom-ed by the kernel, killed by admin, > the subsystem that uses them (like bpfilter) need to manage life > time of umh on its own, so module infra doesn't do any accounting > of them. They don't appear in "lsmod" and cannot be "rmmod". > Multiple request_module("umh") will load multiple umh.ko processes. > > Similar to kernel modules the kernel will be tainted if "umh module" > has invalid signature. > > [1] https://lwn.net/Articles/747551/ > > Signed-off-by: Alexei Starovoitov > --- > fs/exec.c | 40 +++++++++++++++++++++++++++++++--------- > include/linux/binfmts.h | 1 + > include/linux/umh.h | 3 +++ > kernel/module.c | 43 ++++++++++++++++++++++++++++++++++++++----- > kernel/umh.c | 24 +++++++++++++++++++++--- > 5 files changed, 94 insertions(+), 17 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 7eb8d21bcab9..0483c438de7d 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1691,14 +1691,13 @@ static int exec_binprm(struct linux_binprm *bprm) > /* > * sys_execve() executes a new program. > */ > -static int do_execveat_common(int fd, struct filename *filename, > - struct user_arg_ptr argv, > - struct user_arg_ptr envp, > - int flags) > +static int __do_execve_file(int fd, struct filename *filename, > + struct user_arg_ptr argv, > + struct user_arg_ptr envp, > + int flags, struct file *file) > { > char *pathbuf = NULL; > struct linux_binprm *bprm; > - struct file *file; > struct files_struct *displaced; > int retval; > > @@ -1737,7 +1736,8 @@ static int do_execveat_common(int fd, struct filename *filename, > check_unsafe_exec(bprm); > current->in_execve = 1; > > - file = do_open_execat(fd, filename, flags); > + if (!file) > + file = do_open_execat(fd, filename, flags); > retval = PTR_ERR(file); > if (IS_ERR(file)) > goto out_unmark; > @@ -1745,7 +1745,9 @@ static int do_execveat_common(int fd, struct filename *filename, > sched_exec(); > > bprm->file = file; > - if (fd == AT_FDCWD || filename->name[0] == '/') { > + if (!filename) { > + bprm->filename = "/dev/null"; > + } else if (fd == AT_FDCWD || filename->name[0] == '/') { > bprm->filename = filename->name; > } else { > if (filename->name[0] == '\0') > @@ -1811,7 +1813,8 @@ static int do_execveat_common(int fd, struct filename *filename, > task_numa_free(current); > free_bprm(bprm); > kfree(pathbuf); > - putname(filename); > + if (filename) > + putname(filename); > if (displaced) > put_files_struct(displaced); > return retval; > @@ -1834,10 +1837,29 @@ static int do_execveat_common(int fd, struct filename *filename, > if (displaced) > reset_files_struct(displaced); > out_ret: > - putname(filename); > + if (filename) > + putname(filename); > return retval; > } > > +static int do_execveat_common(int fd, struct filename *filename, > + struct user_arg_ptr argv, > + struct user_arg_ptr envp, > + int flags) > +{ > + struct file *file = NULL; > + > + return __do_execve_file(fd, filename, argv, envp, flags, file); > +} > + > +int do_execve_file(struct file *file, void *__argv, void *__envp) > +{ > + struct user_arg_ptr argv = { .ptr.native = __argv }; > + struct user_arg_ptr envp = { .ptr.native = __envp }; > + > + return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file); > +} The exec.c changes should be split into a separate patch. Something feels incorrectly refactored here, though. Passing all three of fd, filename, and file to __do_execve_file() seems wrong; perhaps the fd to file handling needs to happen externally in what you have here as do_execveat_common()? The resulting __do_execve_file() has repeated conditionals on filename... maybe what I object to is being able to pass a NULL filename at all. The filename can be (painfully) reconstructed from the struct file. > [...] > diff --git a/kernel/module.c b/kernel/module.c > index ad2d420024f6..6cfa35795741 100644 > --- a/kernel/module.c > +++ b/kernel/module.c > [...] > @@ -3870,14 +3896,21 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags) > |MODULE_INIT_IGNORE_VERMAGIC)) > return -EINVAL; > > - err = kernel_read_file_from_fd(fd, &hdr, &size, INT_MAX, > - READING_MODULE); > + f = fdget(fd); > + if (!f.file) > + return -EBADF; > + > + err = kernel_read_file(f.file, &hdr, &size, INT_MAX, READING_MODULE); For the LSM subsystem, I think this should also get it's own enum kernel_read_file_id. This is really no longer a kernel module... > if (err) > - return err; > + goto out; > info.hdr = hdr; > info.len = size; > + info.file = f.file; > > - return load_module(&info, uargs, flags); > + err = load_module(&info, uargs, flags); > +out: > + fdput(f); > + return err; > } > > static inline int within(unsigned long addr, void *start, unsigned long size) > diff --git a/kernel/umh.c b/kernel/umh.c > index 18e5fa4b0e71..4361c694bdb1 100644 > --- a/kernel/umh.c > +++ b/kernel/umh.c > @@ -97,9 +97,13 @@ static int call_usermodehelper_exec_async(void *data) > > commit_creds(new); > > - retval = do_execve(getname_kernel(sub_info->path), > - (const char __user *const __user *)sub_info->argv, > - (const char __user *const __user *)sub_info->envp); > + if (sub_info->file) > + retval = do_execve_file(sub_info->file, > + sub_info->argv, sub_info->envp); > + else > + retval = do_execve(getname_kernel(sub_info->path), > + (const char __user *const __user *)sub_info->argv, > + (const char __user *const __user *)sub_info->envp); > out: > sub_info->retval = retval; > /* > @@ -393,6 +397,20 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, > } > EXPORT_SYMBOL(call_usermodehelper_setup); > > +struct subprocess_info *call_usermodehelper_setup_file(struct file *file) > +{ > + struct subprocess_info *sub_info; > + > + sub_info = kzalloc(sizeof(struct subprocess_info), GFP_KERNEL); > + if (!sub_info) > + return NULL; > + > + INIT_WORK(&sub_info->work, call_usermodehelper_exec_work); > + sub_info->path = "/dev/null"; > + sub_info->file = file; This use of "/dev/null" here and in execve is just wrong. It _does_ have a path and filename... Anyway, interesting idea. I think it _can_ work, it just needs much much more careful security boundaries and to solve our autoloading exposures too. -Kees -- Kees Cook Pixel Security