From: Aleksa Sarai
To: Al Viro, Jeff Layton, "J.
 Bruce Fields", Arnd Bergmann, David Howells
Cc: Aleksa Sarai, Eric Biederman, Andy Lutomirski, Jann Horn,
 Christian Brauner, David Drysdale, Tycho Andersen, Kees Cook,
 containers@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org, Andrew Morton, Alexei Starovoitov,
 Chanho Min, Oleg Nesterov, Aleksa Sarai, linux-kernel@vger.kernel.org,
 linux-arch@vger.kernel.org
Subject: [PATCH v5 0/5] namei: vfs flags to restrict path resolution
Date: Wed, 13 Feb 2019 14:08:46 +1100
Message-Id: <20190213030851.1881-1-cyphar@cyphar.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Now that the holiday break is over, it's time to re-send this patch
series (with a few additions, due to new information we got from
CVE-2019-5736 -- which this patchset mostly protected against but had
some holes with regards to #!-style scripts).

Patch changelog:
  v5:
    * In response to CVE-2019-5736 (one of the vectors showed that
      open(2)+fexecve(3) cannot be used to scope binfmt_script's
      implicit open_exec()), AT_* flags have been re-added and are now
      piped through to binfmt_script (and other binfmt_* that use
      open_exec) but are only supported for execveat(2) for now.
  v4:
    * Remove AT_* flag reservations, as they require more discussion.
    * Switch to path_is_under() over __d_path() for breakout checking.
    * Make O_XDEV no longer block openat("/tmp", "/", O_XDEV) -- dirfd
      is now ignored for absolute paths to match other flags.
    * Improve the dirfd_path_init() refactor and move it to a separate
      commit.
    * Remove reference to Linux-capsicum.
    * Switch "proclink" name to "magic link".
  v3: [resend]
  v2:
    * Made ".." resolution with AT_THIS_ROOT and AT_BENEATH safe(r) with
      some semi-aggressive __d_path checking (see patch 3).
    * Disallowed "proclinks" with AT_THIS_ROOT and AT_BENEATH, in the
      hopes they can be re-enabled once safe.
    * Removed the selftests as they will be reimplemented as xfstests.
    * Removed stat(2) support, since you can already get it through
      O_PATH and fstatat(2).

The need for some sort of control over VFS's path resolution (to avoid
malicious paths resulting in inadvertent breakouts) has been a very
long-standing desire of many userspace applications. This patchset is a
revival of Al Viro's old AT_NO_JUMPS[1,2] patchset (which was a variant
of David Drysdale's O_BENEATH patchset[3] which was a spin-off of the
Capsicum project[4]) with a few additions and changes made based on the
previous discussion within [5] as well as others I felt were useful.

In line with the conclusions of the original discussion of AT_NO_JUMPS,
the flag has been split up into separate flags:

  * O_XDEV blocks all mountpoint crossings (upwards, downwards, or
    through absolute links). Absolute pathnames alone in openat(2) do
    not trigger this.

  * O_NOMAGICLINKS blocks resolution through /proc/$pid/fd-style links.
    This is done by blocking the usage of nd_jump_link() during
    resolution in a filesystem. The term "magic links" is used to match
    with the only reference to these links in Documentation/, but I'm
    happy to change the name.

    It should be noted that this is different to the scope of
    O_NOFOLLOW in that it applies to all path components. However, you
    can do open(O_NOFOLLOW|O_NOMAGICLINKS|O_PATH) on a "magic link" and
    it will *not* fail (assuming that no parent component was a "magic
    link"), and you will have an fd for the "magic link".

  * O_BENEATH disallows escapes to outside the starting dirfd's tree,
    using techniques such as ".." or absolute links. Absolute paths in
    openat(2) are also disallowed. Conceptually this flag is to ensure
    you "stay below" a certain point in the filesystem tree -- but this
    requires some additional checks to protect against various races
    that would allow escape using ".." (see patch 4 for more detail).
    Currently O_BENEATH implies O_NOMAGICLINKS, because magic links can
    trivially beam you around the filesystem (breaking the protection).
    In future, there might be safety checks similar to those in patch 4,
    but that requires more discussion.

In addition, two new flags were added that expand on the above ideas:

  * O_NOSYMLINKS does what it says on the tin. No symlink resolution is
    allowed at all, including "magic links". Just as with
    O_NOMAGICLINKS this can still be used with (O_PATH|O_NOFOLLOW) to
    open an fd for the symlink as long as no parent path had a symlink
    component.

  * O_THISROOT is an extension of O_BENEATH that, rather than blocking
    attempts to move past the root, forces all such movements to be
    scoped to the starting point. This provides chroot(2)-like
    protection but without the cost of a chroot(2) for each filesystem
    operation, as well as being safe against race attacks that
    chroot(2) is not.

    If a race is detected (as with O_BENEATH) then an error is
    generated, and similar to O_BENEATH it is not permitted to cross
    "magic links" with O_THISROOT.

    The primary need for this is from container runtimes, which
    currently need to do symlink scoping in userspace[6] when opening
    paths in a potentially malicious container. There is a long list of
    CVEs that could have been mitigated by having O_THISROOT (such as
    CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and
    CVE-2019-5736, just to name a few).

In addition, a mirror set of AT_* flags has been added (though
currently these are only supported for execveat(2) -- and not for any
other syscall). The need for these is explained in the final patch in
the series (it's motivated by CVE-2019-5736).
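
To make the difference between O_BENEATH and O_THISROOT concrete, here
is a rough userspace model of how each treats ".." and absolute paths.
This is only a sketch and not code from the series: scoped_resolve(),
EscapeError and the "beneath"/"thisroot" mode names are hypothetical,
and the walk is purely lexical, so it ignores symlinks, mountpoints and
the rename races that the kernel-side checks are meant to handle.

```python
import posixpath


class EscapeError(Exception):
    """Raised when a path would leave the starting directory."""


def scoped_resolve(root, path, mode):
    """Lexically resolve `path` against `root` (hypothetical helper).

    mode="beneath":  absolute paths, and any ".." that would climb above
                     `root`, are rejected (modelled on O_BENEATH).
    mode="thisroot": absolute paths restart at `root`, and ".." at
                     `root` stays at `root` (modelled on O_THISROOT).

    Assumption: no component is a symlink. A real implementation cannot
    assume that, which is why the flags live in the kernel.
    """
    if mode not in ("beneath", "thisroot"):
        raise ValueError("unknown mode: %r" % (mode,))
    if path.startswith("/") and mode == "beneath":
        raise EscapeError("absolute path not allowed: %r" % (path,))

    stack = []  # components currently below `root`
    for part in path.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            if stack:
                stack.pop()
            elif mode == "beneath":
                raise EscapeError("'..' escapes root in %r" % (path,))
            # "thisroot": ".." at the root is a no-op, as it is at "/".
            continue
        stack.append(part)
    return posixpath.join(root, *stack)


if __name__ == "__main__":
    for mode in ("beneath", "thisroot"):
        try:
            print(mode, scoped_resolve("/srv/ctr", "../etc/passwd", mode))
        except EscapeError as err:
            print(mode, "error:", err)
```

In this model the same hostile path is an error under "beneath" but is
quietly re-scoped under "thisroot", which is the behaviour a container
runtime wants when opening paths inside a container's rootfs.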

Cc: Al Viro
Cc: Eric Biederman
Cc: Andy Lutomirski
Cc: David Howells
Cc: Jann Horn
Cc: Christian Brauner
Cc: David Drysdale
Cc: Tycho Andersen
Cc: Kees Cook

[1]: https://lwn.net/Articles/721443/
[2]: https://lore.kernel.org/patchwork/patch/784221/
[3]: https://lwn.net/Articles/619151/
[4]: https://lwn.net/Articles/603929/
[5]: https://lwn.net/Articles/723057/
[6]: https://github.com/cyphar/filepath-securejoin

Aleksa Sarai (5):
  namei: split out nd->dfd handling to dirfd_path_init
  namei: O_BENEATH-style path resolution flags
  namei: O_THISROOT: chroot-like path resolution
  namei: aggressively check for nd->root escape on ".." resolution
  binfmt_*: scope path resolution of interpreters

 fs/binfmt_elf.c                  |   2 +-
 fs/binfmt_elf_fdpic.c            |   2 +-
 fs/binfmt_em86.c                 |   4 +-
 fs/binfmt_misc.c                 |   2 +-
 fs/binfmt_script.c               |   2 +-
 fs/exec.c                        |  26 +++-
 fs/fcntl.c                       |   2 +-
 fs/namei.c                       | 205 ++++++++++++++++++++++---------
 fs/open.c                        |  13 +-
 include/linux/binfmts.h          |   1 +
 include/linux/fcntl.h            |   3 +-
 include/linux/fs.h               |   9 +-
 include/linux/namei.h            |   8 ++
 include/uapi/asm-generic/fcntl.h |  17 +++
 include/uapi/linux/fcntl.h       |   6 +
 15 files changed, 228 insertions(+), 74 deletions(-)

--
2.20.1