From: Aleksa Sarai
To: Al Viro, Jeff Layton, "J. Bruce Fields", Arnd Bergmann,
	David Howells, Shuah Khan, Shuah Khan, Ingo Molnar, Peter Zijlstra,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko
Cc: Aleksa Sarai, Eric Biederman, Andy Lutomirski, Andrew Morton,
	Kees Cook, Jann Horn, Tycho Andersen, David Drysdale, Chanho Min,
	Oleg Nesterov, Rasmus Villemoes, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Christian Brauner, Aleksa Sarai, Linus Torvalds,
	dev@opencontainers.org, containers@lists.linux-foundation.org,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	linux-alpha@vger.kernel.org, linux-api@vger.kernel.org,
	libc-alpha@sourceware.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
	linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-m68k@lists.linux-m68k.org,
	linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org,
	sparclinux@vger.kernel.org
Subject: [PATCH v16 12/12] Documentation: path-lookup: include new LOOKUP flags
Date: Sat, 16 Nov 2019 11:28:02 +1100
Message-Id: <20191116002802.6663-13-cyphar@cyphar.com>
In-Reply-To: <20191116002802.6663-1-cyphar@cyphar.com>
References: <20191116002802.6663-1-cyphar@cyphar.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Now that we have new LOOKUP flags, we should document them in the
relevant path-walking documentation. And now that we've settled on a
common name for nd_jump_link() style symlinks ("magic links"), use that
term where magic-link semantics are described.

Signed-off-by: Aleksa Sarai
---
 Documentation/filesystems/path-lookup.rst | 68 +++++++++++++++++++++--
 1 file changed, 62 insertions(+), 6 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index 434a07b0002b..a3216979298b 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -13,6 +13,7 @@ It has subsequently been updated to reflect changes in the kernel
 including:
 
 - per-directory parallel name lookup.
+- ``openat2()`` resolution restriction flags.
 
 Introduction to pathname lookup
 ===============================
@@ -235,6 +236,13 @@ renamed. If ``d_lookup`` finds that a rename happened while it
 unsuccessfully scanned a chain in the hash table, it simply tries
 again.
 
+``rename_lock`` is also used to detect and defend against potential attacks
+against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
+the parent directory is moved outside the root, bypassing the ``path_equal()``
+check). If ``rename_lock`` is updated during the lookup and the path encounters
+a "..", a potential attack occurred and ``handle_dots()`` will bail out with
+``-EAGAIN``.
+
 inode->i_rwsem
 ~~~~~~~~~~~~~~
 
@@ -348,6 +356,13 @@ any changes to any mount points while stepping up. This locking is
 needed to stabilize the link to the mounted-on dentry, which the
 refcount on the mount itself doesn't ensure.
 
+``mount_lock`` is also used to detect and defend against potential attacks
+against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
+the parent directory is moved outside the root, bypassing the ``path_equal()``
+check). If ``mount_lock`` is updated during the lookup and the path encounters
+a "..", a potential attack occurred and ``handle_dots()`` will bail out with
+``-EAGAIN``.
+
 RCU
 ~~~
 
@@ -405,6 +420,10 @@ is requested. Keeping a reference in the ``nameidata`` ensures that
 only one root is in effect for the entire path walk, even if it races
 with a ``chroot()`` system call.
 
+It should be noted that in the case of ``LOOKUP_IN_ROOT`` or
+``LOOKUP_BENEATH``, the effective root becomes the directory file descriptor
+passed to ``openat2()`` (which exposes these ``LOOKUP_`` flags).
+
 The root is needed when either of two conditions holds: (1) either the
 pathname or a symbolic link starts with a "'/'", or (2) a "``..``"
 component is being handled, since "``..``" from the root must always stay
@@ -1149,7 +1168,7 @@ so ``NULL`` is returned to indicate that the symlink can be released and
 the stack frame discarded.
 
 The other case involves things in ``/proc`` that look like symlinks but
-aren't really::
+aren't really (and are therefore commonly referred to as "magic-links")::
 
      $ ls -l /proc/self/fd/1
      lrwx------ 1 neilb neilb 64 Jun 13 10:19 /proc/self/fd/1 -> /dev/pts/4
@@ -1286,7 +1305,9 @@ A few flags
 A suitable way to wrap up this tour of pathname walking is to list
 the various flags that can be stored in the ``nameidata`` to guide the
 lookup process. Many of these are only meaningful on the final
-component, others reflect the current state of the pathname lookup.
+component, others reflect the current state of the pathname lookup, and some
+apply restrictions to all path components encountered in the path lookup.
+
 And then there is ``LOOKUP_EMPTY``, which doesn't fit conceptually with
 the others. If this is not set, an empty pathname causes an error
 very early on. If it is set, empty pathnames are not considered to be
@@ -1310,13 +1331,48 @@ longer needed.
 ``LOOKUP_JUMPED`` means that the current dentry was chosen not because
 it had the right name but for some other reason. This happens when
 following "``..``", following a symlink to ``/``, crossing a mount point
-or accessing a "``/proc/$PID/fd/$FD``" symlink. In this case the
-filesystem has not been asked to revalidate the name (with
-``d_revalidate()``). In such cases the inode may still need to be
-revalidated, so ``d_op->d_weak_revalidate()`` is called if
+or accessing a "``/proc/$PID/fd/$FD``" symlink (also known as a "magic
+link"). In this case the filesystem has not been asked to revalidate the
+name (with ``d_revalidate()``). In such cases the inode may still need
+to be revalidated, so ``d_op->d_weak_revalidate()`` is called if
 ``LOOKUP_JUMPED`` is set when the look completes - which may be at the
 final component or, when creating, unlinking, or renaming, at the penultimate component.
 
+Resolution-restriction flags
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In order to allow userspace to protect itself against certain race conditions
+and attack scenarios involving changing path components, a series of flags are
+available which apply restrictions to all path components encountered during
+path lookup. These flags are exposed through ``openat2()``'s ``resolve`` field.
+
+``LOOKUP_NO_SYMLINKS`` blocks all symlink traversals (including magic-links).
+This is distinctly different from ``LOOKUP_FOLLOW``, because the latter only
+relates to restricting the following of trailing symlinks.
+
+``LOOKUP_NO_MAGICLINKS`` blocks all magic-link traversals. Filesystems must
+ensure that they return errors from ``nd_jump_link()``, because that is how
+``LOOKUP_NO_MAGICLINKS`` and other magic-link restrictions are implemented.
+
+``LOOKUP_NO_XDEV`` blocks all ``vfsmount`` traversals (this includes both
+bind-mounts and ordinary mounts). Note that the ``vfsmount`` which contains the
+lookup is determined by the first mountpoint the path lookup reaches --
+absolute paths start with the ``vfsmount`` of ``/``, and relative paths start
+with the ``dfd``'s ``vfsmount``. Magic-links are only permitted if the
+``vfsmount`` of the path is unchanged.
+
+``LOOKUP_BENEATH`` blocks any path components which resolve outside the
+starting point of the resolution. This is done by blocking ``nd_jump_root()``
+as well as blocking ".." if it would jump outside the starting point.
+``rename_lock`` and ``mount_lock`` are used to detect attacks against the
+resolution of "..". Magic-links are also blocked.
+
+``LOOKUP_IN_ROOT`` resolves all path components as though the starting point
+were the filesystem root. ``nd_jump_root()`` brings the resolution back to
+the starting point, and ".." at the starting point will act as a no-op. As with
+``LOOKUP_BENEATH``, ``rename_lock`` and ``mount_lock`` are used to detect
+attacks against ".." resolution. Magic-links are also blocked.
+
 Final-component flags
 ~~~~~~~~~~~~~~~~~~~~~
 
-- 
2.24.0
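
For anyone reviewing the ".." and nd_jump_root() rules in the new
"Resolution-restriction flags" section, a toy model in Python may help.
It is only a sketch under loud assumptions: it is purely lexical (no
symlinks, magic-links, mounts, or rename_lock/mount_lock race detection),
and walk(), EscapeError and the mode strings are hypothetical names made
up for illustration. It is not kernel code and not an openat2() wrapper.
It models one difference the text describes: LOOKUP_BENEATH refuses a
leading "/" or a ".." at the starting point, while LOOKUP_IN_ROOT treats
a leading "/" as a jump back to the starting point and ".." there as a
no-op.

```python
class EscapeError(Exception):
    """Hypothetical error: the lookup would leave the starting point."""


def walk(path, mode):
    """Return the components of `path` below the starting directory.

    `mode` is "beneath" (toy model of LOOKUP_BENEATH) or "in_root"
    (toy model of LOOKUP_IN_ROOT).  Purely lexical; this is an
    illustration, not the kernel's algorithm.
    """
    if mode not in ("beneath", "in_root"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Components walked so far, relative to the starting point.
    stack = []

    if path.startswith("/") and mode == "beneath":
        # Models a blocked nd_jump_root().  Under "in_root" the jump
        # lands on the starting point, i.e. the (still empty) stack.
        raise EscapeError("absolute path under 'beneath'")

    for part in path.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            if stack:
                stack.pop()
            elif mode == "beneath":
                # ".." would step above the starting point.
                raise EscapeError('".." at the starting point')
            # Under "in_root", ".." at the starting point is a no-op.
            continue
        stack.append(part)
    return stack
```

In this model walk("/etc/../../passwd", "in_root") returns ["passwd"],
whereas walk("../x", "beneath") raises EscapeError.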