Received: by 2002:a25:d7c1:0:0:0:0:0 with SMTP id o184csp1050602ybg; Sat, 26 Oct 2019 11:59:51 -0700 (PDT) X-Google-Smtp-Source: APXvYqzwVrBTpToOisoDLJqjGzeOboHDJsVl7YJi98TwFoIHM8MB2ueqigcpB1Dn4sRwzVPxEKb+ X-Received: by 2002:aa7:d304:: with SMTP id p4mr10820355edq.224.1572116391857; Sat, 26 Oct 2019 11:59:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1572116391; cv=none; d=google.com; s=arc-20160816; b=UVVlQtjcT9Z2LG+vDPXeGcQzE/0qb8Eo6klc8sJdR9sB/kJ6b7FDb999TMinri8mZM Jd6dhoyH5u3BTx9S1LbIP/p0LP4TQbknpuFo4w4ubofpZGE6bGPTEj7n8+xdt7X42xfp GKBFkQ3ABfdSUd2B/3BCWQ/iqbd1CVLDDJwbvADmroGPoaYAwhnFZ4ma3NAqjps16jh9 uijQdnR76VKpponqLwQEwDEba2cNJFI4FMOhfwoo3aBxjQSUTsryI1P5lIfht2sqsj6v 5AeBhogUcasylDz1MXMwujUI6UlbDHJK9iPOuVbSxBGkJuITcWUy7XJ31cUkaWtaNZGK jjIQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=syl/R/HmKzSJ+To4UIo9Ymh/9nxafgcT5fu4QTdKtKQ=; b=po6wV62fQ4KJjzCBypAZ9Wur00WhEQYzXMfsHcFhXzprEIQklyDfr/+5pkre5niXqw RX1IN53RQPBqwSm0pzOAASFJBhfpzQDbm0J+HSsDa5Qvap4ImJRgzInbe3uzhOELMMxl KEsPTHBmMV6IKRqdtIuawR92lay4n8G7vyXanC541Mk50/dg7CIZhvLgEIO1OLqtdNvB SntVthumG6fnQB7VE9xUGMQeGspdzOSI+Zm6Lb0s+6KE5/sZ2E0pNTvMvlWY1yq93Two H4s1O1up87ME9R+5vdar/QnNxZqcKK2t56oq6RuJKh18pdflggYCQvO6LxQEDGYvqQee 30vA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o49si4067085edc.261.2019.10.26.11.59.28; Sat, 26 Oct 2019 11:59:51 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726386AbfJZS6m (ORCPT + 99 others); Sat, 26 Oct 2019 14:58:42 -0400 Received: from mout-p-101.mailbox.org ([80.241.56.151]:50034 "EHLO mout-p-101.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726318AbfJZS6l (ORCPT ); Sat, 26 Oct 2019 14:58:41 -0400 Received: from smtp2.mailbox.org (smtp2.mailbox.org [80.241.60.241]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mout-p-101.mailbox.org (Postfix) with ESMTPS id 470qy436zGzKmsb; Sat, 26 Oct 2019 20:58:36 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter05.heinlein-hosting.de (spamfilter05.heinlein-hosting.de [80.241.56.123]) (amavisd-new, port 10030) with ESMTP id 0nCeWod63kTi; Sat, 26 Oct 2019 20:58:29 +0200 (CEST) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan , Ingo Molnar , Peter Zijlstra Cc: Aleksa Sarai , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Rasmus Villemoes , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Christian Brauner , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-alpha@vger.kernel.org, linux-api@vger.kernel.org, libc-alpha@sourceware.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org, sparclinux@vger.kernel.org Subject: [PATCH RESEND v14 3/6] namei: permit ".." resolution with LOOKUP_{IN_ROOT,BENEATH} Date: Sun, 27 Oct 2019 05:56:57 +1100 Message-Id: <20191026185700.10708-4-cyphar@cyphar.com> In-Reply-To: <20191026185700.10708-1-cyphar@cyphar.com> References: <20191026185700.10708-1-cyphar@cyphar.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch allows for LOOKUP_BENEATH and LOOKUP_IN_ROOT to safely permit ".." resolution (in the case of LOOKUP_BENEATH the resolution will still fail if ".." resolution would resolve a path outside of the root -- while LOOKUP_IN_ROOT will chroot(2)-style scope it). Magic-link jumps are still disallowed entirely[*]. The need for this patch (and the original no-".." restriction) is explained by observing there is a fairly easy-to-exploit race condition with chroot(2) (and thus by extension LOOKUP_IN_ROOT and LOOKUP_BENEATH if ".." is allowed) where a rename(2) of a path can be used to "skip over" nd->root and thus escape to the filesystem above nd->root. thread1 [attacker]: for (;;) renameat2(AT_FDCWD, "/a/b/c", AT_FDCWD, "/a/d", RENAME_EXCHANGE); thread2 [victim]: for (;;) openat2(dirb, "b/c/../../etc/shadow", { .flags = O_PATH, .resolve = RESOLVE_IN_ROOT } ); With fairly significant regularity, thread2 will resolve to "/etc/shadow" rather than "/a/b/etc/shadow". There is also a similar (though somewhat more privileged) attack using MS_MOVE. With this patch, such cases will be detected *during* ".." resolution and will return -EAGAIN for userspace to decide to either retry or abort the lookup. It should be noted that ".." is the weak point of chroot(2) -- walking *into* a subdirectory tautologically cannot result in you walking *outside* nd->root (except through a bind-mount or magic-link). There is also no other way for a directory's parent to change (which is the primary worry with ".." resolution here) other than a rename or MS_MOVE. This is a first-pass implementation, where -EAGAIN will be returned if any rename or mount occurs anywhere on the host (in any namespace). This will result in spurious errors, but there isn't a satisfactory alternative (other than denying ".." altogether). One other possible alternative (which previous versions of this patch used) would be to check with path_is_under() if there was a racing rename or mount (after re-taking the relevant seqlocks). While this does work, it results in possible O(n*m) behaviour if there are many renames or mounts occuring *anywhere on the system*. A variant of the above attack is included in the selftests for openat2(2) later in this patch series. I've run this test on several machines for several days and no instances of a breakout were detected. While this is not concrete proof that this is safe, when combined with the above argument it should lend some trustworthiness to this construction. [*] It may be acceptable in the future to do a path_is_under() check (as with the alternative solution for "..") for magic-links after they are resolved. However this seems unlikely to be a feature that people *really* need -- it can be added later if it turns out a lot of people want it. Signed-off-by: Aleksa Sarai --- fs/namei.c | 43 +++++++++++++++++++++++++++++-------------- 1 file changed, 29 insertions(+), 14 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 9d00b138f54c..0d6857ac4e5b 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -491,7 +491,7 @@ struct nameidata { struct path root; struct inode *inode; /* path.dentry.d_inode */ unsigned int flags; - unsigned seq, m_seq; + unsigned seq, m_seq, r_seq; int last_type; unsigned depth; int total_link_count; @@ -1769,22 +1769,35 @@ static inline int handle_dots(struct nameidata *nd, int type) if (type == LAST_DOTDOT) { int error = 0; - /* - * Scoped-lookup flags resolving ".." is not currently safe -- - * races can cause our parent to have moved outside of the root - * and us to skip over it. - */ - if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) - return -EXDEV; if (!nd->root.mnt) { error = set_root(nd); if (error) return error; } - if (nd->flags & LOOKUP_RCU) { - return follow_dotdot_rcu(nd); - } else - return follow_dotdot(nd); + if (nd->flags & LOOKUP_RCU) + error = follow_dotdot_rcu(nd); + else + error = follow_dotdot(nd); + if (error) + return error; + + if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) { + bool m_retry = read_seqretry(&mount_lock, nd->m_seq); + bool r_retry = read_seqretry(&rename_lock, nd->r_seq); + + /* + * If there was a racing rename or mount along our + * path, then we can't be sure that ".." hasn't jumped + * above nd->root (and so userspace should retry or use + * some fallback). + * + * In future we could do a path_is_under() check here + * instead, but there are O(n*m) performance + * considerations with such a setup. + */ + if (unlikely(m_retry || r_retry)) + return -EAGAIN; + } } return 0; } @@ -2254,6 +2267,10 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->last_type = LAST_ROOT; /* if there are only slashes... */ nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT; nd->depth = 0; + + nd->m_seq = read_seqbegin(&mount_lock); + nd->r_seq = read_seqbegin(&rename_lock); + if (flags & LOOKUP_ROOT) { struct dentry *root = nd->root.dentry; struct inode *inode = root->d_inode; @@ -2275,8 +2292,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->path.mnt = NULL; nd->path.dentry = NULL; - nd->m_seq = read_seqbegin(&mount_lock); - /* LOOKUP_IN_ROOT treats absolute paths as being relative-to-dirfd. */ if (flags & LOOKUP_IN_ROOT) while (*s == '/') -- 2.23.0