Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp4142113imu; Mon, 12 Nov 2018 06:30:37 -0800 (PST) X-Google-Smtp-Source: AJdET5d51MeQkBQ42yQKNJtcNHR3MtY3InZwArMMSpoLR266FE42E5yWHMURVBfLDkUCWJgTql0Z X-Received: by 2002:a62:5cc4:: with SMTP id q187-v6mr1108041pfb.47.1542033037022; Mon, 12 Nov 2018 06:30:37 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1542033036; cv=none; d=google.com; s=arc-20160816; b=vq5Q69wNMbz+Ai+VBVetvTNiW97A/eBnzDCgOUQLfuGvhzqR4W8RSRkp27FyJiI0uN Cwfab7asYVdIOiiO5PYFy5F1L3mLEc5ckF46AIiZep+BjBTCnQkBSJmdKvOq4fF0vQqR DlA4Ssj3ppo6vDx/D+o6qRJYXiS+HHqM/g8sCXEj7js4cFrtmscmIwcCzHkaw4Tv2N9O 63Zs90SMliG6MhrzZVABdRffG5ucZpJGicGSDaPEElEMwXH6Ev+CzcHOrt6Q1hc1iG6S ldTewfBMDGQV4DLySUJbxyHyxnAMJtCM9m8jdS6iDOQO+6zg/cc3/fdCdLTCNxb/p3K6 DmiQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=AvbeWsGWWyDICbwb2wyFfTmCYdVMT0D1AZemNudcqik=; b=RALXDwgPJyeUYfQkRgxEIGmSe5o6PQRD9P7r2wlmHcuOYZNTr9NVUmoQUhQiVdJHbb gKJzuBlnroGb66fGWzGmCOYWUjBQ2He8xinfPnInNyNwu2rbT62P4iFufQ4Cc+mUyTXr 3YCNKf6qh1/v1a2vKxs+/f1+1TIFoEbfcT/VThAYPCAVn08UyNJ1eN9FqymlmxtPrqoU gCS8uiqhjdj8uVMAztFgv/T5LPRz70XCADL7Du8zJcLzJV+p0t3tUPnXiGhI5MQaYc1h fItJ4W1QMcvp7w7XqpM2DwPtcVi4S0tFqsUZ0LxWRyotp3/1jm1IxwH2aVRDCw0wvucu Vw9g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t4si16068420pga.83.2018.11.12.06.30.19; Mon, 12 Nov 2018 06:30:36 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730079AbeKMAVN (ORCPT + 99 others); Mon, 12 Nov 2018 19:21:13 -0500 Received: from mx1.mailbox.org ([80.241.60.212]:56470 "EHLO mx1.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729794AbeKMAVN (ORCPT ); Mon, 12 Nov 2018 19:21:13 -0500 Received: from smtp1.mailbox.org (unknown [IPv6:2001:67c:2050:105:465:1:1:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx1.mailbox.org (Postfix) with ESMTPS id 245D84922C; Mon, 12 Nov 2018 15:27:40 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp1.mailbox.org ([80.241.60.240]) by spamfilter02.heinlein-hosting.de (spamfilter02.heinlein-hosting.de [80.241.56.116]) (amavisd-new, port 10030) with ESMTP id BcCkq0KFOoOk; Mon, 12 Nov 2018 15:27:39 +0100 (CET) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells Cc: Aleksa Sarai , Jann Horn , Eric Biederman , Andy Lutomirski , Christian Brauner , David Drysdale , Aleksa Sarai , containers@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org Subject: [PATCH v4 4/4] namei: aggressively check for nd->root escape on ".." resolution Date: Tue, 13 Nov 2018 01:26:54 +1100 Message-Id: <20181112142654.341-5-cyphar@cyphar.com> In-Reply-To: <20181112142654.341-1-cyphar@cyphar.com> References: <20181112142654.341-1-cyphar@cyphar.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch allows for O_BENEATH and O_THISROOT to safely permit ".." resolution (in the case of O_BENEATH the resolution will still fail if ".." resolution would resolve a path outside of the root -- while O_THISROOT will chroot(2)-style scope it). "magic link" jumps are still disallowed entirely because now they could result in inconsistent behaviour if resolution encounters a subsequent "..". The need for this patch is explained by observing there is a fairly easy-to-exploit race condition with chroot(2) (and thus by extension O_THISROOT and O_BENEATH) where a rename(2) of a path can be used to "skip over" nd->root and thus escape to the filesystem above nd->root. thread1 [attacker]: for (;;) renameat2(AT_FDCWD, "/a/b/c", AT_FDCWD, "/a/d", RENAME_EXCHANGE); thread2 [victim]: for (;;) openat(dirb, "b/c/../../etc/shadow", O_THISROOT); With fairly significant regularity, thread2 will resolve to "/etc/shadow" rather than "/a/b/etc/shadow". There is also a similar (though somewhat more privileged) attack using MS_MOVE. With this patch, such cases will be detected *during* ".." resolution (which is the weak point of chroot(2) -- since walking *into* a subdirectory tautologically cannot result in you walking *outside* nd->root -- except through a bind-mount or "magic link"). By detecting this at ".." resolution (rather than checking only at the end of the entire resolution) we can both correct escapes by jumping back to the root (in the case of O_THISROOT), as well as avoid revealing to attackers the structure of the filesystem outside of the root (through timing attacks for instance). In order to avoid a quadratic lookup with each ".." entry, we only activate the slow path if a write through &rename_lock or &mount_lock have occurred during path resolution (&rename_lock and &mount_lock are re-taken to further optimise the lookup). Since the primary attack being protected against is MS_MOVE or rename(2), not doing additional checks unless a mount or rename have occurred avoids making the common case slow. The use of path_is_under() here might seem suspect, but on further inspection of the most important race (a path was *inside* the root but is now *outside*), there appears to be no attack potential. If path_is_under() occurs before the rename, then the path will be resolved but since the path was originally inside the root there is no escape. Subsequent ".." jumps are guaranteed to check path_is_under() (by construction, &rename_lock or &mount_lock must have been taken by the attacker after path_is_under() returned in the victim), and thus will not be able to escape from the previously-inside-root path. Walking down is still safe since the entire subtree was moved (either by rename(2) or MS_MOVE) and because (as discussed above) walking down is safe. Cc: Al Viro Cc: Jann Horn Signed-off-by: Aleksa Sarai --- fs/namei.c | 48 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 33 insertions(+), 15 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 459faea5b832..b2c962f149c7 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -493,7 +493,7 @@ struct nameidata { struct path root; struct inode *inode; /* path.dentry.d_inode */ unsigned int flags; - unsigned seq, m_seq; + unsigned seq, m_seq, r_seq; int last_type; unsigned depth; int total_link_count; @@ -1741,19 +1741,35 @@ static inline int may_lookup(struct nameidata *nd) static inline int handle_dots(struct nameidata *nd, int type) { if (type == LAST_DOTDOT) { - /* - * LOOKUP_BENEATH resolving ".." is not currently safe -- races can - * cause our parent to have moved outside of the root and us to skip - * over it. - */ - if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_CHROOT))) - return -EXDEV; + int error = 0; + if (!nd->root.mnt) set_root(nd); - if (nd->flags & LOOKUP_RCU) { - return follow_dotdot_rcu(nd); - } else - return follow_dotdot(nd); + if (nd->flags & LOOKUP_RCU) + error = follow_dotdot_rcu(nd); + else + error = follow_dotdot(nd); + if (error) + return error; + + if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_CHROOT))) { + bool m_retry = read_seqretry(&mount_lock, nd->m_seq); + bool r_retry = read_seqretry(&rename_lock, nd->r_seq); + + /* + * Don't bother checking unless there's a racing + * rename(2) or MS_MOVE. + */ + if (likely(!m_retry && !r_retry)) + return 0; + + if (m_retry && !(nd->flags & LOOKUP_RCU)) + nd->m_seq = read_seqbegin(&mount_lock); + if (r_retry) + nd->r_seq = read_seqbegin(&rename_lock); + if (!path_is_under(&nd->path, &nd->root)) + return -EXDEV; + } } return 0; } @@ -2274,6 +2290,11 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->last_type = LAST_ROOT; /* if there are only slashes... */ nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT; nd->depth = 0; + + nd->m_seq = read_seqbegin(&mount_lock); + if (unlikely(flags & (LOOKUP_BENEATH | LOOKUP_CHROOT))) + nd->r_seq = read_seqbegin(&rename_lock); + if (flags & LOOKUP_ROOT) { struct dentry *root = nd->root.dentry; struct inode *inode = root->d_inode; @@ -2284,7 +2305,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) if (flags & LOOKUP_RCU) { nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq); nd->root_seq = nd->seq; - nd->m_seq = read_seqbegin(&mount_lock); } else { path_get(&nd->path); } @@ -2295,8 +2315,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->path.mnt = NULL; nd->path.dentry = NULL; - nd->m_seq = read_seqbegin(&mount_lock); - if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_CHROOT))) { error = dirfd_path_init(nd); if (unlikely(error)) -- 2.19.1