Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp3992623ybi; Fri, 19 Jul 2019 12:45:28 -0700 (PDT) X-Google-Smtp-Source: APXvYqx2mv9oL6e++ST13XJIu/or2uoesg1+KS3aaWAqs8iKpuF9U/eFCg3rB2pgaG+4LMhY6QIr X-Received: by 2002:a17:90a:b312:: with SMTP id d18mr58130569pjr.35.1563565528597; Fri, 19 Jul 2019 12:45:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1563565528; cv=none; d=google.com; s=arc-20160816; b=aiQXSkee7DBhfSaolNhy+RzJ/HYhuUmtW+BNSCK3zmAFjkEAjf1PVj/bTYCHriD5+E kSNxoi+9eoxGcAB9UVOk0Iw3tuc1tRNmR3ZKzroF1Pp6reoRnDJ9BF6IPd1zYbwLzYMr j2FFiTmtStGV7NmN/WkcW7nARB/xSuG++RFpO0xa0t9p4pvdZ+oWd9dqvsuSv8rKZ98k L2CPxZCobDvqNgBzwzg90AKdFRrl9Fm8bas3gP4JLtC7+r6rKKFXfFbtaRxAYWmrcCJD ODcYlsAUmNNuat8it74OyF1kd8fnpUoge/L/irjb8zrOex33Pd78e9MYaMYyYla6Fb22 te5g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=JBLVoiK4Pljepuuof9CgDvlc0JKBkwThn4evuhTCLSQ=; b=KNiWXTTm++5LLvG89QgfaVl2EsTEM3HYP+te0bRnIwrBcViYmq1u524wj1jV8h5Yp1 dTz+zhkSbXRVVB2+fCtuLrgWUZmUC4PARVjPcSDG3tBDCRyhsaYunMzhXH3KN2Xv3aIQ jNyDoXJ7ne1zdxyo7i+MzfvNjqgfFD5wO8yVoXwaRkrixnaTNblwUhJjGGYPHtlEJtZx 2tO+oA6BqE1eO8LetGzdkW+1F4tNl3nuzZijTkn1sJzcO9sXzCHjqUd8yWicn/TbnDtv LkYypiXRFSY79CvgW+1VpbzdGtn91RUsow+ynufdUIY20pg0qccD/4FHb0LfRpewkyzs d5Uw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t25si525578pgl.7.2019.07.19.12.45.13; Fri, 19 Jul 2019 12:45:28 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731723AbfGSQor (ORCPT + 99 others); Fri, 19 Jul 2019 12:44:47 -0400 Received: from mx2.mailbox.org ([80.241.60.215]:33966 "EHLO mx2.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731675AbfGSQoq (ORCPT ); Fri, 19 Jul 2019 12:44:46 -0400 Received: from smtp1.mailbox.org (smtp1.mailbox.org [80.241.60.240]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx2.mailbox.org (Postfix) with ESMTPS id 9126DA016F; Fri, 19 Jul 2019 18:44:42 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp1.mailbox.org ([80.241.60.240]) by spamfilter04.heinlein-hosting.de (spamfilter04.heinlein-hosting.de [80.241.56.122]) (amavisd-new, port 10030) with ESMTP id q25CAhLQyvWt; Fri, 19 Jul 2019 18:44:38 +0200 (CEST) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan Cc: Aleksa Sarai , Jann Horn , Kees Cook , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Christian Brauner , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-alpha@vger.kernel.org, linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org, sparclinux@vger.kernel.org Subject: [PATCH v10 6/9] namei: aggressively check for nd->root escape on ".." resolution Date: Sat, 20 Jul 2019 02:42:22 +1000 Message-Id: <20190719164225.27083-7-cyphar@cyphar.com> In-Reply-To: <20190719164225.27083-1-cyphar@cyphar.com> References: <20190719164225.27083-1-cyphar@cyphar.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch allows for LOOKUP_BENEATH and LOOKUP_IN_ROOT to safely permit ".." resolution (in the case of LOOKUP_BENEATH the resolution will still fail if ".." resolution would resolve a path outside of the root -- while LOOKUP_IN_ROOT will chroot(2)-style scope it). Magic-link jumps are still disallowed entirely because now they could result in inconsistent behaviour if resolution encounters a subsequent ".."[*]. The need for this patch is explained by observing there is a fairly easy-to-exploit race condition with chroot(2) (and thus by extension LOOKUP_IN_ROOT and LOOKUP_BENEATH if ".." is allowed) where a rename(2) of a path can be used to "skip over" nd->root and thus escape to the filesystem above nd->root. thread1 [attacker]: for (;;) renameat2(AT_FDCWD, "/a/b/c", AT_FDCWD, "/a/d", RENAME_EXCHANGE); thread2 [victim]: for (;;) resolveat(dirb, "b/c/../../etc/shadow", RESOLVE_IN_ROOT); With fairly significant regularity, thread2 will resolve to "/etc/shadow" rather than "/a/b/etc/shadow". There is also a similar (though somewhat more privileged) attack using MS_MOVE. With this patch, such cases will be detected *during* ".." resolution (which is the weak point of chroot(2) -- since walking *into* a subdirectory tautologically cannot result in you walking *outside* nd->root -- except through a bind-mount or magic-link). By detecting this at ".." resolution (rather than checking only at the end of the entire resolution) we can both correct escapes by jumping back to the root (in the case of LOOKUP_IN_ROOT), as well as avoid revealing to attackers the structure of the filesystem outside of the root (through timing attacks for instance). In order to avoid a quadratic lookup with each ".." entry, we only activate the slow path if a write through &rename_lock or &mount_lock has occurred during path resolution (&rename_lock and &mount_lock are re-taken to further optimise the lookup). Since the primary attack being protected against is MS_MOVE or rename(2), not doing additional checks unless a mount or rename have occurred avoids making the common case slow. The use of path_is_under() here might seem suspect, but on further inspection of the most important race (a path was *inside* the root but is now *outside*), there appears to be no attack potential: * If path_is_under() occurs before the rename, then the path will be resolved -- however the path was originally inside the root and thus there is no escape (and to userspace it'd look like the rename occurred after the path was resolved). If path_is_under() occurs afterwards, the resolution is blocked. * Subsequent ".." jumps are guaranteed to check path_is_under() -- by construction, &rename_lock or &mount_lock must have been taken by the attacker after path_is_under() returned in the victim. Thus ".." will not be able to escape from the previously-inside-root path. * Walking down in the moved path is still safe since the entire subtree was moved (either by rename(2) or MS_MOVE) and because (as discussed above) walking down is safe. A variant of the above attack is included in the selftests for openat2(2) later in this patch series. I've run this test on several machines for several days and no instances of a breakout were detected. While this is not concrete proof that this is safe, when combined with the above argument it should lend some trustworthiness to this construction. [*] It may be acceptable in the future to do a path_is_under() check after resolving a magic-link and permit resolution if the nd_jump_link() result is still within the dirfd. However this seems unlikely to be a feature that people *really* need* -- it can be added later if it turns out a lot of people want it. Cc: Al Viro Cc: Jann Horn Cc: Kees Cook Signed-off-by: Aleksa Sarai --- fs/namei.c | 45 +++++++++++++++++++++++++++++++-------------- 1 file changed, 31 insertions(+), 14 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 617fc7d55977..b3abf5754ddd 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -491,7 +491,7 @@ struct nameidata { struct path root; struct inode *inode; /* path.dentry.d_inode */ unsigned int flags; - unsigned seq, m_seq; + unsigned seq, m_seq, r_seq; int last_type; unsigned depth; int total_link_count; @@ -1758,22 +1758,36 @@ static inline int handle_dots(struct nameidata *nd, int type) if (type == LAST_DOTDOT) { int error = 0; - /* - * LOOKUP_BENEATH resolving ".." is not currently safe -- races can - * cause our parent to have moved outside of the root and us to skip - * over it. - */ - if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) - return -EXDEV; if (!nd->root.mnt) { error = set_root(nd); if (error) return error; } - if (nd->flags & LOOKUP_RCU) { - return follow_dotdot_rcu(nd); - } else - return follow_dotdot(nd); + if (nd->flags & LOOKUP_RCU) + error = follow_dotdot_rcu(nd); + else + error = follow_dotdot(nd); + if (error) + return error; + + if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) { + bool m_retry = read_seqretry(&mount_lock, nd->m_seq); + bool r_retry = read_seqretry(&rename_lock, nd->r_seq); + + /* + * Don't bother checking unless there's a racing + * rename(2) or MS_MOVE. + */ + if (likely(!m_retry && !r_retry)) + return 0; + + if (m_retry && !(nd->flags & LOOKUP_RCU)) + nd->m_seq = read_seqbegin(&mount_lock); + if (r_retry) + nd->r_seq = read_seqbegin(&rename_lock); + if (!path_is_under(&nd->path, &nd->root)) + return -EXDEV; + } } return 0; } @@ -2245,6 +2259,11 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->last_type = LAST_ROOT; /* if there are only slashes... */ nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT; nd->depth = 0; + + nd->m_seq = read_seqbegin(&mount_lock); + if (flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT)) + nd->r_seq = read_seqbegin(&rename_lock); + if (flags & LOOKUP_ROOT) { struct dentry *root = nd->root.dentry; struct inode *inode = root->d_inode; @@ -2266,8 +2285,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->path.mnt = NULL; nd->path.dentry = NULL; - nd->m_seq = read_seqbegin(&mount_lock); - /* LOOKUP_IN_ROOT treats absolute paths as being relative-to-dirfd. */ if (flags & LOOKUP_IN_ROOT) while (*s == '/') -- 2.22.0