Received: by 2002:ac0:a582:0:0:0:0:0 with SMTP id m2-v6csp5004814imm; Tue, 9 Oct 2018 08:20:20 -0700 (PDT) X-Google-Smtp-Source: ACcGV63IrVPrJn6/gYFwbfenFW6dUXtVXUQbuD/8w5HXUoLPNAAREQ6vS3ZpjbCEOUw5NHszshN2 X-Received: by 2002:a62:35c2:: with SMTP id c185-v6mr26167236pfa.69.1539098420484; Tue, 09 Oct 2018 08:20:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1539098420; cv=none; d=google.com; s=arc-20160816; b=ElvtdFNl0IirtqPOmqrQjGQF+zgg6KuNrE5NLgSpxMuBAlapdTiXW8u26VE/ADi2V4 1SsYRbtk85C5rXV5NuGtmZI7DhXOqMJfVLiqLuY+C8aEQxI7s411Cu2oxGUlJfPIym4p rdDsg+CkY0W1yk1bmKKfPIA2TZU5Sxqm4xXVA1rIj4/dH1IpiQssJRjd4/FogZVhdQ+E qS4TMyvcpvPZrv2NDFF/2jvgG7ugsAuaeiag0kAThAtuUFtJ9Wuovf3qMrV9tq5Rbcot Y3MLDfmAlu2Nq6DiGRqodgN+LrXusfaDcf6qvLtwep1vu+n61djk6nEOR3cKrrqCab0J oTUg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=DM0/LbGBo+If2gHZLNl7ZvAoOHhVIvi88aFVXyLsOto=; b=pPH6aYi1JlyJ1nV+RBtsfcmqXvgZGibN62VMy8h3uaJa3CDY5VrUTpsIFTvhxWds9J Y73n7MVSSWFviF2D7E7QWLmMugTnVB5iLkxN7w52IVZbeHK5Ki1hlHwWeCwi4obPS8PI hdKn39qvCZ+5E52zvlh0Wn+0ys3DWoA2H/BDTzj94KBqUsRSSpel+52AUFx6bfND4X92 38kmbat1Fn3x2VObbMmMqqQmj0fQilSbr6oxPoOfLiq9AeKAhYv8qFLijWadt16GGQXy fk4QVSK+5EA57XWaIpvDoXMiWObnFq8aLJ2Jk6y+wciYW8TIrEFygcn52KowH1eVhQ1k yxtw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=dFH+DjWj; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id j63-v6si19646471pgd.69.2018.10.09.08.20.04; Tue, 09 Oct 2018 08:20:20 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=dFH+DjWj; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726700AbeJIWg7 (ORCPT + 99 others); Tue, 9 Oct 2018 18:36:59 -0400 Received: from mail-ot1-f68.google.com ([209.85.210.68]:36138 "EHLO mail-ot1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726434AbeJIWg7 (ORCPT ); Tue, 9 Oct 2018 18:36:59 -0400 Received: by mail-ot1-f68.google.com with SMTP id x4so622675otg.3 for ; Tue, 09 Oct 2018 08:19:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=DM0/LbGBo+If2gHZLNl7ZvAoOHhVIvi88aFVXyLsOto=; b=dFH+DjWj3wiF64zTnd3OWHcwL3PGB1lzg2GBckVCKaZJHUYrKQLhbgMVWYD24JpQa/ vV4sKEZ5dqvF+jh6QDVA6KfYXkmX+ZoZC9IOQUn6qheuEVPuFCZhZaJOIjfnWWpX3REu PduiB29K0EDhvvsvOm4GZ11P+o94TKHzZ3jh6Sc1Mr5Oc03f2cle0sSNCESCNoE794oX LzK4JUYQhOhUoT6IYbxANetyxlBcDEaEw+QAQyIvn2we2U8Phb6PGc5Y+XorgbcRO+3c k9xX4bUomOpIhJY/5J8tXX+jd38esDhJrU0FBntNlb3OZDXoaNws9FWPi5i/zW31XO0s 6O6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=DM0/LbGBo+If2gHZLNl7ZvAoOHhVIvi88aFVXyLsOto=; b=QrfBq5ZL4zjHTU1hRwRmq2YnsYO1yIb0wWYpCcGAUzcuIN45QYMKcbi1ecFELM+BHm jDACtfxa+ybLrTM28AU1fCHkihS53JMQsfQmTvwNU8eoBTqg8kpmHGEq/Xz7Fb+VPuAw ExleFg5QWARNiDqXUVk14DQ4UwsX3SnprwonjQBMmrnij9XUBjC6H5NHwhNjmV93cTCR l1/O71E58Vk0T5GLOyFL8As51an9g3yxDfvrZqiubxEO1YrqgoEVPdS7CPy3eU7V/Zu5 AhQXulT+f04z5GP9uy/5LuoSbkW8KK5A1XvjEnGVX0AJfkYZ+6D0qnpErYud9r52biuN bzrA== X-Gm-Message-State: ABuFfohJ99ghJAk2GZ2P4MvPomQPwUATwwwJg7GQYRKDkMFsZ9WCEnuu FjPuNf6N0+F+DXAGt9xD/C2/JcgGAsRZzF40tDPDWA== X-Received: by 2002:a9d:2117:: with SMTP id i23mr16425310otb.230.1539098372994; Tue, 09 Oct 2018 08:19:32 -0700 (PDT) MIME-Version: 1.0 References: <20181009070230.12884-1-cyphar@cyphar.com> <20181009070230.12884-4-cyphar@cyphar.com> In-Reply-To: <20181009070230.12884-4-cyphar@cyphar.com> From: Jann Horn Date: Tue, 9 Oct 2018 17:19:06 +0200 Message-ID: Subject: Re: [PATCH v3 3/3] namei: aggressively check for nd->root escape on ".." resolution To: cyphar@cyphar.com Cc: Al Viro , "Eric W. Biederman" , jlayton@kernel.org, Bruce Fields , Arnd Bergmann , Andy Lutomirski , David Howells , christian@brauner.io, Tycho Andersen , David Drysdale , dev@opencontainers.org, containers@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, kernel list , linux-arch , Linux API Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Oct 9, 2018 at 9:03 AM Aleksa Sarai wrote: > This patch allows for AT_BENEATH and AT_THIS_ROOT to safely permit ".." > resolution (in the case of AT_BENEATH the resolution will still fail if > ".." resolution would resolve a path outside of the root -- while > AT_THIS_ROOT will chroot(2)-style scope it). "proclink" jumps are still > disallowed entirely because now they could result in inconsistent > behaviour if resolution encounters a subsequent "..". > > The need for this patch is explained by observing there is a fairly > easy-to-exploit race condition with chroot(2) (and thus by extension > AT_THIS_ROOT and AT_BENEATH) where a rename(2) of a path can be used to > "skip over" nd->root and thus escape to the filesystem above nd->root. > > thread1 [attacker]: > for (;;) > renameat2(AT_FDCWD, "/a/b/c", AT_FDCWD, "/a/d", RENAME_EXCHANGE); > thread2 [victim]: > for (;;) > openat(dirb, "b/c/../../etc/shadow", O_THISROOT); > > With fairly significant regularity, thread2 will resolve to > "/etc/shadow" rather than "/a/b/etc/shadow". There is also a similar > (though somewhat more privileged) attack using MS_MOVE. > > With this patch, such cases will be detected *during* ".." resolution > (which is the weak point of chroot(2) -- since walking *into* a > subdirectory tautologically cannot result in you walking *outside* > nd->root -- except through a bind-mount or "proclink"). By detecting > this at ".." resolution (rather than checking only at the end of the > entire resolution) we can both correct escapes by jumping back to the > root (in the case of AT_THIS_ROOT), as well as avoid revealing to > attackers the structure of the filesystem outside of the root (through > timing attacks for instance). > > In order to avoid a quadratic lookup with each ".." entry, we only > activate the slow path if a write through &rename_lock or &mount_lock > have occurred during path resolution (&rename_lock and &mount_lock are > re-taken to further optimise the lookup). Since the primary attack being > protected against is MS_MOVE or rename(2), not doing additional checks > unless a mount or rename have occurred avoids making the common case > slow. > > The use of __d_path here might seem suspect, but on further inspection > of the most important race (a path was *inside* the root but is now > *outside*), there appears to be no attack potential. If __d_path occurs > before the rename, then the path will be resolved but since the path was > originally inside the root there is no escape. Subsequent ".." jumps are > guaranteed to check __d_path reachable (by construction, &rename_lock or > &mount_lock must have been taken after __d_path returned), "after"? Don't you mean "before"? Otherwise I don't understand what you're saying here. > and thus will > not be able to escape from the previously-inside-root path. Walking down > is still safe since the entire subtree was moved (either by rename(2) or > MS_MOVE) and because (as discussed above) walking down is safe. > > Cc: Al Viro > Cc: Jann Horn > Signed-off-by: Aleksa Sarai > --- > fs/namei.c | 79 +++++++++++++++++++++++++++++++++++++++++------------- > 1 file changed, 61 insertions(+), 18 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index b31aef27df22..12cd8d8987ea 100644 > --- a/fs/namei.c > +++ b/fs/namei.c [...] > +static inline int nd_alloc_dpathbuf(struct nameidata *nd) > +{ > + if (unlikely(!nd->dpathbuf)) { > + if (nd->flags & LOOKUP_RCU) { > + nd->dpathbuf = kmalloc(PATH_MAX, GFP_ATOMIC); > + if (unlikely(!nd->dpathbuf)) > + return -ECHILD; > + } else { > + nd->dpathbuf = kmalloc(PATH_MAX, GFP_KERNEL); > + if (unlikely(!nd->dpathbuf)) > + return -ENOMEM; > + } > + } > + return 0; > +} Note that a fixed-size path buffer means that if the path is very long, e.g. because you followed long symlinks on the way down, this can cause lookup failures. > static void drop_links(struct nameidata *nd) > { > int i = nd->depth; > @@ -1738,18 +1756,40 @@ static inline int may_lookup(struct nameidata *nd) > static inline int handle_dots(struct nameidata *nd, int type) > { > if (type == LAST_DOTDOT) { > - /* > - * AT_BENEATH resolving ".." is not currently safe -- races can cause > - * our parent to have moved outside of the root and us to skip over it. > - */ > - if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_CHROOT))) > - return -EXDEV; > + int error = 0; > + > if (!nd->root.mnt) > set_root(nd); > - if (nd->flags & LOOKUP_RCU) { > - return follow_dotdot_rcu(nd); > - } else > - return follow_dotdot(nd); > + if (nd->flags & LOOKUP_RCU) > + error = follow_dotdot_rcu(nd); > + else > + error = follow_dotdot(nd); > + if (error) > + return error; > + > + if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_CHROOT))) { > + char *pathptr; > + bool m_retry = read_seqretry(&mount_lock, nd->m_seq); > + bool r_retry = read_seqretry(&rename_lock, nd->r_seq); > + > + /* Racing rename(2) or MS_MOVE? */ > + if (likely(!m_retry && !r_retry)) > + return 0; > + if (m_retry && !(nd->flags & LOOKUP_RCU)) > + nd->m_seq = read_seqbegin(&mount_lock); > + if (r_retry) > + nd->r_seq = read_seqbegin(&rename_lock); > + > + error = nd_alloc_dpathbuf(nd); > + if (error) > + return error; > + pathptr = __d_path(&nd->path, &nd->root, nd->dpathbuf, PATH_MAX); > + if (unlikely(!pathptr)) > + /* Breakout -- go back to root! */ > + return nd_jump_root(nd); I find the semantics of this check odd - especially in the LOOKUP_BENEATH case, but also in the LOOKUP_CHROOT case. Wouldn't it make more sense to just throw an error here? Making /.. go back to the root is one thing, but making ".." from anything that has escaped from the root go back to the root seems less logical to me. Imagine, for example: Thread A (a webserver or whatever) looks up "example.org/images/foo/../bar.png" under "/var/www" with LOOKUP_BENEATH. Thread B concurrently moves "/var/www/example.org/images" to "/var/backup/example.org/images". Now thread A can accidentally resolve its path to "/var/www/bar.png" if the race happens the wrong way? > + if (unlikely(IS_ERR(pathptr))) > + return PTR_ERR(pathptr); > + } > } > return 0; > }