Received: by 2002:ac0:a582:0:0:0:0:0 with SMTP id m2-v6csp5023707imm; Tue, 9 Oct 2018 08:37:00 -0700 (PDT) X-Google-Smtp-Source: ACcGV62EUtLG7K3JD4pIvpBfOMXzz7ndf+Mcf3o+w8JK5eC8DhEFIljUZ8tk2pTtYlmp+LNsir0Z X-Received: by 2002:a17:902:4d45:: with SMTP id o5-v6mr28460275plh.42.1539099420849; Tue, 09 Oct 2018 08:37:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1539099420; cv=none; d=google.com; s=arc-20160816; b=vJb7u5+BAcnCSN+fYR67oxAHKb01fDEdrCD4/l+jCNpNqlGKasWBlFcyc/ZIN/guMt 4qzKAn3opZS9HAYz5GBrOBNh3YqQrIcLq30Fd6Am6K0zkSCOBAKOqEbIts5CzI9zOePA qzuLrt/DQv1vjyW98mhz7RSt8B1004atsP7K7uEPXTKxOAvCunAlYRjyxuYL4qY9+Xbu vFTYRQlzs4ry9CXGzxX2sgbv0xUN0PjqPX4ak6vaQuReU7iwXCJbqOvQBXz2MRiovuTI 9EfPjmf3AFYcBoNoA/cKy22i06tqgHeqqhwlVuEt6QuQW9EIpVRoR/pzBs/jkSD5u4c0 ClPQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=BlMcKCHMda1o36+L02pge5xCQZVnvEmnfUHPn7UC5VM=; b=Z6h3p210W6ZStUbBKADS+3YYmataH2FoUfyYUi37cC1SQrA4ENJGjn8HjIZvwXtdgD 5sqRhwgjTEvb0UEaMnQTSTyonF9IAtfcn+86XFrFPziH1KvacDB/Z70fX7YGpz8Iourf nqAhtmdLTtLESAUk0Nl08gYLhCHV0AoEvQN7yx50Dg/i4xNryecELz+rOcHYL+2maOjg 2B9jebUppHcPJlF1O5iYBqc3hl36aisvbZmS+mDxjmlgxEtYfzi0cRyritgGhP7cYkXZ +M8CL5ijWjCBipZNDLJ52ZXv7mTkhN1COIoZTiI7pQZxYbrpQuZh2NCZ1vk//r384dKc lHgA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m21-v6si20475473pgh.167.2018.10.09.08.36.46; Tue, 09 Oct 2018 08:37:00 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726935AbeJIWxe (ORCPT + 99 others); Tue, 9 Oct 2018 18:53:34 -0400 Received: from mx2.suse.de ([195.135.220.15]:47008 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726496AbeJIWxd (ORCPT ); Tue, 9 Oct 2018 18:53:33 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 73D39ACC7; Tue, 9 Oct 2018 15:36:01 +0000 (UTC) Date: Wed, 10 Oct 2018 02:37:28 +1100 From: Aleksa Sarai To: Jann Horn Cc: cyphar@cyphar.com, Al Viro , "Eric W. Biederman" , jlayton@kernel.org, Bruce Fields , Arnd Bergmann , Andy Lutomirski , David Howells , christian@brauner.io, Tycho Andersen , David Drysdale , dev@opencontainers.org, containers@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, kernel list , linux-arch , Linux API Subject: Re: [PATCH v3 3/3] namei: aggressively check for nd->root escape on ".." resolution Message-ID: <20181009153728.2altaqxclntvyc7b@mikami> References: <20181009070230.12884-1-cyphar@cyphar.com> <20181009070230.12884-4-cyphar@cyphar.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="mr4rhp3oxd4353fx" Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20180716 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --mr4rhp3oxd4353fx Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2018-10-09, 'Jann Horn' via dev wrote: > On Tue, Oct 9, 2018 at 9:03 AM Aleksa Sarai wrote: > > This patch allows for AT_BENEATH and AT_THIS_ROOT to safely permit ".." > > resolution (in the case of AT_BENEATH the resolution will still fail if > > ".." resolution would resolve a path outside of the root -- while > > AT_THIS_ROOT will chroot(2)-style scope it). "proclink" jumps are still > > disallowed entirely because now they could result in inconsistent > > behaviour if resolution encounters a subsequent "..". > > > > The need for this patch is explained by observing there is a fairly > > easy-to-exploit race condition with chroot(2) (and thus by extension > > AT_THIS_ROOT and AT_BENEATH) where a rename(2) of a path can be used to > > "skip over" nd->root and thus escape to the filesystem above nd->root. > > > > thread1 [attacker]: > > for (;;) > > renameat2(AT_FDCWD, "/a/b/c", AT_FDCWD, "/a/d", RENAME_EXCHANGE); > > thread2 [victim]: > > for (;;) > > openat(dirb, "b/c/../../etc/shadow", O_THISROOT); > > > > With fairly significant regularity, thread2 will resolve to > > "/etc/shadow" rather than "/a/b/etc/shadow". There is also a similar > > (though somewhat more privileged) attack using MS_MOVE. > > > > With this patch, such cases will be detected *during* ".." resolution > > (which is the weak point of chroot(2) -- since walking *into* a > > subdirectory tautologically cannot result in you walking *outside* > > nd->root -- except through a bind-mount or "proclink"). By detecting > > this at ".." resolution (rather than checking only at the end of the > > entire resolution) we can both correct escapes by jumping back to the > > root (in the case of AT_THIS_ROOT), as well as avoid revealing to > > attackers the structure of the filesystem outside of the root (through > > timing attacks for instance). > > > > In order to avoid a quadratic lookup with each ".." entry, we only > > activate the slow path if a write through &rename_lock or &mount_lock > > have occurred during path resolution (&rename_lock and &mount_lock are > > re-taken to further optimise the lookup). Since the primary attack being > > protected against is MS_MOVE or rename(2), not doing additional checks > > unless a mount or rename have occurred avoids making the common case > > slow. > > > > The use of __d_path here might seem suspect, but on further inspection > > of the most important race (a path was *inside* the root but is now > > *outside*), there appears to be no attack potential. If __d_path occurs > > before the rename, then the path will be resolved but since the path was > > originally inside the root there is no escape. Subsequent ".." jumps are > > guaranteed to check __d_path reachable (by construction, &rename_lock or > > &mount_lock must have been taken after __d_path returned), >=20 > "after"? Don't you mean "before"? Otherwise I don't understand what > you're saying here. I meant that the attacker doing the rename must've taken &rename_lock or &mount_lock after __d_path returns in the target process (because the race being examined is that the rename occurs *after* __d_path) and thus are guaranteed to be detected). Maybe there's a better way to phrase what I mean... > > +static inline int nd_alloc_dpathbuf(struct nameidata *nd) > > +{ > > + if (unlikely(!nd->dpathbuf)) { > > + if (nd->flags & LOOKUP_RCU) { > > + nd->dpathbuf =3D kmalloc(PATH_MAX, GFP_ATOMIC); > > + if (unlikely(!nd->dpathbuf)) > > + return -ECHILD; > > + } else { > > + nd->dpathbuf =3D kmalloc(PATH_MAX, GFP_KERNEL); > > + if (unlikely(!nd->dpathbuf)) > > + return -ENOMEM; > > + } > > + } > > + return 0; > > +} >=20 > Note that a fixed-size path buffer means that if the path is very > long, e.g. because you followed long symlinks on the way down, this > can cause lookup failures. This is already an issue with __d_path (even if the buffer was larger) because it will not output a path longer than PATH_MAX. I imagine this is a pretty strong argument for why we should refactor __d_path so that we can *just* use the escape checking to avoid -ENAMETOOLONG. I can work on this, but I'll wait until after LPC to see what the discussion there brings up. > > + error =3D nd_alloc_dpathbuf(nd); > > + if (error) > > + return error; > > + pathptr =3D __d_path(&nd->path, &nd->root, nd->= dpathbuf, PATH_MAX); > > + if (unlikely(!pathptr)) > > + /* Breakout -- go back to root! */ > > + return nd_jump_root(nd); >=20 > I find the semantics of this check odd - especially in the > LOOKUP_BENEATH case, but also in the LOOKUP_CHROOT case. Wouldn't it > make more sense to just throw an error here? Making /.. go back to the > root is one thing, but making ".." from anything that has escaped from > the root go back to the root seems less logical to me. For AT_BENEATH, nd_jump_root() will always return -EXDEV -- but I'll take your point for AT_THIS_ROOT below. > Thread A (a webserver or whatever) looks up > "example.org/images/foo/../bar.png" under "/var/www" with > LOOKUP_BENEATH. > Thread B concurrently moves "/var/www/example.org/images" to > "/var/backup/example.org/images". > Now thread A can accidentally resolve its path to "/var/www/bar.png" > if the race happens the wrong way? This is a good point. When I changed this from always being -EXDEV in my other postings, I was thinking about "/.." rather than the case you've outlined where ".." is in the path but it's not actually meant to go to the root. I agree -EXDEV makes much more sense here. I will add this to my tree (but I'll wait until after LPC before I send out a new series). --=20 Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH --mr4rhp3oxd4353fx Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEXzbGxhtUYBJKdfWmnhiqJn3bjbQFAlu8yzUACgkQnhiqJn3b jbTOFg/6A4Zyz4xnpElf3G/74j+95DNPJAXdrePXZOfjSXVtunzBKX4gbtKTaQZ6 nEK220NmPVDiRtkGPprfcxcStE8icUhsLAdimRoUflFK9Kckk6I3gZjdhpMNdd6+ DzgGxWabet4AmSZPtXpbgd9Nf5Ryzxi5BTateq7HPiMMUxVYmE47Sf7FvthCIHqc vkcM7Bx06vxQ/Yz82fdV3tFWkeK1XHW5/GLnB5ftZYFHultPemJ9IxCU8pBaRmEJ VmsG8jOsPi1T5EUMt69lVY/HhmEXXIQRuHEDbq5CWZasQHYs7+YJ/+1zrQAMuNyl 5dGXCnkCEOGoJ3DOfc6uGTNTW8e/ShAdBtIyz4rpOyaR3YDE3PuwO/LUDNa8zdrA 3eFs3iKybl4x2ssTQYq06ZHd53GA5rZIMIVX0Ntz+W0AgSrNJ9sgKcywo5nfG8Ja M/ZOmBokvg5h+RYK373oNK5Cf578ZZZfrJPPtfPMVZ7UsQ9mp9sp8nDiLJX/JLPi ct0KCPK4jwPNaMiWhdZZUvYoDwT8i6jFuQCERg54WKSVMOQYkTSW3eP9ubBMSwSF h9FXVQxZYAwp33DfUmZGQlTPqCrUzTjI3FJioLBhbDOY+qEPV+iod0gX0SylejaD qUlBUx0G9RFnI7v9Ar0IJ2zYO7TOEaFj2awFSnRAc32MG1Wznfk= =kcHG -----END PGP SIGNATURE----- --mr4rhp3oxd4353fx--