From: Jann Horn
Date: Tue, 9 Oct 2018 18:46:41 +0200
Subject: Re: [PATCH v3 3/3] namei: aggressively check for nd->root escape on ".." resolution
To: asarai@suse.de
Cc: cyphar@cyphar.com, Al Viro, "Eric W. Biederman", jlayton@kernel.org,
 Bruce Fields, Arnd Bergmann, Andy Lutomirski, David Howells,
 christian@brauner.io, Tycho Andersen, David Drysdale,
 dev@opencontainers.org, containers@lists.linux-foundation.org,
 linux-fsdevel@vger.kernel.org, kernel list, linux-arch, Linux API
References: <20181009070230.12884-1-cyphar@cyphar.com>
 <20181009070230.12884-4-cyphar@cyphar.com>
 <20181009153728.2altaqxclntvyc7b@mikami>
In-Reply-To: <20181009153728.2altaqxclntvyc7b@mikami>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 9, 2018 at 5:36 PM Aleksa Sarai wrote:
> On 2018-10-09, 'Jann Horn' via dev wrote:
> > On Tue, Oct 9, 2018 at 9:03 AM Aleksa Sarai wrote:
> > > This patch allows for AT_BENEATH and AT_THIS_ROOT to safely permit ".."
> > > resolution (in the case of AT_BENEATH the resolution will still fail if
> > > ".." resolution would resolve a path outside of the root -- while
> > > AT_THIS_ROOT will chroot(2)-style scope it). "proclink" jumps are still
> > > disallowed entirely because now they could result in inconsistent
> > > behaviour if resolution encounters a subsequent "..".
> > >
> > > The need for this patch is explained by observing there is a fairly
> > > easy-to-exploit race condition with chroot(2) (and thus by extension
> > > AT_THIS_ROOT and AT_BENEATH) where a rename(2) of a path can be used to
> > > "skip over" nd->root and thus escape to the filesystem above nd->root.
> > >
> > > thread1 [attacker]:
> > >   for (;;)
> > >     renameat2(AT_FDCWD, "/a/b/c", AT_FDCWD, "/a/d", RENAME_EXCHANGE);
> > > thread2 [victim]:
> > >   for (;;)
> > >     openat(dirb, "b/c/../../etc/shadow", O_THISROOT);
> > >
> > > With fairly significant regularity, thread2 will resolve to
> > > "/etc/shadow" rather than "/a/b/etc/shadow". There is also a similar
> > > (though somewhat more privileged) attack using MS_MOVE.
> > >
> > > With this patch, such cases will be detected *during* ".." resolution
> > > (which is the weak point of chroot(2) -- since walking *into* a
> > > subdirectory tautologically cannot result in you walking *outside*
> > > nd->root -- except through a bind-mount or "proclink"). By detecting
> > > this at ".." resolution (rather than checking only at the end of the
> > > entire resolution) we can both correct escapes by jumping back to the
> > > root (in the case of AT_THIS_ROOT), as well as avoid revealing to
> > > attackers the structure of the filesystem outside of the root (through
> > > timing attacks for instance).
> > >
> > > In order to avoid a quadratic lookup with each ".." entry, we only
> > > activate the slow path if a write through &rename_lock or &mount_lock
> > > have occurred during path resolution (&rename_lock and &mount_lock are
> > > re-taken to further optimise the lookup). Since the primary attack being
> > > protected against is MS_MOVE or rename(2), not doing additional checks
> > > unless a mount or rename have occurred avoids making the common case
> > > slow.
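
For readers following along, the fast-path idea in the quoted paragraph
above can be modelled in userspace roughly as follows. This is only a
sketch under stated assumptions: model_seqcount and every function name
here are hypothetical stand-ins, not the kernel's seqlock API and not the
code from the patch. It also simplifies the reader side (a real seqlock
reader waits out an in-progress writer, while this model just reports
"retry").

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a seqlock's sequence counter. */
typedef struct {
    atomic_uint seq;
} model_seqcount;

/* Writer side: the counter is odd while a write section is open. */
static void model_write_begin(model_seqcount *s)
{
    atomic_fetch_add(&s->seq, 1);
}

static void model_write_end(model_seqcount *s)
{
    assert(atomic_load(&s->seq) & 1); /* a write section must be open */
    atomic_fetch_add(&s->seq, 1);
}

/* Reader side: snapshot the counter when path resolution starts. */
static unsigned model_read_begin(model_seqcount *s)
{
    return atomic_load(&s->seq);
}

/* True if a writer was active at, or has run since, the snapshot. */
static bool model_read_retry(model_seqcount *s, unsigned start)
{
    return (start & 1) || atomic_load(&s->seq) != start;
}

/*
 * Decide whether a ".." step needs the expensive reachability check:
 * only if a rename or a mount change happened during this resolution.
 */
static bool dotdot_needs_slow_check(model_seqcount *rename_seq, unsigned r_start,
                                    model_seqcount *mount_seq, unsigned m_start)
{
    return model_read_retry(rename_seq, r_start) ||
           model_read_retry(mount_seq, m_start);
}
```

With no concurrent writer, both snapshots still match and every ".."
stays on the cheap path. A single rename anywhere on the system bumps the
counter and forces the slow check for the rest of that resolution.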
> > >
> > > The use of __d_path here might seem suspect, but on further inspection
> > > of the most important race (a path was *inside* the root but is now
> > > *outside*), there appears to be no attack potential. If __d_path occurs
> > > before the rename, then the path will be resolved but since the path was
> > > originally inside the root there is no escape. Subsequent ".." jumps are
> > > guaranteed to check __d_path reachable (by construction, &rename_lock or
> > > &mount_lock must have been taken after __d_path returned),
> >
> > "after"? Don't you mean "before"? Otherwise I don't understand what
> > you're saying here.
>
> I meant that the attacker doing the rename must've taken &rename_lock
> or &mount_lock after __d_path returns in the target process (because the
> race being examined is that the rename occurs *after* __d_path) and thus
> are guaranteed to be detected).
>
> Maybe there's a better way to phrase what I mean...

Aah, I thought you were referring to what the victim process is doing,
not what the racing attacker is doing.
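
To make the two viewpoints concrete, here is a toy userspace model of
checking reachability at each ".." step, as the quoted description puts
it. Everything in it is an assumption made for illustration: struct node,
scope_mode, reachable_from() and step_dotdot() are hypothetical names, the
parent-pointer walk merely stands in for the __d_path check, and -EXDEV as
the failure code for the AT_BENEATH case is assumed rather than taken from
the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory directory node; not the kernel's struct dentry. */
struct node {
    struct node *parent; /* NULL at the top of the whole tree */
};

enum scope_mode { SCOPE_BENEATH, SCOPE_THIS_ROOT };

/* Walk parent pointers to see whether n is root or lies below it. */
static bool reachable_from(const struct node *n, const struct node *root)
{
    for (; n != NULL; n = n->parent)
        if (n == root)
            return true;
    return false;
}

/*
 * Resolve one ".." component starting at *cur, scoped to root.
 * Returns 0 and updates *cur on success, or -EXDEV (assumed error code)
 * when SCOPE_BENEATH would leave the root.
 */
static int step_dotdot(struct node **cur, struct node *root, enum scope_mode mode)
{
    struct node *next;

    assert(cur != NULL && *cur != NULL && root != NULL);

    if (*cur == root) {
        /* ".." at the root: fail for BENEATH, stay put for THIS_ROOT. */
        return mode == SCOPE_BENEATH ? -EXDEV : 0;
    }

    next = (*cur)->parent != NULL ? (*cur)->parent : *cur;

    /* Checked on every "..", not only at the end of the resolution. */
    if (!reachable_from(next, root)) {
        if (mode == SCOPE_BENEATH)
            return -EXDEV;
        next = root; /* correct the escape by jumping back to the root */
    }

    *cur = next;
    return 0;
}
```

In this model the attacker's rename is just an assignment to a parent
pointer made between two steps of the victim's walk. If it lands after one
reachability check, the next ".." sees the changed tree and catches it.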