References: <20180929103453.12025-1-cyphar@cyphar.com> <39d64180-73d5-6f27-e455-956143a5b5d3@digikod.net>
In-Reply-To: <39d64180-73d5-6f27-e455-956143a5b5d3@digikod.net>
From: Jann Horn
Date: Sun, 30 Sep 2018 23:46:41 +0200
Subject: Re: [PATCH 0/3] namei: implement various scoping AT_* flags
To: mic@digikod.net
Cc: cyphar@cyphar.com, jlayton@kernel.org, Bruce Fields, Al Viro, Arnd Bergmann, shuah@kernel.org, David Howells, Andy Lutomirski, christian@brauner.io, "Eric W. Biederman", Tycho Andersen, kernel list, linux-fsdevel@vger.kernel.org, linux-arch, linux-kselftest@vger.kernel.org, dev@opencontainers.org, containers@lists.linux-foundation.org, linux-security-module, Kees Cook
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Sep 30, 2018 at 10:39 PM Mickaël Salaün wrote:
> As a side note, I'm still working on Landlock which can achieve the same
> goal but in a more flexible and dynamic way: https://landlock.io

Isn't Landlock mostly intended for userspace that wants to impose a
custom Mandatory Access Control policy on itself, restricting the
whole process?
As far as I can tell, a major use case for AT_BENEATH is privileged
processes that do not want to restrict all filesystem operations they
perform, but want to sometimes impose limits on filesystem traversal
for the duration of a single system call. For example, a process might
want to first open a file from an untrusted filesystem area with
AT_BENEATH, and afterwards open a configuration file without
AT_BENEATH. How would you do this in Landlock? Use a BPF map to store
per-thread filesystem restrictions, and then do bpf() calls before and
after every restricted filesystem access to set and unset the policy
for the current syscall?

> On 9/29/18 12:34, Aleksa Sarai wrote:
> > The need for some sort of control over VFS's path resolution (to avoid
> > malicious paths resulting in inadvertent breakouts) has been a very
> > long-standing desire of many userspace applications. This patchset is a
> > revival of Al Viro's old AT_NO_JUMPS[1] patchset with a few additions.
> >
> > The most obvious change is that AT_NO_JUMPS has been split as discussed
> > in the original thread, along with a further split of AT_NO_PROCLINKS
> > which means that each individual property of AT_NO_JUMPS is now a
> > separate flag:
> >
> >   * Path-based escapes from the starting-point using "/" or ".." are
> >     blocked by AT_BENEATH.
> >   * Mountpoint crossings are blocked by AT_XDEV.
> >   * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more
> >     correctly it actually blocks any user of nd_jump_link() because it
> >     allows out-of-VFS path resolution manipulation).
> >
> > AT_NO_JUMPS is now effectively (AT_BENEATH|AT_XDEV|AT_NO_PROCLINKS). At
> > Linus' suggestion in the original thread, I've also implemented
> > AT_NO_SYMLINKS which just denies _all_ symlink resolution (including
> > "proclink" resolution).
> >
> > An additional improvement was made to AT_XDEV. The original AT_NO_JUMPS
> > path didn't consider "/tmp/.."
> > as a mountpoint crossing -- this patch
> > blocks this as well (feel free to ask me to remove it if you feel this
> > is not sane).
> >
> > Currently I've only enabled these for openat(2) and the stat(2) family.
> > I would hope we could enable it for basically every *at(2) syscall --
> > but many of them appear to not have a @flags argument and thus we'll
> > need to add several new syscalls to do this. I'm more than happy to send
> > those patches, but I'd prefer to know that this preliminary work is
> > acceptable before doing a bunch of copy-paste to add new sets of *at(2)
> > syscalls.
> >
> > One additional feature I've implemented is AT_THIS_ROOT (I imagine this
> > is probably going to be more contentious than the refresh of
> > AT_NO_JUMPS, so I've included it in a separate patch). The patch itself
> > describes my reasoning, but the shortened version of the premise is that
> > container runtimes need to have a way to resolve paths within a
> > potentially malicious rootfs. Container runtimes currently do this in
> > userspace[2] which has implicit race conditions that are not resolvable
> > in userspace (or use fork+exec+chroot and SCM_RIGHTS passing which is
> > inefficient). AT_THIS_ROOT allows for per-call chroot-like semantics for
> > path resolution, which would be invaluable for us -- and the
> > implementation is basically identical to AT_BENEATH (except that we
> > don't return errors when someone actually hits the root).
> >
> > I've added some selftests for this, but it's not clear to me whether
> > they should live here or in xfstests (as far as I can tell there are no
> > other VFS tests in selftests, while there are some tests that look like
> > generic VFS tests in xfstests). If you'd prefer them to be included in
> > xfstests, let me know.
> >
> > [1]: https://lore.kernel.org/patchwork/patch/784221/
> > [2]: https://github.com/cyphar/filepath-securejoin
> >
> > Aleksa Sarai (3):
> >   namei: implement O_BENEATH-style AT_* flags
> >   namei: implement AT_THIS_ROOT chroot-like path resolution
> >   selftests: vfs: add AT_* path resolution tests
> >
> >  fs/fcntl.c                                    |   2 +-
> >  fs/namei.c                                    | 158 ++++++++++++------
> >  fs/open.c                                     |  10 ++
> >  fs/stat.c                                     |  15 +-
> >  include/linux/fcntl.h                         |   3 +-
> >  include/linux/namei.h                         |   8 +
> >  include/uapi/asm-generic/fcntl.h              |  20 +++
> >  include/uapi/linux/fcntl.h                    |  10 ++
> >  tools/testing/selftests/Makefile              |   1 +
> >  tools/testing/selftests/vfs/.gitignore        |   1 +
> >  tools/testing/selftests/vfs/Makefile          |  13 ++
> >  tools/testing/selftests/vfs/at_flags.h        |  40 +++++
> >  tools/testing/selftests/vfs/common.sh         |  37 ++++
> >  .../selftests/vfs/tests/0001_at_beneath.sh    |  72 ++++++++
> >  .../selftests/vfs/tests/0002_at_xdev.sh       |  54 ++++++
> >  .../vfs/tests/0003_at_no_proclinks.sh         |  50 ++++++
> >  .../vfs/tests/0004_at_no_symlinks.sh          |  49 ++++++
> >  .../selftests/vfs/tests/0005_at_this_root.sh  |  66 ++++++++
> >  tools/testing/selftests/vfs/vfs_helper.c      | 154 +++++++++++++++++
> >  19 files changed, 707 insertions(+), 56 deletions(-)
> >  create mode 100644 tools/testing/selftests/vfs/.gitignore
> >  create mode 100644 tools/testing/selftests/vfs/Makefile
> >  create mode 100644 tools/testing/selftests/vfs/at_flags.h
> >  create mode 100644 tools/testing/selftests/vfs/common.sh
> >  create mode 100755 tools/testing/selftests/vfs/tests/0001_at_beneath.sh
> >  create mode 100755 tools/testing/selftests/vfs/tests/0002_at_xdev.sh
> >  create mode 100755 tools/testing/selftests/vfs/tests/0003_at_no_proclinks.sh
> >  create mode 100755 tools/testing/selftests/vfs/tests/0004_at_no_symlinks.sh
> >  create mode 100755 tools/testing/selftests/vfs/tests/0005_at_this_root.sh
> >  create mode 100644 tools/testing/selftests/vfs/vfs_helper.c
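
To make the difference between the AT_BENEATH and AT_THIS_ROOT rules from the
quoted cover letter concrete, here is a rough userspace sketch in Python. It
is purely lexical: it does not follow symlinks or notice mountpoints, which is
exactly the part that cannot be done race-free in userspace, so treat it as an
illustration of the intended semantics and not as a substitute for the kernel
flags. The names resolve_scoped, this_root and EscapeError are made up for
this example, and the exact boundary behaviour of the patches may differ.

```python
class EscapeError(ValueError):
    """Raised when a path would leave the starting directory."""


def resolve_scoped(path: str, this_root: bool = False) -> str:
    """Lexically resolve `path` relative to a starting directory.

    this_root=False approximates the AT_BENEATH rule: a leading "/" or a
    ".." that would climb above the starting point is an error.
    this_root=True approximates the AT_THIS_ROOT rule: "/" means the
    starting directory and ".." at the top simply stays there.

    Assumption: symlinks and mountpoints are ignored entirely.
    """
    if path.startswith("/") and not this_root:
        raise EscapeError("absolute path escapes the starting point")

    stack = []  # components below the starting directory
    for part in path.split("/"):
        if part in ("", "."):
            continue  # empty components and "." do not move us
        if part == "..":
            if stack:
                stack.pop()
            elif not this_root:
                raise EscapeError("'..' escapes the starting point")
            # with this_root, ".." at the top is clamped to the top
            continue
        stack.append(part)
    return "/".join(stack) or "."


if __name__ == "__main__":
    print(resolve_scoped("a/b/../c"))                      # a/c
    print(resolve_scoped("../../etc", this_root=True))     # etc
    try:
        resolve_scoped("a/../../etc")
    except EscapeError as err:
        print("rejected:", err)
```

The point of the thread is that the caller picks the rule per call: the same
process can resolve an untrusted path with the restrictive mode and then open
its own configuration file with no restriction at all.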