Date: Fri, 30 Oct 2020 13:01:57 +0100
From: Christian Brauner
To: Andy Lutomirski
Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel@vger.kernel.org,
    John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
    Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
    OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
    Andy Lutomirski, Amir Goldstein, Miklos Szeredi, Theodore Tso,
    Alban Crequy, Tycho Andersen, David Howells, James Bottomley,
    Jann Horn, Seth Forshee, Stéphane Graber, Aleksa Sarai,
    Lennart Poettering, "Eric W.
    Biederman", smbarber@chromium.org, Phil Estes, Serge Hallyn,
    Kees Cook, Todd Kjos, Jonathan Corbet,
    containers@lists.linux-foundation.org,
    linux-security-module@vger.kernel.org, linux-api@vger.kernel.org,
    linux-ext4@vger.kernel.org, linux-unionfs@vger.kernel.org,
    linux-audit@redhat.com, linux-integrity@vger.kernel.org,
    selinux@vger.kernel.org
Subject: Re: [PATCH 00/34] fs: idmapped mounts
Message-ID: <20201030120157.exz4rxmebruh7bgp@wittgenstein>
References: <20201029003252.2128653-1-christian.brauner@ubuntu.com>
    <8E455D54-FED4-4D06-8CB7-FC6291C64259@amacapital.net>
In-Reply-To: <8E455D54-FED4-4D06-8CB7-FC6291C64259@amacapital.net>
X-Mailing-List: linux-ext4@vger.kernel.org

On Thu, Oct 29, 2020 at 02:58:55PM -0700, Andy Lutomirski wrote:
>
>
> > On Oct 28, 2020, at 5:35 PM, Christian Brauner wrote:
> >
> > Hey everyone,
> >
> > I vanished for a little while to focus on this work here so sorry for
> > not being available by mail for a while.
> >
> > For quite a long time we have had issues with sharing mounts between
> > multiple unprivileged containers with different id mappings, sharing a
> > rootfs between multiple containers with different id mappings, and also
> > sharing regular directories and filesystems between users with different
> > uids and gids. The latter use-cases have become even more important with
> > the availability and adoption of systemd-homed (cf. [1]) to implement
> > portable home directories.
> >
> > The solutions we have tried and proposed so far include the introduction
> > of fsid mappings, a tiny overlay based filesystem, and an approach to
> > call override creds in the vfs. None of these solutions have covered all
> > of the above use-cases.
> >
> > The solution proposed here has its origins in multiple discussions
> > during Linux Plumbers 2017 during and after the end of the containers
> > microconference.
> > To the best of my knowledge this involved Aleksa, Stéphane, Eric, David,
> > James, and myself. A variant of the solution proposed here has also been
> > discussed, again to the best of my knowledge, after a Linux conference
> > in St. Petersburg in Russia between Christoph, Tycho, and myself in 2017
> > after Linux Plumbers.
> > I've taken the time to finally implement a working version of this
> > solution over the last weeks to the best of my abilities. Tycho has
> > signed up for this slightly crazy endeavour as well and he has helped
> > with the conversion of the xattr codepaths.
> >
> > The core idea is to make idmappings a property of struct vfsmount
> > instead of tying it to a process being inside of a user namespace which
> > has been the case for all other proposed approaches.
> > It means that idmappings become a property of bind-mounts, i.e. each
> > bind-mount can have a separate idmapping. This has the obvious advantage
> > that idmapped mounts can be created inside of the initial user
> > namespace, i.e. on the host itself instead of requiring the caller to be
> > located inside of a user namespace. This enables such use-cases as e.g.
> > making a usb stick available in multiple locations with different
> > idmappings (see the vfat port that is part of this patch series).
> >
> > The vfsmount struct gains a new struct user_namespace member. The
> > idmapping of the user namespace becomes the idmapping of the mount. A
> > caller that is either privileged with respect to the user namespace of
> > the superblock of the underlying filesystem or a caller that is
> > privileged with respect to the user namespace a mount has been idmapped
> > with can create a new bind-mount and mark it with a user namespace.
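
As a reading aid, the privilege rule in the paragraph just quoted can be
modelled in plain userspace C. This is a minimal sketch and not the code
from the series: struct userns, struct caller, struct mount_model,
has_sys_admin_in() and may_create_idmapped_mount() are hypothetical
names, and the capability check is a simplified imitation of how the
kernel walks up the user namespace hierarchy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user namespace: parent link plus owning uid. */
struct userns {
    const struct userns *parent;   /* NULL for the initial namespace */
    unsigned int owner_uid;        /* uid (in the parent) that created it */
};

/* Hypothetical model of the calling task's credentials. */
struct caller {
    const struct userns *ns;       /* namespace the caller lives in */
    unsigned int euid;
    bool cap_sys_admin;            /* CAP_SYS_ADMIN held in its own namespace */
};

/* Hypothetical model of a mount: superblock userns plus optional idmap. */
struct mount_model {
    const struct userns *sb_userns;   /* userns of the superblock */
    const struct userns *mnt_userns;  /* userns the mount is marked with, or NULL */
};

/*
 * Simplified assumption: a caller is privileged over `target` if it holds
 * the capability in that very namespace, or if it lives in the parent of
 * `target` (or of one of its ancestors) and owns that namespace.
 */
static bool has_sys_admin_in(const struct caller *c, const struct userns *target)
{
    assert(c != NULL);
    for (const struct userns *ns = target; ns != NULL; ns = ns->parent) {
        if (ns == c->ns)
            return c->cap_sys_admin;
        if (ns->parent == c->ns && ns->owner_uid == c->euid)
            return true;
    }
    return false;
}

/* Either branch of the quoted rule is sufficient. */
static bool may_create_idmapped_mount(const struct caller *c,
                                      const struct mount_model *m)
{
    assert(m != NULL);
    if (has_sys_admin_in(c, m->sb_userns))
        return true;
    return m->mnt_userns != NULL && has_sys_admin_in(c, m->mnt_userns);
}
```

In this model, with an initial namespace and a child namespace owned by
uid 1000, an unprivileged uid 1000 in the initial namespace fails the
check for a plain mount but passes it for a mount already marked with
the child namespace, because owning a namespace confers privilege over
it.
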
>
> So one way of thinking about this is that a user namespace that has an
> idmapped mount can, effectively, create or chown files with *any*
> on-disk uid or gid by doing it directly (if that uid exists
> in-namespace, which is likely for interesting ids like 0) or by
> creating a new userns with that id inside.
>
> For a file system that is private to a container, this seems moderately
> safe, although this may depend on what exactly “private” means. We
> probably want a mechanism such that, if you are outside the namespace,
> a reference to a file with the namespace’s vfsmnt does not confer suid
> privilege.
>
> Imagine the following attack: user creates a namespace with a root user
> and arranges to get an idmapped fs, e.g. by inserting an ext4 usb stick
> or using whatever container management tool does this. Inside the
> namespace, the user creates a suid-root file.
>
> Now, outside the namespace, the user has privilege over the namespace.
> (I’m assuming there is some tool that will idmap things in a namespace
> owned by an unprivileged user, which seems likely.) So the user makes a
> new bind mount and idmaps it to the init namespace. Game over.
>
> So I think we need to have some control to mitigate this in a
> comprehensible way. A big hammer would be to require nosuid. A smaller
> hammer might be to say that you can’t create a new idmapped mount
> unless you have privilege over the userns that you want to use for the
> idmap and to say that a vfsmnt’s paths don’t do suid outside the idmap
> namespace. We already do the latter for the vfsmnt’s mntns’s userns.

With this series, in order to create an idmapped mount the user must
either be cap_sys_admin in the user namespace of the superblock of the
underlying filesystem or, if the mount is already idmapped and they want
to create another idmapped mount from it, they must have cap_sys_admin
in the userns that the mount is currently marked with. It is also not
possible to change an idmapped mount once it has been idmapped, i.e.
the user must create a new detached bind-mount first.

>
> Hmm. What happens if we require that an idmap userns equal the vfsmnt’s
> mntns’s userns? Is that too limiting?
>
> I hope that whatever solution gets used is straightforward enough to
> wrap one’s head around.
>
> > When a file/inode is accessed through an idmapped mount the i_uid and
> > i_gid of the inode will be remapped according to the user namespace the
> > mount has been marked with. When a new object is created based on the
> > fsuid and fsgid of the caller they will similarly be remapped according
> > to the user namespace of the mount they are created from.
>
> By “mapped according to”, I presume you mean that the on-disk uid/gid
> is the uid/gid as seen in the user namespace in question.

If I understand you correctly, then yes.
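
To make the remapping discussed above concrete, here is a small
userspace sketch of extent-based id translation. It assumes, following
Andy's reading which is only conditionally confirmed above, that the
on-disk id is the id as seen inside the user namespace the mount is
marked with. The extent layout mirrors a line of /proc/<pid>/uid_map;
struct id_extent, ns_id_to_outer() and outer_to_ns_id() are illustrative
names and not the helpers used in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One mapping extent, in the same order as a uid_map line. */
struct id_extent {
    uint32_t first;        /* first id as seen inside the namespace */
    uint32_t lower_first;  /* first id as seen outside the namespace */
    uint32_t count;        /* number of consecutive ids covered */
};

/*
 * Translate an id seen inside the namespace (assumed here to be the
 * on-disk id) to the id seen outside. Returns false if no extent
 * covers the id.
 */
static bool ns_id_to_outer(const struct id_extent *map, size_t n,
                           uint32_t id, uint32_t *out)
{
    assert(n == 0 || map != NULL);
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (id >= map[i].first && id - map[i].first < map[i].count) {
            *out = map[i].lower_first + (id - map[i].first);
            return true;
        }
    }
    return false;
}

/*
 * Reverse direction: translate an outside id (for example the caller's
 * fsuid when a new object is created) to the id inside the namespace.
 */
static bool outer_to_ns_id(const struct id_extent *map, size_t n,
                           uint32_t id, uint32_t *out)
{
    assert(n == 0 || map != NULL);
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (id >= map[i].lower_first &&
            id - map[i].lower_first < map[i].count) {
            *out = map[i].first + (id - map[i].lower_first);
            return true;
        }
    }
    return false;
}
```

With a single extent { 0, 100000, 65536 }, id 1000 inside the namespace
corresponds to 101000 outside it, and outside id 5 has no mapping at
all.
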