Date: Fri, 30 Oct 2020 11:17:12 -0500
From: "Serge E. Hallyn"
To: Christian Brauner
Cc: Andy Lutomirski, Alexander Viro, Christoph Hellwig,
 linux-fsdevel@vger.kernel.org, John Johansen, James Morris, Mimi Zohar,
 Dmitry Kasatkin, Stephen Smalley, Casey Schaufler, Arnd Bergmann,
 Andreas Dilger, OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel,
 Josh Triplett, Amir Goldstein, Miklos Szeredi, Theodore Tso,
 Alban Crequy, Tycho Andersen, David Howells, James Bottomley, Jann Horn,
 Seth Forshee, Stéphane Graber, Aleksa Sarai, Lennart Poettering,
 "Eric W. Biederman", smbarber@chromium.org, Phil Estes, Serge Hallyn,
 Kees Cook, Todd Kjos, Jonathan Corbet,
 containers@lists.linux-foundation.org,
 linux-security-module@vger.kernel.org, linux-api@vger.kernel.org,
 linux-ext4@vger.kernel.org, linux-unionfs@vger.kernel.org,
 linux-audit@redhat.com, linux-integrity@vger.kernel.org,
 selinux@vger.kernel.org
Subject: Re: [PATCH 00/34] fs: idmapped mounts
Message-ID: <20201030161712.GA30381@mail.hallyn.com>
References: <20201029003252.2128653-1-christian.brauner@ubuntu.com>
 <8E455D54-FED4-4D06-8CB7-FC6291C64259@amacapital.net>
 <20201030120157.exz4rxmebruh7bgp@wittgenstein>
In-Reply-To: <20201030120157.exz4rxmebruh7bgp@wittgenstein>

On Fri, Oct 30, 2020 at 01:01:57PM +0100, Christian Brauner wrote:
> On Thu, Oct 29, 2020 at 02:58:55PM -0700, Andy Lutomirski wrote:
> >
> > > On Oct 28, 2020, at 5:35 PM, Christian Brauner wrote:
> > >
> > > Hey everyone,
> > >
> > > I vanished for a little while to focus on this work here so sorry for
> > > not being available by mail for a while.
> > >
> > > For quite a long time we have had issues with sharing mounts between
> > > multiple unprivileged containers with different id mappings, sharing a
> > > rootfs between multiple containers with different id mappings, and also
> > > sharing regular directories and filesystems between users with different
> > > uids and gids. The latter use-cases have become even more important with
> > > the availability and adoption of systemd-homed (cf. [1]) to implement
> > > portable home directories.
> > >
> > > The solutions we have tried and proposed so far include the introduction
> > > of fsid mappings, a tiny overlay based filesystem, and an approach to
> > > call override creds in the vfs.
> > > None of these solutions have covered all of the above use-cases.
> > >
> > > The solution proposed here has its origins in multiple discussions
> > > during Linux Plumbers 2017 during and after the end of the containers
> > > microconference.
> > > To the best of my knowledge this involved Aleksa, Stéphane, Eric, David,
> > > James, and myself. A variant of the solution proposed here has also been
> > > discussed, again to the best of my knowledge, after a Linux conference
> > > in St. Petersburg in Russia between Christoph, Tycho, and myself in 2017
> > > after Linux Plumbers.
> > > I've taken the time to finally implement a working version of this
> > > solution over the last weeks to the best of my abilities. Tycho has
> > > signed up for this slightly crazy endeavour as well and he has helped
> > > with the conversion of the xattr codepaths.
> > >
> > > The core idea is to make idmappings a property of struct vfsmount
> > > instead of tying it to a process being inside of a user namespace which
> > > has been the case for all other proposed approaches.
> > > It means that idmappings become a property of bind-mounts, i.e. each
> > > bind-mount can have a separate idmapping. This has the obvious advantage
> > > that idmapped mounts can be created inside of the initial user
> > > namespace, i.e. on the host itself instead of requiring the caller to be
> > > located inside of a user namespace. This enables such use-cases as e.g.
> > > making a usb stick available in multiple locations with different
> > > idmappings (see the vfat port that is part of this patch series).
> > >
> > > The vfsmount struct gains a new struct user_namespace member. The
> > > idmapping of the user namespace becomes the idmapping of the mount.
> > > A caller that is either privileged with respect to the user namespace of
> > > the superblock of the underlying filesystem or a caller that is
> > > privileged with respect to the user namespace a mount has been idmapped
> > > with can create a new bind-mount and mark it with a user namespace.
> >
> > So one way of thinking about this is that a user namespace that has an idmapped mount can, effectively, create or chown files with *any* on-disk uid or gid by doing it directly (if that uid exists in-namespace, which is likely for interesting ids like 0) or by creating a new userns with that id inside.
> >
> > For a file system that is private to a container, this seems moderately safe, although this may depend on what exactly “private” means. We probably want a mechanism such that, if you are outside the namespace, a reference to a file with the namespace’s vfsmnt does not confer suid privilege.
> >
> > Imagine the following attack: user creates a namespace with a root user and arranges to get an idmapped fs, e.g. by inserting an ext4 usb stick or using whatever container management tool does this. Inside the namespace, the user creates a suid-root file.
> >
> > Now, outside the namespace, the user has privilege over the namespace. (I’m assuming there is some tool that will idmap things in a namespace owned by an unprivileged user, which seems likely.) So the user makes a new bind mount and idmaps it to the init namespace. Game over.
> >
> > So I think we need to have some control to mitigate this in a comprehensible way. A big hammer would be to require nosuid. A smaller hammer might be to say that you can’t create a new idmapped mount unless you have privilege over the userns that you want to use for the idmap and to say that a vfsmnt’s paths don’t do suid outside the idmap namespace. We already do the latter for the vfsmnt’s mntns’s userns.
> >
> With this series, in order to create an idmapped mount the user must
> either be cap_sys_admin in the superblock of the underlying filesystem
> or if the mount is already idmapped and they want to create another
> idmapped mount from it they must have cap_sys_admin in the userns that
> the mount is currently marked with. It is also not possible to change
> an idmapped mount once it has been idmapped, i.e. the user must create a
> new detached bind-mount first.

Yeah, I spent quite some time last night trying to figure out the scenario
you were presenting, but I failed. Andy, could you either rephrase or give
a more concrete end-to-end attack scenario?
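
For anyone trying to reason about the rule quoted above, here is a minimal
Python sketch of it as a toy model. It is not code from the patch series, and
every name in it (UserNS, Caller, Mount, may_create_idmapped_mount,
create_idmapped_mount) is hypothetical. It assumes only what the quoted text
states: creation is allowed with CAP_SYS_ADMIN in the user namespace of the
superblock, or in the userns an already idmapped source mount is marked with,
and the result is always a new detached mount rather than a change to the
existing one. Whether the caller also needs privilege over the target userns
is Andy's suggestion, not something the quoted text states, so it is not
modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class UserNS:
    """Toy stand-in for a user namespace (hypothetical, not struct user_namespace)."""
    name: str


@dataclass(frozen=True)
class Caller:
    """A task, reduced to the namespaces in which it holds CAP_SYS_ADMIN."""
    admin_in: FrozenSet[UserNS]


@dataclass(frozen=True)
class Mount:
    """Frozen on purpose: an existing mount's idmapping cannot be changed."""
    sb_userns: UserNS                    # userns of the underlying superblock
    mnt_userns: Optional[UserNS] = None  # userns the mount is marked with, if any


def may_create_idmapped_mount(caller: Caller, source: Mount) -> bool:
    # Case 1: privileged with respect to the superblock's user namespace.
    if source.sb_userns in caller.admin_in:
        return True
    # Case 2: the source is already idmapped and the caller is privileged
    # in the userns it is currently marked with.
    return source.mnt_userns is not None and source.mnt_userns in caller.admin_in


def create_idmapped_mount(caller: Caller, source: Mount, userns: UserNS) -> Mount:
    if not may_create_idmapped_mount(caller, source):
        raise PermissionError("caller lacks CAP_SYS_ADMIN over this mount")
    # Models the new detached bind-mount; the source mount is left untouched.
    return Mount(sb_userns=source.sb_userns, mnt_userns=userns)


if __name__ == "__main__":
    init_ns, container_ns = UserNS("init"), UserNS("container")
    host_root = Caller(frozenset({init_ns}))
    container_root = Caller(frozenset({container_ns}))

    base = Mount(sb_userns=init_ns)
    idmapped = create_idmapped_mount(host_root, base, container_ns)

    print(may_create_idmapped_mount(container_root, base))
    print(may_create_idmapped_mount(container_root, idmapped))
```

In this model, Andy's scenario comes down to whether a caller who is only
privileged in the container's userns can ever obtain a mount marked with the
init namespace; stating the attack in these terms might make it easier to see
which step is assumed to succeed.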