Date: Mon, 2 Mar 2020 08:34:05 -0600
From: "Serge E. Hallyn"
To: Josef Bacik
Cc: Christian Brauner, Stéphane Graber, "Eric W.
    Biederman", Aleksa Sarai, Jann Horn, smbarber@chromium.org,
    Seth Forshee, Alexander Viro, Alexey Dobriyan, Serge Hallyn,
    James Morris, Kees Cook, Jonathan Corbet, Phil Estes,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    containers@lists.linux-foundation.org,
    linux-security-module@vger.kernel.org, linux-api@vger.kernel.org,
    mpawlowski@fb.com
Subject: Re: [PATCH v3 00/25] user_namespace: introduce fsid mappings
Message-ID: <20200302143405.GA25432@mail.hallyn.com>
References: <20200218143411.2389182-1-christian.brauner@ubuntu.com> <2b0fe94b-036a-919e-219b-cc1ba0641781@toxicpanda.com>
In-Reply-To: <2b0fe94b-036a-919e-219b-cc1ba0641781@toxicpanda.com>

On Thu, Feb 27, 2020 at 02:33:04PM -0500, Josef Bacik wrote:
> On 2/18/20 9:33 AM, Christian Brauner wrote:
> > Hey everyone,
> >
> > This is v3. After (off- and online) discussions with Jann, the following
> > changes were made:
> > - To handle nested user namespaces cleanly, efficiently, and with full
> >   backwards compatibility for non fsid-mapping aware workloads we only
> >   allow writing fsid mappings as long as the corresponding id mapping
> >   type has not been written.
> > - Split the patch which adds the internal ability in
> >   kernel/user_namespace to verify and write fsid mappings into three
> >   patches:
> >   1. [PATCH v3 04/25] fsuidgid: add fsid mapping helpers
> >      patch to implement core helpers for fsid translations (i.e.
> >      make_kfs*id(), from_kfs*id{_munged}(), kfs*id_to_k*id(),
> >      k*id_to_kfs*id())
> >   2. [PATCH v3 05/25] user_namespace: refactor map_write()
> >      patch to refactor map_write() in order to prepare for actual fsid
> >      mappings changes in the following patch. (This should make it
> >      easier to review.)
> >   3. [PATCH v3 06/25] user_namespace: make map_write() support fsid mappings
> >      patch to implement actual fsid mappings support in map_write()
> > - Let the keyctl infrastructure only operate on kfsid which are always
> >   mapped/looked up in the id mappings similar to what we do for
> >   filesystems that have the same superblock visible in multiple user
> >   namespaces.
> >
> > This version also comes with minimal tests which I intend to expand in
> > the future.
> >
> > From pings and off-list questions and discussions at Google Container
> > Security Summit there seems to be quite a lot of interest in this
> > patchset with use-cases ranging from layer sharing for app containers
> > and k8s to data sharing between containers with different id
> > mappings. I haven't Cced all people because I don't have all the email
> > addresses at hand but I've at least added Phil now. :)
> >
>
> I put this into a kernel for our container guys to mess with in order to
> validate it would actually be useful for real world uses. I've cc'ed the
> guy who did all of the work in case you have specific questions.
>
> Good news is the interface is acceptable, albeit apparently the whole user
> ns interface sucks in general. But you haven't made it worse, so success!

Well I very much disagree here :) With the first part! But I do
understand the shortcomings.

Anyway, I still hope we get to talk about this in person, but IMO this
is the right approach (this being - thinking about how to make the uid
mappings more flexible without making them too complicated to be safe
to use), but a bit too static in terms of target. There are at least
two ways that I could see usefully generalizing it.

From a user space pov, the following goal is indispensable (for my use
cases): that the fsuid be selectable based on fs, mountpoint, or file
context (as in selinux).

From a userns pov, one way to look at it is this: when task t1 signals
task t2, it's not only t1's namespace that's considered when filling in
the sender uid, but also t2's. Likewise, when writing a file, we should
consider both t1's fsuid+userns, and the file's, mount's, or filesystem's
userns.

From that POV, your patch is a step in the right direction and could be
taken as is (modulo any tmpfs fix Josef needs :)

From there I would propose adding a 'userns=' bind mount option, so we
could create an empty userns with the desired mapping (subject to
permissions granted by subuids), get an fd to the uidns, and say

    mount --bind -o uidns=5 /shared /containers/c1/mnt/shared

So now when I write a file /etc/hosts as container fsuid 0, it'll be
subject to the container rootfs mount's uid mapping, presumably 100000.
When I write /mnt/shared/hello, it'll be subject to the mount's uid
mapping, which might be 1000.

-serge
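
The per-mount lookup in the /etc/hosts and /mnt/shared/hello example can be
modeled in user space. What follows is only a minimal sketch under stated
assumptions, not kernel code and not an existing interface: each mapping is a
list of (inside start, outside start, count) extents, the same shape as a line
of /proc/PID/uid_map; the range lengths (65536 and 1) are made up to fit the
numbers in the example; and the names Extent, translate, MOUNT_MAPS and
owner_on_disk are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Extent(NamedTuple):
    """One uid_map-style entry: inside start, outside start, length."""
    inside: int
    outside: int
    count: int


def translate(extents: Sequence[Extent], uid: int) -> Optional[int]:
    """Map an in-container id to the on-disk id, or None if it is unmapped."""
    for ext in extents:
        if ext.inside <= uid < ext.inside + ext.count:
            return ext.outside + (uid - ext.inside)
    return None


# Assumed mappings: the ranges are illustrative, chosen to match the example.
ROOTFS_MAP = [Extent(0, 100000, 65536)]  # container rootfs mount
SHARED_MAP = [Extent(0, 1000, 1)]        # bind mount with its own mapping

# Hypothetical per-mount table; the longest matching mount point wins.
MOUNT_MAPS = {"/": ROOTFS_MAP, "/mnt/shared": SHARED_MAP}


def owner_on_disk(path: str, fsuid: int) -> Optional[int]:
    """Translate fsuid using the mapping of the mount that contains path."""
    candidates = (
        m for m in MOUNT_MAPS
        if path == m or path.startswith(m.rstrip("/") + "/")
    )
    mount_point = max(candidates, key=len)
    return translate(MOUNT_MAPS[mount_point], fsuid)


print(owner_on_disk("/etc/hosts", 0))         # rootfs mapping applies
print(owner_on_disk("/mnt/shared/hello", 0))  # shared mount mapping applies
```

With these assumed ranges, fsuid 0 lands on 100000 under the rootfs mount and
on 1000 under /mnt/shared, while any other fsuid writing under /mnt/shared is
unmapped, since the shared mapping covers a single id.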