In-Reply-To:
 <87lgfy5fpd.fsf@xmission.com>
References: <87lgfy5fpd.fsf@xmission.com>
From: Miklos Szeredi
Date: Tue, 13 Feb 2018 11:20:07 +0100
Subject: Re: [PATCH 08/11] fuse: Support fuse filesystems outside of init_user_ns
To: "Eric W. Biederman"
Cc: Dongsu Park, lkml, containers@lists.linux-foundation.org,
 Alban Crequy, Seth Forshee, Sargun Dhillon, linux-fsdevel
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 12, 2018 at 5:35 PM, Eric W. Biederman wrote:
> Miklos Szeredi writes:
>
>> On Fri, Dec 22, 2017 at 3:32 PM, Dongsu Park wrote:
>>> From: Seth Forshee
>>>
>>> In order to support mounts from namespaces other than
>>> init_user_ns, fuse must translate uids and gids to/from the
>>> userns of the process servicing requests on /dev/fuse. This
>>> patch does that, with a couple of restrictions on the namespace:
>>>
>>> - The userns for the fuse connection is fixed to the namespace
>>>   from which /dev/fuse is opened.
>>>
>>> - The namespace must be the same as s_user_ns.
>>>
>>> These restrictions simplify the implementation by avoiding the
>>> need to pass around userns references and by allowing fuse to
>>> rely on the checks in inode_change_ok for ownership changes.
>>> Either restriction could be relaxed in the future if needed.
>>
>> Can we not introduce potential userspace interface regressions?
>>
>> The issue with pid namespaces fixed in commit 5d6d3a301c4e ("fuse:
>> allow server to run in different pid_ns") will probably bite us here
>> as well.
>
> Maybe, but unlike the pid namespace no one has been able to mount
> fuse outside of init_user_ns so we are much less exposed. I agree we
> should be careful.

Have to wrap my head around all the rules here.

There's the may_mount() one:

    ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN)

Um, first of all, why isn't it checking current->cred->user_ns?

Ah, there it is in sget():

    ns_capable(user_ns, CAP_SYS_ADMIN)

I get the plain capable(CAP_SYS_ADMIN) check in sget_userns() if fs
doesn't have FS_USERNS_MOUNT. This is the one that prevents fuse
mounts from being created when (current->cred->user_ns !=
&init_user_ns).

Maybe there's a logic to this web of namespaces, but I don't yet see
it. Is it documented somewhere?

>> We basically need two modes of operation:
>>
>> a) old, backward compatible (not introducing any new failure modes),
>>    created with privileged mount
>> b) new, non-backward compatible, created with unprivileged mount
>>
>> Technically there would still be a risk from breaking userspace, since
>> we are using the same entry point for both, but let's hope that no
>> practical problems come from that.
>
> Answering from a 10,000 foot perspective:
>
> There are two cases. Requests to read/write the filesystem from outside
> of s_user_ns. These run no risk of breaking userspace as this mode has
> not been implemented before.

This comes from the fact that (s_user_ns == &init_user_ns) and all
user namespaces are "inside" init_user_ns, right?

One question: why does the current code use the from_[ug]id_munged()
variant, when the conversion can never fail? Or can it?

> Restrictions at mount time to ensure we are not dealing with a crazy mix
> of namespaces. This has a small chance of breaking someone's crazy
> setup.
>
>
> Dropping requests to read/write the filesystem when the requester does
> not map into s_user_ns should not be a problem to enable universally. If
> s_user_ns is init_user_ns everything maps so there is no restriction.
>
>
>
> What we can do if we want to ensure maximum backwards compatibility
> is if the fuse filesystem is mounted in init_user_ns but if device for
> the communication channel is opened in some other user namespace we
> can just force the communication channel to operate in init_user_ns.
>
> That will be 100% backwards compatible in all cases and as far as I can
> see remove the need for having different ``modes'' of operation.

Okay.

Thanks,
Miklos
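
Illustrative sketch for the from_[ug]id_munged() question above. This
is a user-space model, not kernel code and not taken from the patch:
struct sketch_userns, sketch_from_kuid() and sketch_from_kuid_munged()
are hypothetical names, and a namespace is reduced to a single mapping
extent. It only tries to show the difference between the two kernel
helpers as I understand them: from_kuid() reports an unmapped id as
(uid_t)-1, while from_kuid_munged() substitutes the overflow uid
(65534 by default), so the caller always has a value it can pass to
userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model only; none of these names exist in the kernel. */
typedef uint32_t sketch_uid_t;

#define SKETCH_INVALID_UID  ((sketch_uid_t)-1)
#define SKETCH_OVERFLOW_UID ((sketch_uid_t)65534) /* default overflowuid */

/*
 * One mapping extent (assumption: real namespaces may have several):
 * ids [first, first + count) inside the namespace correspond to
 * kernel ids [lower_first, lower_first + count).
 */
struct sketch_userns {
    sketch_uid_t first;
    sketch_uid_t lower_first;
    sketch_uid_t count;
};

/* Models from_kuid(): returns SKETCH_INVALID_UID when kuid is unmapped. */
sketch_uid_t sketch_from_kuid(const struct sketch_userns *ns,
                              sketch_uid_t kuid)
{
    assert(ns != NULL);
    if (kuid < ns->lower_first || kuid - ns->lower_first >= ns->count)
        return SKETCH_INVALID_UID;
    return ns->first + (kuid - ns->lower_first);
}

/* Models from_kuid_munged(): unmapped ids become the overflow uid. */
sketch_uid_t sketch_from_kuid_munged(const struct sketch_userns *ns,
                                     sketch_uid_t kuid)
{
    sketch_uid_t uid = sketch_from_kuid(ns, kuid);

    return uid == SKETCH_INVALID_UID ? SKETCH_OVERFLOW_UID : uid;
}
```

With a container-style extent of 65536 ids starting at kernel uid
100000, kernel uid 1000 has no mapping: the plain variant reports
failure and the munged variant yields 65534. With an identity extent
covering the whole valid range, which is how this sketch models
init_user_ns, the fallback is never taken for a valid id.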