Received: by 2002:a05:6a10:16a7:0:0:0:0 with SMTP id gp39csp587049pxb; Wed, 11 Nov 2020 10:59:13 -0800 (PST) X-Google-Smtp-Source: ABdhPJxopm8gQ0zfOwXEvPYB7MUei3dvX6RQCOq18WfVI+0E7Aowup6AbIwLDffTGv3GG6iSIhZC X-Received: by 2002:a17:906:4698:: with SMTP id a24mr27836380ejr.90.1605121153657; Wed, 11 Nov 2020 10:59:13 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1605121153; cv=none; d=google.com; s=arc-20160816; b=IILSQpUxd9Ul1ECoeQFl5dJHLVTN7/1gQHTjzzbfLIZRrr3nqn2PbtI3nY+jy5Jzrb h6lUjUl5clojm+CTIFe/gvvIW54AvEDyzM2JTemgl6JSqddnD0avhCvdDWPd24vl0hX3 XxbVb8w6bznVc1znjnjAZdWxong8vU8gkZoWXoVWdjKHcm3IO0KCLYUbPTjlhW2SmUCQ 4YJRs2yMIFrDFD/g/04J9GY6CFiwF8MUaAMzTXyos/I+M6DKak9MAP1Xhfl2cNJAeswt 6gDZTYwDglUjZkt7ujPkC/+Qeoohdpj2FjQ8inDXDAOuyVe7z+BTNPFlyDKGre9GXdTK 6iqg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=8DKuDB5LEEiDs+HKBVBurFjvTqncsTKUmAwonf+c4kk=; b=ElWSubEZJeWegJUhYqodfiKUMUqDMi2ScvBXFu03nuza/Bxkw6Y0HVzLkaNKlstbhR kXhxBw1OOCt5Sk2kCfah6vCDV09+C9I2QetS4CxnZCNbkX+cEmgJQ8eTxyowdfLamvsL bPFKCh69PIPBhJ5XSjcGbQPwWWhYGnr2sM2sxPPDXd2uCjA5ccGJgRaVTuQuY9lGNm9e GknhjBIgYqYtMuumyyDpl0Pm1AMS5ovd5/f44m06sUq4mz28yiBjqHUnJy4VbgSvH+SZ oIwlsMTmba1niFCWyuYiq0gBfEfLJOmXJnimt/sMrC64Jyvm+3S0d365Des7/AiVPE5X S8Tw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sargun.me header.s=google header.b=iQC2M+yd; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id u24si1855684eje.619.2020.11.11.10.58.39; Wed, 11 Nov 2020 10:59:13 -0800 (PST) Received-SPF: pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@sargun.me header.s=google header.b=iQC2M+yd; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725916AbgKKS5e (ORCPT + 99 others); Wed, 11 Nov 2020 13:57:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47918 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725995AbgKKS5d (ORCPT ); Wed, 11 Nov 2020 13:57:33 -0500 Received: from mail-io1-xd44.google.com (mail-io1-xd44.google.com [IPv6:2607:f8b0:4864:20::d44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B07ECC0613D1 for ; Wed, 11 Nov 2020 10:57:31 -0800 (PST) Received: by mail-io1-xd44.google.com with SMTP id s24so3325220ioj.13 for ; Wed, 11 Nov 2020 10:57:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sargun.me; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=8DKuDB5LEEiDs+HKBVBurFjvTqncsTKUmAwonf+c4kk=; b=iQC2M+ydj+IdRnNKU1Bq4uRd1PSu/wbVcdQBf/5ys2NMbeX6thY4YmEJOZBEeBiILh bf3uCKWYh7jhgAQGH0n0+GgfS2nvC00u4GyzBFiKncZIH25Gxp8PGRWGZ+XRwVG3Wbnb 8i/7dGOqJzh3h/k+JvYVjHlomliOq6hqoDeb0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=8DKuDB5LEEiDs+HKBVBurFjvTqncsTKUmAwonf+c4kk=; b=cWiu926oA01JNE5yvZQujHLyHAmioO5tPzllMaYcdvqUJe2dnAW9HFQceBxZFZgTIC SMp7Bys4SnBCqcXZwdOx2/jRusJN6x1IN+EvzhgobLzgiGSZfdjvgEwpp7egewfo+yKr Nl3voj0z3icUIwV7RLWoS7q4ZkbIEEq77sPZXGvjlymJJvDSHMmsEskq5SrmTdBqoNAP z0k8IPTR9SlVCNR0UDH2pvXFRTnvtLNFiIQgOft71KUpKW8EIcRsGj3sjmjD4rEMoHEE xh3pS5UAMVzH+ibZKJjMazZ3PYKuMkdRkkwuACT4H7Eupb3/ElY/b0uLnAC2OZK3947V eQMg== X-Gm-Message-State: AOAM533GJ3VrPDhFESavPdEPIHIjSTToIeetmbiqpPcjMl2XL4jwupy0 +gTBGmHw6dKRVGbGKwi5FgWSWg== X-Received: by 2002:a02:6a59:: with SMTP id m25mr21258822jaf.132.1605121050913; Wed, 11 Nov 2020 10:57:30 -0800 (PST) Received: from ircssh-2.c.rugged-nimbus-611.internal (80.60.198.104.bc.googleusercontent.com. [104.198.60.80]) by smtp.gmail.com with ESMTPSA id n4sm1565046iox.6.2020.11.11.10.57.30 (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Wed, 11 Nov 2020 10:57:30 -0800 (PST) Date: Wed, 11 Nov 2020 18:57:28 +0000 From: Sargun Dhillon To: Trond Myklebust Cc: "anna.schumaker@netapp.com" , "smayhew@redhat.com" , "chuck.lever@oracle.com" , "linux-fsdevel@vger.kernel.org" , "dhowells@redhat.com" , "schumaker.anna@gmail.com" , "alban.crequy@gmail.com" , "linux-kernel@vger.kernel.org" , "mauricio@kinvolk.io" , "bfields@fieldses.org" , "linux-nfs@vger.kernel.org" Subject: Re: [PATCH v4 0/2] NFS: Fix interaction between fs_context and user namespaces Message-ID: <20201111185727.GA27945@ircssh-2.c.rugged-nimbus-611.internal> References: <20201102174737.2740-1-sargun@sargun.me> <20201111111233.GA21917@ircssh-2.c.rugged-nimbus-611.internal> <8feccf45f6575a204da03e796391cc135283eb88.camel@hammerspace.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8feccf45f6575a204da03e796391cc135283eb88.camel@hammerspace.com> User-Agent: Mutt/1.9.4 (2018-02-28) Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On Wed, Nov 11, 2020 at 02:38:11PM +0000, Trond Myklebust wrote: > On Wed, 2020-11-11 at 11:12 +0000, Sargun Dhillon wrote: > > On Tue, Nov 10, 2020 at 08:12:01PM +0000, Trond Myklebust wrote: > > > On Tue, 2020-11-10 at 17:43 +0100, Alban Crequy wrote: > > > > Hi, > > > > > > > > I tested the patches on top of 5.10.0-rc3+ and I could mount an > > > > NFS > > > > share with a different user namespace. fsopen() is done in the > > > > container namespaces (user, mnt and net namespaces) while > > > > fsconfig(), > > > > fsmount() and move_mount() are done on the host namespaces. The > > > > mount > > > > on the host is available in the container via mount propagation > > > > from > > > > the host mount. > > > > > > > > With this, the files on the NFS server with uid 0 are available > > > > in > > > > the > > > > container with uid 0. On the host, they are available with uid > > > > 4294967294 (make_kuid(&init_user_ns, -2)). > > > > > > > > > > Can someone please tell me what is broken with the _current_ design > > > before we start trying to push "fixes" that clearly break it? > > Currently the mechanism of mounting nfs4 in a user namespace is as > > follows: > > > > Parent: fork() > > Child: setns(userns) > > C: fsopen("nfs4") = 3 > > C->P: Send FD 3 > > P: FSConfig... > > P: fsmount... (This is where the CAP_SYS_ADMIN check happens)) > > > > > > Right now, when you mount an NFS filesystem in a non-init user > > namespace, and you have UIDs / GIDs on, the UIDs / GIDs which > > are sent to the server are not the UIDs from the mounting namespace, > > instead they are the UIDs from the init user ns. > > > > The reason for this is that you can call fsopen("nfs4") in the > > unprivileged > > namespace, and that configures fs_context with all the right > > information for > > that user namespace, but we currently require CAP_SYS_ADMIN in the > > init user > > namespace to call fsmount. This means that the superblock's user > > namespace is > > set "correctly" to the container, but there's absolutely no way > > nfs4uidmap > > to consume an unprivileged user namespace. > > > > This behaviour happens "the other way" as well, where the UID in the > > container > > may be 0, but the corresponding kuid is 1000. When a response from an > > NFS > > server comes in we decode it according to the idmap userns[1]. The > > userns > > used to get create idmap is generated at fsmount time, and not as > > fsopen > > time. So, even if the filesystem is in the user namespace, and the > > server > > responds with UID 0, it'll come up with an unmapped UID. > > > > This is because we do > > Server UID 0 -> idmap make_kuid(init_user_ns, 0) -> VFS > > from_kuid(container_ns, 0) -> invalid uid > > > > This is broken behaviour, in my humble opinion as is it makes it > > impossible to > > use NFSv4 (and v3 for that matter) out of the box with unprivileged > > user > > namespaces. At least in our environment, using usernames / GSS isn't > > an option, > > so we have to rely on UIDs being set correctly [at least from the > > container's > > perspective]. > > > > The current code for setting server->cred was developed independently > of fsopen() (and predates it actually). I'm fine with the change to > have server->cred be the cred of the user that called fsopen(). That's > in line with what we used to do for sys_mount(). > Just curious, without FS_USERNS, how were you mounting NFSv4 in an unprivileged user ns? > However all the other stuff to throw errors when the user namespace is > not init_user_ns introduces massive regressions. > I can remove that and respin the patch. How do you feel about that? I would still like to keep the log lines though because it is a uapi change. I am worried that someone might exercise this path with GSS and allow for upcalls into the main namespaces by accident -- or be confused of why they're seeing upcalls "in a different namespace". Are you okay with picking up ("NFS: NFSv2/NFSv3: Use cred from fs_context during mount") without any changes? I can respin ("NFSv4: Refactor NFS to use user namespaces") without: /* * nfs4idmap is not fully isolated by user namespaces. It is currently * only network namespace aware. If upcalls never happen, we do not * need to worry as nfs_client instances aren't shared between * user namespaces. */ if (idmap_userns(server->nfs_client->cl_idmap) != &init_user_ns && !(server->caps & NFS_CAP_UIDGID_NOMAP)) { error = -EINVAL; errorf(fc, "Mount credentials are from non init user namespace and ID mapping is enabled. This is not allowed."); goto error; } (and making it so we can call idmap_userns) > > > > > > The current design assumes that the user namespace being used is > > > the one where > > > the mount itself is performed. That means that the uids and gids or > > > usernames > > > and groupnames that go on the wire match the uids and gids of the > > > container in > > > which the mount occurred. > > > > > > > Right now, NFS does not have the ability for the fsmount() call to be > > called in an unprivileged user namespace. We can change that > > behaviour > > elsewhere if we want, but it's orthogonal to this. > > > > > The assumption is that the server has authenticated that client as > > > belonging to a domain that it recognises (either through strong > > > RPCSEC_GSS/krb5 authentication, or through weaker matching of IP > > > addresses to a list of acceptable clients). > > > > > I added a rejection for upcalls because upcalls can happen in the > > init > > namespaces. We can drop that restriction from the nfs4 patch if you'd > > like. I > > *believe* (and I'm not a little out of my depth) that the request-key > > handler gets called with the *network namespace* of the NFS mount, > > but the userns is a privileged one, allowing for potential hazards. > > > > The idmapper already rejects upcalls to the keyring '/sbin/request-key' > utility if you're running with your own user namespace. > > Quite frankly, switching to using the keyring was a mistake which I'd > undo if I could. Aside from not supporting containers, it is horribly > slow due to requiring a full process startup/teardown for every upcall, > so it scales poorly to large numbers of identities (particularly with > an operation like readdir() in which you're doing serial upcalls). > > However nothing stops you from using the old NFSv4 idmapper daemon > (a.k.a. rpc.idmapd) in the context of the container that called > fsopen() so that it can translate identities correctly using whatever > userspace tools (ldap, sssd, winbind...) that the container has > configured. > 1. We see this as a potential security risk [this being upcalls] into the unconfined portion of the system. Although, I'm sure that the userspace handlers are written perfectly well, it allows for information leakage to occur. 2. Is there a way to do this for NFSv3? 3. Can rpc.idmapd get the user namespace that the call is from (and is the keyring per-userns?). In general, I think that this change follows the principal of least surprise. > > The reason I added that block there is that I didn't imagine anyone > > was running > > NFS in an unprivileged user namespace, and relying on upcalls > > (potentially into > > privileged namespaces) in order to do authz. > > > > > > > If you go ahead and change the user namespace on the client without > > > going through the mount process again to mount a different super > > > block > > > with a different user namespace, then you will now get the exact > > > same > > > behaviour as if you do that with any other filesystem. > > > > Not exactly, because other filesystems *only* use the s_user_ns for > > conversion > > of UIDs, whereas NFS uses the currend_cred() acquired at mount time, > > which > > doesn't match s_user_ns, leading to this behaviour. > > > > 1. Mistranslated UIDs in encoding RPCs > > 2. The UID / GID exposed to VFS do not match the user ns. > > > > > > > > -- > > > Trond Myklebust > > > Linux NFS client maintainer, Hammerspace > > > trond.myklebust@hammerspace.com > > > > > > > > -Thanks, > > Sargun > > > > [1]: > > https://elixir.bootlin.com/linux/v5.9.8/source/fs/nfs/nfs4idmap.c#L782 > > [2]: > > https://elixir.bootlin.com/linux/v5.9.8/source/fs/nfs/nfs4client.c#L1154 > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com > >