2020-12-14 09:38:56

by Sargun Dhillon

[permalink] [raw]
Subject: [PATCH RESEND v5 0/2] NFS: Fix interaction between fs_context and user namespaces

This is a resend[2] for consideration into the next NFS client merge window.

Right now, it is possible to mount NFS with an non-matching super block
user ns, and NFS sunrpc user ns. This (for the user) results in an awkward
set of interactions if using anything other than auth_null, where the UIDs
being sent to the server are different than the local UIDs being checked.
This can cause "breakage", where if you try to communicate with the NFS
server with any other set of mappings, it breaks.

The reason for this is that you can call fsopen("nfs4") in the unprivileged
namespace, and that configures fs_context with all the right information
for that user namespace. In addition, it also keeps a gets a cred object
associated with the caller -- which should match the user namespace.
Unfortunately, the mount has to be finished in the init_user_ns because we
currently require CAP_SYS_ADMIN in the init user namespace to call fsmount.
This means that the superblock's user namespace is set "correctly" to the
container, but there's absolutely no way nfs4idmap to consume an
unprivileged user namespace because the cred / user_ns that's passed down
to nfs4idmap is the one at fsmount.

How this actually exhibits is let's say that the UID 0 in the user
namespace is mapped to UID 1000 in the init user ns (and kuid space). What
will happen is that nfs4idmap will translate the UID 1000 into UID 0 on the
wire, even if the mount is in entirely in the mount / user namespace of the
container.

So, it looks something like this
Client in unprivileged User NS (UID: 0, KUID: 0)
->Perform open()
...VFS / NFS bits...
nfs_map_uid_to_name ->
from_kuid_munged(init_user_ns, uid) (returns 0)
RPC with UID 0

This behaviour happens "the other way" as well, where the UID in the
container may be 0, but the corresponding kuid is 1000. When a response
from an NFS server comes in we decode it according to the idmap userns.
The way this exhibits is even more odd.

Server responds with file attribute (UID: 0, GID: 0)
->nfs_map_name_to_uid(..., 0)
->make_kuid(init_user_ns, id) (returns 0)
....VFS / NFS Bits...
->from_kuid(container_ns, 0) -> invalid uid
-> EOVERFLOW

This changes the nfs server to use the cred / userns from fs_context, which
is how idmap is constructed. This subsequently is used in the above
described flow of converting uids back-and-forth.

Trond gave the feedback that this behaviour [implemented by this patch] is
how the legacy sys_mount() behaviour worked[1], and that the intended
behaviour is for UIDs to be plumbed through entirely, where the user
namespaces UIDs are what is sent over the wire, and not the init user ns.

[1]: https://lore.kernel.org/linux-nfs/[email protected]/
[2]: https://lore.kernel.org/linux-nfs/[email protected]/

Sargun Dhillon (2):
NFS: NFSv2/NFSv3: Use cred from fs_context during mount
NFSv4: Refactor to use user namespaces for nfs4idmap

fs/nfs/client.c | 4 ++--
fs/nfs/nfs4client.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

--
2.25.1


2020-12-14 09:39:26

by Sargun Dhillon

[permalink] [raw]
Subject: [PATCH RESEND v5 1/2] NFS: NFSv2/NFSv3: Use cred from fs_context during mount

There was refactoring done to use the fs_context for mounting done in:
62a55d088cd87: NFS: Additional refactoring for fs_context conversion

This made it so that the net_ns is fetched from the fs_context (the netns
that fsopen is called in). This change also makes it so that the credential
fetched during fsopen is used as well as the net_ns.

NFS has already had a number of changes to prepare it for user namespaces:
1a58e8a0e5c1: NFS: Store the credential of the mount process in the nfs_server
264d948ce7d0: NFS: Convert NFSv3 to use the container user namespace
c207db2f5da5: NFS: Convert NFSv2 to use the container user namespace

Previously, different credentials could be used for creation of the
fs_context versus creation of the nfs_server, as FSCONFIG_CMD_CREATE did
the actual credential check, and that's where current_creds() were fetched.
This meant that the user namespace which fsopen was called in could be a
non-init user namespace. This still requires that the user that calls
FSCONFIG_CMD_CREATE has CAP_SYS_ADMIN in the init user ns.

This roughly allows a privileged user to mount on behalf of an unprivileged
usernamespace, by forking off and calling fsopen in the unprivileged user
namespace. It can then pass back that fsfd to the privileged process which
can configure the NFS mount, and then it can call FSCONFIG_CMD_CREATE
before switching back into the mount namespace of the container, and finish
up the mounting process and call fsmount and move_mount.

Signed-off-by: Sargun Dhillon <[email protected]>
Tested-by: Alban Crequy <[email protected]>
---
fs/nfs/client.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 4b8cc93913f7..1e6f3b3ed445 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -571,7 +571,7 @@ static int nfs_start_lockd(struct nfs_server *server)
1 : 0,
.net = clp->cl_net,
.nlmclnt_ops = clp->cl_nfs_mod->rpc_ops->nlmclnt_ops,
- .cred = current_cred(),
+ .cred = server->cred,
};

if (nlm_init.nfs_version > 3)
@@ -985,7 +985,7 @@ struct nfs_server *nfs_create_server(struct fs_context *fc)
if (!server)
return ERR_PTR(-ENOMEM);

- server->cred = get_cred(current_cred());
+ server->cred = get_cred(fc->cred);

error = -ENOMEM;
fattr = nfs_alloc_fattr();
--
2.25.1

2020-12-14 09:41:31

by Sargun Dhillon

[permalink] [raw]
Subject: [PATCH RESEND v5 2/2] NFSv4: Refactor to use user namespaces for nfs4idmap

In several patches work has been done to enable NFSv4 to use user
namespaces:
58002399da65: NFSv4: Convert the NFS client idmapper to use the container user namespace
3b7eb5e35d0f: NFS: When mounting, don't share filesystems between different user namespaces

Unfortunately, the userspace APIs were only such that the userspace facing
side of the filesystem (superblock s_user_ns) could be set to a non init
user namespace. This furthers the fs_context related refactoring, and
piggybacks on top of that logic, so the superblock user namespace, and the
NFS user namespace are the same.

Users can still use rpc.idmapd if they choose to, but there are complexities
with user namespaces and request-key that have yet to be addresssed.

Eventually, we will need to at least:
* Separate out the keyring cache by namespace
* Come up with an upcall mechanism that can be triggered inside of the container,
or safely triggered outside, with the requisite context to do the right
mapping. * Handle whatever refactoring needs to be done in net/sunrpc.

Signed-off-by: Sargun Dhillon <[email protected]>
Tested-by: Alban Crequy <[email protected]>
---
fs/nfs/nfs4client.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index be7915c861ce..86acffe7335c 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -1153,7 +1153,7 @@ struct nfs_server *nfs4_create_server(struct fs_context *fc)
if (!server)
return ERR_PTR(-ENOMEM);

- server->cred = get_cred(current_cred());
+ server->cred = get_cred(fc->cred);

auth_probe = ctx->auth_info.flavor_len < 1;

--
2.25.1