2020-10-16 13:15:33

by Sargun Dhillon

[permalink] [raw]
Subject: [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API

This patchset adds some functionality to allow NFS to be used from
NFS namespaces (containers).

Changes since v1:
* Added samples

Sargun Dhillon (3):
NFS: Use cred from fscontext during fsmount
samples/vfs: Split out common code for new syscall APIs
samples/vfs: Add example leveraging NFS with new APIs and user
namespaces

fs/nfs/client.c | 2 +-
fs/nfs/flexfilelayout/flexfilelayout.c | 1 +
fs/nfs/nfs4client.c | 2 +-
samples/vfs/.gitignore | 2 +
samples/vfs/Makefile | 5 +-
samples/vfs/test-fsmount.c | 86 +-----------
samples/vfs/test-nfs-userns.c | 181 +++++++++++++++++++++++++
samples/vfs/vfs-helper.c | 43 ++++++
samples/vfs/vfs-helper.h | 55 ++++++++
9 files changed, 289 insertions(+), 88 deletions(-)
create mode 100644 samples/vfs/test-nfs-userns.c
create mode 100644 samples/vfs/vfs-helper.c
create mode 100644 samples/vfs/vfs-helper.h

--
2.25.1


2020-10-16 13:16:09

by Sargun Dhillon

[permalink] [raw]
Subject: [RESEND PATCH v2 3/3] samples/vfs: Add example leveraging NFS with new APIs and user namespaces

This adds an example which assumes you already have an NFS server setup,
but does the work of creating a user namespace, and an NFS mount from
that user namespace which then exposes different UIDs than that of
the init user namespace.

Signed-off-by: Sargun Dhillon <[email protected]>
Cc: J. Bruce Fields <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Anna Schumaker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kyle Anderson <[email protected]>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 1 +
samples/vfs/.gitignore | 2 +
samples/vfs/Makefile | 3 +-
samples/vfs/test-nfs-userns.c | 181 +++++++++++++++++++++++++
4 files changed, 186 insertions(+), 1 deletion(-)
create mode 100644 samples/vfs/test-nfs-userns.c

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index f9348ed1bcda..ee45ff7d75ac 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -361,6 +361,7 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
struct nfs4_layoutget_res *lgr,
gfp_t gfp_flags)
{
+ struct user_namespace *user_ns = lh->plh_lc_cred->user_ns;
struct pnfs_layout_segment *ret;
struct nfs4_ff_layout_segment *fls = NULL;
struct xdr_stream stream;
diff --git a/samples/vfs/.gitignore b/samples/vfs/.gitignore
index 8fdabf7e5373..1d09826b31a6 100644
--- a/samples/vfs/.gitignore
+++ b/samples/vfs/.gitignore
@@ -1,3 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
test-fsmount
test-statx
+test-nfs-userns
+
diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile
index 7f76875eaa70..6a2926080c08 100644
--- a/samples/vfs/Makefile
+++ b/samples/vfs/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
test-fsmount-objs := test-fsmount.o vfs-helper.o
-userprogs := test-fsmount test-statx
+test-nfs-userns-objs := test-nfs-userns.o vfs-helper.o
+userprogs := test-fsmount test-statx test-nfs-userns

always-y := $(userprogs)

diff --git a/samples/vfs/test-nfs-userns.c b/samples/vfs/test-nfs-userns.c
new file mode 100644
index 000000000000..108af924cbdd
--- /dev/null
+++ b/samples/vfs/test-nfs-userns.c
@@ -0,0 +1,181 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <linux/unistd.h>
+#include <assert.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include "vfs-helper.h"
+
+
+#define WELL_KNOWN_FD 100
+
+static inline int pidfd_open(pid_t pid, unsigned int flags)
+{
+ return syscall(__NR_pidfd_open, pid, flags);
+}
+
+static inline int pidfd_getfd(int pidfd, int fd, int flags)
+{
+ return syscall(__NR_pidfd_getfd, pidfd, fd, flags);
+}
+
+static void write_to_path(const char *path, const char *str)
+{
+ int fd, len = strlen(str);
+
+ fd = open(path, O_WRONLY);
+ if (fd < 0) {
+ fprintf(stderr, "Can't open %s: %s\n", path, strerror(errno));
+ exit(1);
+ }
+
+ if (write(fd, str, len) != len) {
+ fprintf(stderr, "Can't write string: %s\n", strerror(errno));
+ exit(1);
+ }
+
+ E(close(fd));
+}
+
+static int do_work(int sk)
+{
+ int fsfd;
+
+ E(unshare(CLONE_NEWNS|CLONE_NEWUSER));
+
+ fsfd = fsopen("nfs4", 0);
+ E(fsfd);
+
+ E(send(sk, &fsfd, sizeof(fsfd), 0));
+ // Wait for the other side to close / finish / wrap up
+ recv(sk, &fsfd, sizeof(fsfd), 0);
+ E(close(sk));
+
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ int pidfd, mntfd, fsfd, fsfdnum, status, sk_pair[2];
+ struct statx statxbuf;
+ char buf[1024];
+ pid_t pid;
+
+ if (mkdir("/mnt/share", 0777) && errno != EEXIST) {
+ perror("mkdir");
+ return 1;
+ }
+
+ E(chmod("/mnt/share", 0777));
+
+ if (mkdir("/mnt/nfs", 0755) && errno != EEXIST) {
+ perror("mkdir");
+ return 1;
+ }
+
+ if (unlink("/mnt/share/newfile") && errno != ENOENT) {
+ perror("unlink");
+ return 1;
+ }
+
+ E(creat("/mnt/share/testfile", 0644));
+ E(chown("/mnt/share/testfile", 1001, 1001));
+
+ /* exportfs is idempotent, but expects nfs-server to be running */
+ if (system("exportfs -o no_root_squash,no_subtree_check,rw 127.0.0.0/8:/mnt/share")) {
+ fprintf(stderr,
+ "Could not export /mnt/share. Is NFS the server running?\n");
+ return 1;
+ }
+
+ E(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair));
+
+ pid = fork();
+ E(pid);
+ if (pid == 0) {
+ E(close(sk_pair[0]));
+ return do_work(sk_pair[1]);
+ }
+
+ E(close(sk_pair[1]));
+
+ pidfd = pidfd_open(pid, 0);
+ E(pidfd);
+
+ E(recv(sk_pair[0], &fsfdnum, sizeof(fsfdnum), 0));
+
+ fsfd = pidfd_getfd(pidfd, fsfdnum, 0);
+ if (fsfd == -1) {
+ perror("pidfd_getfd");
+ return 1;
+ }
+
+
+ snprintf(buf, sizeof(buf) - 1, "/proc/%d/uid_map", pid);
+ write_to_path(buf, "0 1000 2");
+ snprintf(buf, sizeof(buf) - 1, "/proc/%d/setgroups", pid);
+ write_to_path(buf, "deny");
+ snprintf(buf, sizeof(buf) - 1, "/proc/%d/gid_map", pid);
+ write_to_path(buf, "0 1000 2");
+
+ /* Now we can proceed to mount */
+ E_fsconfig(fsfd, FSCONFIG_SET_STRING, "vers", "4.1", 0);
+ E_fsconfig(fsfd, FSCONFIG_SET_STRING, "clientaddr", "127.0.0.1", 0);
+ E_fsconfig(fsfd, FSCONFIG_SET_STRING, "addr", "127.0.0.1", 0);
+ E_fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "127.0.0.1:/mnt/share",
+ 0);
+ E_fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+
+ /* Move into the namespace's of the worker */
+ E(setns(pidfd, CLONE_NEWNS|CLONE_NEWUSER));
+ E(close(pidfd));
+
+ /* Close our socket pair indicating the child should exit */
+ E(close(sk_pair[0]));
+ assert(waitpid(pid, &status, 0) == pid);
+ if (!WIFEXITED(status) || WEXITSTATUS(status)) {
+ fprintf(stderr, "worker exited nonzero\n");
+ return 1;
+ }
+
+ E(setuid(0));
+ E(setgid(0));
+
+ /* Now do all the work of moving doing the mount in the child ns */
+ E(syscall(__NR_mount, NULL, "/", NULL, MS_REC|MS_PRIVATE, NULL));
+
+ mntfd = fsmount(fsfd, 0, MS_NODEV);
+ if (mntfd < 0) {
+ E(close(fsfd));
+ mount_error(fsfd, "fsmount");
+ }
+
+ E(move_mount(mntfd, "", AT_FDCWD, "/mnt/nfs", MOVE_MOUNT_F_EMPTY_PATH));
+ E(close(mntfd));
+
+ /* Create the file through NFS */
+ E(creat("/mnt/nfs/newfile", 0644));
+ /* Check what the file's status is on the disk, accessed directly */
+ E(statx(AT_FDCWD, "/mnt/share/newfile", 0, STATX_UID|STATX_GID,
+ &statxbuf));
+ assert(statxbuf.stx_uid == 0);
+ assert(statxbuf.stx_gid == 0);
+
+ E(statx(AT_FDCWD, "/mnt/nfs/testfile", 0, STATX_UID|STATX_GID,
+ &statxbuf));
+ assert(statxbuf.stx_uid == 1);
+ assert(statxbuf.stx_gid == 1);
+
+
+ return 0;
+}
--
2.25.1

2020-10-18 01:37:51

by Sargun Dhillon

[permalink] [raw]
Subject: Re: [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API

On Fri, Oct 16, 2020 at 05:45:47AM -0700, Sargun Dhillon wrote:
> This patchset adds some functionality to allow NFS to be used from
> NFS namespaces (containers).
>
> Changes since v1:
> * Added samples
>
> Sargun Dhillon (3):
> NFS: Use cred from fscontext during fsmount
> samples/vfs: Split out common code for new syscall APIs
> samples/vfs: Add example leveraging NFS with new APIs and user
> namespaces
>
> fs/nfs/client.c | 2 +-
> fs/nfs/flexfilelayout/flexfilelayout.c | 1 +
> fs/nfs/nfs4client.c | 2 +-
> samples/vfs/.gitignore | 2 +
> samples/vfs/Makefile | 5 +-
> samples/vfs/test-fsmount.c | 86 +-----------
> samples/vfs/test-nfs-userns.c | 181 +++++++++++++++++++++++++
> samples/vfs/vfs-helper.c | 43 ++++++
> samples/vfs/vfs-helper.h | 55 ++++++++
> 9 files changed, 289 insertions(+), 88 deletions(-)
> create mode 100644 samples/vfs/test-nfs-userns.c
> create mode 100644 samples/vfs/vfs-helper.c
> create mode 100644 samples/vfs/vfs-helper.h
>
> --
> 2.25.1
>

Digging deeper into this a little bit, I actually found that there is some
problematic aspects of the current behaviour. Because nfs_get_tree_common calls
sget_fc, and sget_fc sets the super block's s_user_ns (via alloc_super) to the
fs_context's user namespace unless the global flag is set (which NFS does not
set), there are a bunch of permissions checks that are done against the super
block's user_ns.

It looks like this was introduced in:
f2aedb713c28: NFS: Add fs_context support[1]

It turns out that unmapped users in the "parent" user namespace just get an
EOVERFLOW error when trying to perform a read, even if the UID sent to the NFS
server to read a file is a valid uid (the uid in the init user ns), and
inode_permission checks permissions against the mapped UID in the namespace,
while the authentication credentials (UIDs, GIDs) sent to the server are
those from the init user ns.

[This is all under the assumption there's not upcalls doing ID mapping]

Although, I do not think this presents any security risk (because you have to
have CAP_SYS_ADMIN in the init user ns to get this far), it definitely seems
like "incorrect" behaviour.

[1]: https://lore.kernel.org/linux-nfs/[email protected]/