2017-09-13 10:27:00

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

v3:
* Documented vsock syntax in exports.man, nfs.man, and nfsd.man
* Added clientaddr autodetection in mount.nfs(8)
* Replaced #ifdefs with a single vsock.h header file
* Tested nfsd serving both IPv4 and vsock at the same time

Status:

* The last revision was somewhat controversial because it is already possible
to share files between a hypervisor and a virtual machine over TCP/IP, so why
add AF_VSOCK support to the stack? TCP/IP-based solutions require the
virtual machine administrator to be involved in the configuration and are
therefore not suitable for automatic management by OpenStack, oVirt, etc.
Maintainers, is this feature acceptable?

* Need advice on the netid: is there agreement to use "tcpv" instead of "vsock",
as Chuck Lever suggested, and how should we ask the IESG to assign it?

The AF_VSOCK address family allows virtual machines to communicate with the
hypervisor using a zero-configuration transport. The KVM, VMware, and Hyper-V
hypervisors support AF_VSOCK, which was first introduced in Linux 3.9.

This patch series adds AF_VSOCK support to mount.nfs(8) and rpc.nfsd(8). To
mount an export from the hypervisor (CID 2):

# mount.nfs 2:/srv/vm01 /mnt -o proto=vsock

To serve exports over vsock port 2049:

# nfsd ... --vsock 2049

This series extends exports(5) syntax to handle vsock:<CID> or vsock:*. For
example, the guest with CID 3 can be given access using vsock:3.

nfsd can export over IPv4/IPv6 and vsock at the same time. See the changes to
exports.man, nfs.man, and nfsd.man in the patches for syntax details.

NFSv4 and later are supported.

The code is also available here:
https://github.com/stefanha/nfs-utils/tree/vsock-nfsd

The latest kernel patches are available here:
https://github.com/stefanha/linux/tree/vsock-nfsd

Stefan Hajnoczi (14):
mount: don't use IPPROTO_UDP for address resolution
nfs-utils: add vsock.h
nfs-utils: add AF_VSOCK support to sockaddr.h
mount: present AF_VSOCK addresses
mount: accept AF_VSOCK in nfs_verify_family()
mount: generate AF_VSOCK clientaddr
getport: recognize "vsock" netid
mount: AF_VSOCK address parsing
exportfs: introduce host_freeaddrinfo()
exportfs: add AF_VSOCK address parsing and printing
exportfs: add AF_VSOCK support to set_addrlist()
exportfs: add support for "vsock:" exports(5) syntax
nfsd: add --vsock (-v) option to nfsd
tests: add "vsock:" exports(5) test case

tests/Makefile.am | 3 +-
support/include/exportfs.h | 4 ++
support/include/sockaddr.h | 18 +++++
support/include/vsock.h | 59 +++++++++++++++++
utils/nfsd/nfssvc.h | 1 +
support/export/client.c | 8 +--
support/export/hostname.c | 161 +++++++++++++++++++++++++++++++++++++++++++--
support/nfs/getport.c | 16 +++--
utils/exportfs/exportfs.c | 42 ++++++++++--
utils/mount/network.c | 37 ++++++++++-
utils/mount/stropts.c | 61 ++++++++++++++---
utils/mountd/auth.c | 2 +-
utils/mountd/cache.c | 10 +--
utils/mountd/mountd.c | 4 +-
utils/mountd/rmtab.c | 2 +-
utils/nfsd/nfsd.c | 18 ++++-
utils/nfsd/nfssvc.c | 62 +++++++++++++++++
configure.ac | 3 +
tests/t0002-vsock-basic.sh | 53 +++++++++++++++
utils/exportfs/exports.man | 12 +++-
utils/mount/nfs.man | 20 ++++--
utils/nfsd/nfsd.man | 4 ++
22 files changed, 552 insertions(+), 48 deletions(-)
create mode 100644 support/include/vsock.h
create mode 100755 tests/t0002-vsock-basic.sh

--
2.13.5



2017-09-13 10:27:08

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 02/14] nfs-utils: add vsock.h

AF_VSOCK has been in Linux since 2013, but nfs-utils should compile
cleanly on systems that lack this feature or that have incomplete header
files (see the vsock.h file in this patch for details).

This patch allows code to #include "vsock.h" to use AF_VSOCK and struct
sockaddr_vm without #ifdefs.
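
For illustration only (not part of this patch, and the helper name below is
made up), a caller then needs nothing more than the header to open a vsock
connection to the hypervisor:

    #include <unistd.h>
    #include "vsock.h"

    /* Sketch: connect to the hypervisor (CID 2) on the NFS port. */
    static int connect_to_host_nfs(void)
    {
            struct sockaddr_vm svm = {
                    .svm_family = AF_VSOCK,
                    .svm_cid    = 2,
                    .svm_port   = 2049,
            };
            int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

            if (fd < 0)
                    return -1; /* EAFNOSUPPORT if the kernel lacks AF_VSOCK */
            if (connect(fd, (struct sockaddr *)&svm, sizeof(svm)) < 0) {
                    close(fd);
                    return -1;
            }
            return fd;
    }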

Cc: Jorgen Hansen <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
--
Jorgen: The header file in linux.git is GPLv2-only licensed. nfs-utils
is GPLv2-or-later. The only authors of the Linux header file are
@vmware.com. Could you please post a GPLv2-or-later version of this
file? Thanks!
---
support/include/vsock.h | 59 +++++++++++++++++++++++++++++++++++++++++++++++++
configure.ac | 3 +++
2 files changed, 62 insertions(+)
create mode 100644 support/include/vsock.h

diff --git a/support/include/vsock.h b/support/include/vsock.h
new file mode 100644
index 0000000..8d1bb79
--- /dev/null
+++ b/support/include/vsock.h
@@ -0,0 +1,59 @@
+/*
+ * AF_VSOCK constants and struct definitions
+ *
+ * Copyright (C) 2007-2013 VMware, Inc. All rights reserved.
+ * Copyright (C) 2017 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef _VSOCK_H
+#define _VSOCK_H
+
+/*
+ * This header includes the vsock system headers. Distros have been known to
+ * ship with:
+ * 1. vsock-capable kernels but no AF_VSOCK constant
+ * 2. AF_VSOCK but no <linux/vm_sockets.h>
+ *
+ * Define constants and structs ourselves, if necessary. This avoids #ifdefs
+ * in many places throughout the code. If the kernel really does not support
+ * AF_VSOCK then socket(2) returns an EAFNOSUPPORT errno.
+ */
+
+#include <sys/socket.h>
+
+#ifndef AF_VSOCK
+#define AF_VSOCK 40
+#endif
+
+#ifdef HAVE_LINUX_VM_SOCKETS_H
+#include <linux/vm_sockets.h>
+#else /* !HAVE_LINUX_VM_SOCKETS_H */
+
+#define VMADDR_CID_ANY (-1U)
+
+struct sockaddr_vm
+{
+ sa_family_t svm_family;
+ unsigned short svm_reserved1;
+ unsigned int svm_port;
+ unsigned int svm_cid;
+ unsigned char svm_zero[sizeof(struct sockaddr) -
+ sizeof(sa_family_t) -
+ sizeof(unsigned short) -
+ sizeof(unsigned int) - sizeof(unsigned int)];
+};
+
+#define IOCTL_VM_SOCKETS_GET_LOCAL_CID _IO(7, 0xb9)
+
+#endif /* !HAVE_LINUX_VM_SOCKETS_H */
+
+#endif /* !_VSOCK_H */
diff --git a/configure.ac b/configure.ac
index 1ca1603..7d82d37 100644
--- a/configure.ac
+++ b/configure.ac
@@ -410,6 +410,9 @@ fi
dnl Check for IPv6 support
AC_IPV6

+dnl Check for AF_VSOCK support
+AC_CHECK_HEADERS([linux/vm_sockets.h], , , [#include <sys/socket.h>])
+
dnl *************************************************************
dnl Check for headers
dnl *************************************************************
--
2.13.5


2017-09-13 10:27:04

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 01/14] mount: don't use IPPROTO_UDP for address resolution

Although getaddrinfo(3) with IPPROTO_UDP works fine for AF_INET and
AF_INET6, the AF_VSOCK address family does not support IPPROTO_UDP and
produces an error.

Drop IPPROTO_UDP and use the default of 0 (any protocol), which works for
all address families. Modern NFS uses TCP anyway, so it is strange to
specify UDP.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
---
utils/mount/stropts.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
index 1d30d34..033f254 100644
--- a/utils/mount/stropts.c
+++ b/utils/mount/stropts.c
@@ -919,9 +919,7 @@ static int nfs_try_mount(struct nfsmount_info *mi)
int result = 0;

if (mi->address == NULL) {
- struct addrinfo hint = {
- .ai_protocol = (int)IPPROTO_UDP,
- };
+ struct addrinfo hint = {};
int error;
struct addrinfo *address;

--
2.13.5


2017-09-13 10:27:14

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 04/14] mount: present AF_VSOCK addresses

Format vsock hosts as "vsock:<cid>" so the addresses can be easily
distinguished from IPv4 and IPv6 addresses.

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
utils/mount/network.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/utils/mount/network.c b/utils/mount/network.c
index 8ab5be8..0fa9029 100644
--- a/utils/mount/network.c
+++ b/utils/mount/network.c
@@ -330,6 +330,12 @@ int nfs_string_to_sockaddr(const char *address, struct sockaddr *sap,
int nfs_present_sockaddr(const struct sockaddr *sap, const socklen_t salen,
char *buf, const size_t buflen)
{
+ if (sap->sa_family == AF_VSOCK) {
+ snprintf(buf, buflen, "vsock:%u",
+ ((struct sockaddr_vm *)sap)->svm_cid);
+ return 1;
+ }
+
#ifdef HAVE_GETNAMEINFO
int result;

--
2.13.5


2017-09-13 10:27:16

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 05/14] mount: accept AF_VSOCK in nfs_verify_family()

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
utils/mount/network.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/utils/mount/network.c b/utils/mount/network.c
index 0fa9029..7b0bc97 100644
--- a/utils/mount/network.c
+++ b/utils/mount/network.c
@@ -1430,7 +1430,7 @@ sa_family_t config_default_family = AF_INET;
static int
nfs_verify_family(sa_family_t family)
{
- if (family != AF_INET)
+ if (family != AF_INET && family != AF_VSOCK)
return 0;

return 1;
--
2.13.5


2017-09-13 10:27:13

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 03/14] nfs-utils: add AF_VSOCK support to sockaddr.h

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
support/include/sockaddr.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/support/include/sockaddr.h b/support/include/sockaddr.h
index 446b537..dfcc492 100644
--- a/support/include/sockaddr.h
+++ b/support/include/sockaddr.h
@@ -32,6 +32,8 @@
#include <sys/socket.h>
#include <netinet/in.h>

+#include "vsock.h"
+
/*
* This type is for defining buffers that contain network socket
* addresses.
@@ -51,6 +53,7 @@ union nfs_sockaddr {
struct sockaddr sa;
struct sockaddr_in s4;
struct sockaddr_in6 s6;
+ struct sockaddr_vm svm;
};

#if SIZEOF_SOCKLEN_T - 0 == 0
@@ -66,6 +69,8 @@ union nfs_sockaddr {
#define SIZEOF_SOCKADDR_IN6 SIZEOF_SOCKADDR_UNKNOWN
#endif /* !IPV6_SUPPORTED */

+#define SIZEOF_SOCKADDR_VM (socklen_t)sizeof(struct sockaddr_vm)
+
/**
* nfs_sockaddr_length - return the size in bytes of a socket address
* @sap: pointer to socket address
@@ -81,6 +86,8 @@ nfs_sockaddr_length(const struct sockaddr *sap)
return SIZEOF_SOCKADDR_IN;
case AF_INET6:
return SIZEOF_SOCKADDR_IN6;
+ case AF_VSOCK:
+ return SIZEOF_SOCKADDR_VM;
}
return SIZEOF_SOCKADDR_UNKNOWN;
}
@@ -218,6 +225,15 @@ compare_sockaddr6(__attribute__ ((unused)) const struct sockaddr *sa1,
}
#endif /* !IPV6_SUPPORTED */

+static inline _Bool
+compare_sockaddr_vsock(const struct sockaddr *sa1, const struct sockaddr *sa2)
+{
+ const struct sockaddr_vm *svm1 = (const struct sockaddr_vm *)sa1;
+ const struct sockaddr_vm *svm2 = (const struct sockaddr_vm *)sa2;
+
+ return svm1->svm_cid == svm2->svm_cid;
+}
+
/**
* nfs_compare_sockaddr - compare two socket addresses for equality
* @sa1: pointer to a socket address
@@ -238,6 +254,8 @@ nfs_compare_sockaddr(const struct sockaddr *sa1, const struct sockaddr *sa2)
return compare_sockaddr4(sa1, sa2);
case AF_INET6:
return compare_sockaddr6(sa1, sa2);
+ case AF_VSOCK:
+ return compare_sockaddr_vsock(sa1, sa2);
}

return false;
--
2.13.5


2017-09-13 10:27:20

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 06/14] mount: generate AF_VSOCK clientaddr

The mount(8) command should automatically determine the NFS backchannel
address details so the user does not have to specify them on the
command line. Use the AF_VSOCK ioctl to determine the local CID.
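
The detection reduces to the vsock ioctl shown below (a condensed sketch of
the nfs_ca_sockname() change in this patch; the helper name is made up):

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include "vsock.h"

    /* Sketch: ask the vsock driver for this guest's own CID (0 on failure). */
    static unsigned int local_vsock_cid(void)
    {
            unsigned int cid = 0;
            int fd = open("/dev/vsock", O_RDONLY);

            if (fd < 0)
                    return 0;
            if (ioctl(fd, IOCTL_VM_SOCKETS_GET_LOCAL_CID, &cid) < 0)
                    cid = 0;
            close(fd);
            return cid;
    }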

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
utils/mount/network.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/utils/mount/network.c b/utils/mount/network.c
index 7b0bc97..1f9ad02 100644
--- a/utils/mount/network.c
+++ b/utils/mount/network.c
@@ -35,6 +35,7 @@
#include <time.h>
#include <grp.h>

+#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
@@ -1129,6 +1130,34 @@ static int nfs_ca_sockname(const struct sockaddr *sap, const socklen_t salen,
int sock, result = 0;
int val;

+ if (sap->sa_family == AF_VSOCK) {
+ struct sockaddr_vm *svm = (struct sockaddr_vm *)buf;
+ unsigned int cid;
+ int fd;
+
+ if (*buflen < sizeof(struct sockaddr_vm)) {
+ errno = EINVAL;
+ return 0;
+ }
+
+ fd = open("/dev/vsock", O_RDONLY);
+ if (fd < 0)
+ return 0;
+
+ if (ioctl(fd, IOCTL_VM_SOCKETS_GET_LOCAL_CID, &cid) < 0) {
+ close(fd);
+ return 0;
+ }
+
+ memset(svm, 0, sizeof(*svm));
+ svm->svm_family = AF_VSOCK;
+ svm->svm_cid = cid;
+
+ *buflen = sizeof(*svm);
+ close(fd);
+ return 1;
+ }
+
sock = socket(sap->sa_family, SOCK_DGRAM, IPPROTO_UDP);
if (sock < 0)
return 0;
--
2.13.5


2017-09-13 10:27:30

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 10/14] exportfs: add AF_VSOCK address parsing and printing

Add code to parse and print AF_VSOCK addresses since the getaddrinfo(3)
family of functions does not handle this address family.
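
For example, after this patch the exportfs helpers can round-trip a vsock
address (sketch only; the demo function name is made up):

    #include <netdb.h>
    #include "exportfs.h"   /* host_pton(), host_ntop(), host_freeaddrinfo() */

    static void demo_vsock_roundtrip(void)
    {
            char buf[64];
            struct addrinfo *ai = host_pton("vsock:3"); /* parse "vsock:<CID>" */

            if (ai == NULL)
                    return;
            host_ntop(ai->ai_addr, buf, sizeof(buf));   /* formats back as "vsock:3" */
            host_freeaddrinfo(ai);   /* wrapper from the host_freeaddrinfo() patch */
    }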

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
support/export/hostname.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 133 insertions(+)

diff --git a/support/export/hostname.c b/support/export/hostname.c
index 7f8a6f8..36e75ff 100644
--- a/support/export/hostname.c
+++ b/support/export/hostname.c
@@ -30,6 +30,105 @@
#include "sockaddr.h"
#include "exportfs.h"

+static char *
+host_ntop_vsock(const struct sockaddr *sap, char *buf, const size_t buflen)
+{
+ struct sockaddr_vm *svm = (struct sockaddr_vm *)sap;
+ snprintf(buf, buflen, "vsock:%u", svm->svm_cid);
+ return buf;
+}
+
+/* Allocate an addrinfo for AF_VSOCK. Free with host_freeaddrinfo(). */
+static struct addrinfo *
+vsock_alloc_addrinfo(struct sockaddr_vm **svm)
+{
+ struct {
+ struct addrinfo ai;
+ struct sockaddr_vm svm;
+ } *vai;
+
+ vai = calloc(1, sizeof(*vai));
+ if (!vai)
+ return NULL;
+
+ vai->ai.ai_family = AF_VSOCK;
+ vai->ai.ai_socktype = SOCK_STREAM;
+ vai->ai.ai_addrlen = sizeof(vai->svm);
+ vai->ai.ai_addr = (struct sockaddr *)&vai->svm;
+ vai->svm.svm_family = AF_VSOCK;
+
+ if (svm)
+ *svm = &vai->svm;
+
+ return &vai->ai;
+}
+
+/* hostname -> addrinfo */
+static struct addrinfo *
+vsock_hostname_addrinfo(const char *hostname)
+{
+ const char *cid_str;
+ char *end_ptr;
+ struct addrinfo *ai;
+ struct sockaddr_vm *svm;
+ long cid;
+
+ cid_str = hostname + strlen("vsock:");
+ cid = strtol(cid_str, &end_ptr, 10);
+ if (end_ptr == cid_str || *end_ptr != '\0')
+ return NULL;
+ if (cid < 0 || cid > UINT32_MAX)
+ return NULL;
+
+ ai = vsock_alloc_addrinfo(&svm);
+ if (!ai)
+ return NULL;
+
+ ai->ai_canonname = strdup(hostname);
+ if (!ai->ai_canonname) {
+ host_freeaddrinfo(ai);
+ return NULL;
+ }
+
+ svm->svm_cid = cid;
+ return ai;
+}
+
+/* sockaddr -> hostname */
+static char *
+vsock_canonname(const struct sockaddr *sap)
+{
+ const struct sockaddr_vm *svm = (const struct sockaddr_vm *)sap;
+ char *canonname;
+
+ if (asprintf(&canonname, "vsock:%u", svm->svm_cid) < 0)
+ return NULL;
+ return canonname;
+}
+
+/* sockaddr -> addrinfo */
+static struct addrinfo *
+vsock_sockaddr_addrinfo(const struct sockaddr *sap)
+{
+ const struct sockaddr_vm *svm = (const struct sockaddr_vm *)sap;
+ struct sockaddr_vm *ai_svm;
+ struct addrinfo *ai;
+
+ ai = vsock_alloc_addrinfo(&ai_svm);
+ if (!ai)
+ return NULL;
+
+ *ai_svm = *svm;
+
+ ai->ai_canonname = vsock_canonname(sap);
+ if (!ai->ai_canonname) {
+ host_freeaddrinfo(ai);
+ return NULL;
+ }
+
+ return ai;
+}
+
/**
* host_ntop - generate presentation address given a sockaddr
* @sap: pointer to socket address
@@ -52,6 +151,9 @@ host_ntop(const struct sockaddr *sap, char *buf, const size_t buflen)
return buf;
}

+ if (sap->sa_family == AF_VSOCK)
+ return host_ntop_vsock(sap, buf, buflen);
+
error = getnameinfo(sap, salen, buf, (socklen_t)buflen,
NULL, 0, NI_NUMERICHOST);
if (error != 0) {
@@ -69,6 +171,9 @@ host_ntop(const struct sockaddr *sap, char *buf, const size_t buflen)

memset(buf, 0, buflen);

+ if (sap->sa_family == AF_VSOCK)
+ return host_ntop_vsock(sap, buf, buflen);
+
if (sin->sin_family != AF_INET) {
(void)strncpy(buf, "bad family", buflen - 1);
return buf;
@@ -120,6 +225,10 @@ host_pton(const char *paddr)
__func__);
return NULL;
}
+
+ if (strncmp(paddr, "vsock:", strlen("vsock:")) == 0)
+ return vsock_hostname_addrinfo(paddr);
+
inet4 = 1;
if (inet_pton(AF_INET, paddr, &sin.sin_addr) == 0)
inet4 = 0;
@@ -174,6 +283,9 @@ host_addrinfo(const char *hostname)
};
int error;

+ if (strncmp(hostname, "vsock:", strlen("vsock:")) == 0)
+ return vsock_hostname_addrinfo(hostname);
+
error = getaddrinfo(hostname, NULL, &hint, &ai);
switch (error) {
case 0:
@@ -202,6 +314,12 @@ host_addrinfo(const char *hostname)
void
host_freeaddrinfo(struct addrinfo *ai)
{
+ if (ai && ai->ai_family == AF_VSOCK) {
+ free(ai->ai_canonname);
+ free(ai);
+ return;
+ }
+
freeaddrinfo(ai);
}

@@ -225,6 +343,9 @@ host_canonname(const struct sockaddr *sap)
char buf[NI_MAXHOST];
int error;

+ if (sap->sa_family == AF_VSOCK)
+ return vsock_canonname(sap);
+
if (salen == 0) {
xlog(D_GENERAL, "%s: unsupported address family %d",
__func__, sap->sa_family);
@@ -260,6 +381,9 @@ host_canonname(const struct sockaddr *sap)
const struct in_addr *addr = &sin->sin_addr;
struct hostent *hp;

+ if (sap->sa_family == AF_VSOCK)
+ return vsock_canonname(sap);
+
if (sap->sa_family != AF_INET)
return NULL;

@@ -291,6 +415,9 @@ host_reliable_addrinfo(const struct sockaddr *sap)
struct addrinfo *ai, *a;
char *hostname;

+ if (sap->sa_family == AF_VSOCK)
+ return vsock_sockaddr_addrinfo(sap);
+
hostname = host_canonname(sap);
if (hostname == NULL)
return NULL;
@@ -340,6 +467,9 @@ host_numeric_addrinfo(const struct sockaddr *sap)
struct addrinfo *ai;
int error;

+ if (sap->sa_family == AF_VSOCK)
+ return vsock_sockaddr_addrinfo(sap);
+
if (salen == 0) {
xlog(D_GENERAL, "%s: unsupported address family %d",
__func__, sap->sa_family);
@@ -388,6 +518,9 @@ host_numeric_addrinfo(const struct sockaddr *sap)
char buf[INET_ADDRSTRLEN];
struct addrinfo *ai;

+ if (sap->sa_family == AF_VSOCK)
+ return vsock_sockaddr_addrinfo(sap);
+
if (sap->sa_family != AF_INET)
return NULL;

--
2.13.5


2017-09-13 10:27:26

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 09/14] exportfs: introduce host_freeaddrinfo()

The AF_VSOCK address family is not supported by the getaddrinfo(3)
family of functions, so we have to build our own struct addrinfo
for vsock addresses.

Different libc implementations can allocate struct addrinfo and struct
sockaddr in different ways. Since the memory layout of getaddrinfo(3)
results is private to libc, we cannot call freeaddrinfo(3) on a struct
addrinfo that we allocated ourselves.

Introduce a freeaddrinfo(3) wrapper function that a later patch will use
to safely free an AF_VSOCK struct addrinfo. Only the wrapper is
introduced here so that this patch is easy to review, without any
AF_VSOCK-specific changes.
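
For reference, once the AF_VSOCK parsing lands the wrapper grows a branch
along these lines (this is the version from the AF_VSOCK parsing patch in
this series, shown here only to motivate the indirection):

    void host_freeaddrinfo(struct addrinfo *ai)
    {
            /* addrinfos built by hand for AF_VSOCK are a single calloc()
             * allocation plus a strdup()'d canonname; everything else came
             * from getaddrinfo(3) and must go back through freeaddrinfo(3). */
            if (ai && ai->ai_family == AF_VSOCK) {
                    free(ai->ai_canonname);
                    free(ai);
                    return;
            }
            freeaddrinfo(ai);
    }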

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
support/include/exportfs.h | 1 +
support/export/client.c | 8 ++++----
support/export/hostname.c | 28 +++++++++++++++++++++-------
utils/exportfs/exportfs.c | 10 +++++-----
utils/mountd/auth.c | 2 +-
utils/mountd/cache.c | 10 +++++-----
utils/mountd/mountd.c | 4 ++--
utils/mountd/rmtab.c | 2 +-
8 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/support/include/exportfs.h b/support/include/exportfs.h
index 8af47a8..98d45c5 100644
--- a/support/include/exportfs.h
+++ b/support/include/exportfs.h
@@ -161,6 +161,7 @@ __attribute__((__malloc__))
struct addrinfo * host_reliable_addrinfo(const struct sockaddr *sap);
__attribute__((__malloc__))
struct addrinfo * host_numeric_addrinfo(const struct sockaddr *sap);
+void host_freeaddrinfo(struct addrinfo * ai);

struct nfskey * key_lookup(char *hname);

diff --git a/support/export/client.c b/support/export/client.c
index 2346f99..881c776 100644
--- a/support/export/client.c
+++ b/support/export/client.c
@@ -210,7 +210,7 @@ init_subnetwork(nfs_client *clp)
set_addrlist(clp, 0, ai->ai_addr);
family = ai->ai_addr->sa_family;

- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);

switch (family) {
case AF_INET:
@@ -309,7 +309,7 @@ client_lookup(char *hname, int canonical)
init_addrlist(clp, ai);

out:
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
return clp;
}

@@ -378,7 +378,7 @@ client_freeall(void)
* @sap: pointer to socket address to resolve
*
* Returns an addrinfo structure, or NULL if some problem occurred.
- * Caller must free the result with freeaddrinfo(3).
+ * Caller must free the result with host_freeaddrinfo().
*/
struct addrinfo *
client_resolve(const struct sockaddr *sap)
@@ -673,7 +673,7 @@ check_netgroup(const nfs_client *clp, const struct addrinfo *ai)
tmp = host_pton(hname);
if (tmp != NULL) {
char *cname = host_canonname(tmp->ai_addr);
- freeaddrinfo(tmp);
+ host_freeaddrinfo(tmp);

/* The resulting FQDN may be in our netgroup. */
if (cname != NULL) {
diff --git a/support/export/hostname.c b/support/export/hostname.c
index 5c4c824..7f8a6f8 100644
--- a/support/export/hostname.c
+++ b/support/export/hostname.c
@@ -89,7 +89,7 @@ host_ntop(const struct sockaddr *sap, char *buf, const size_t buflen)
* IP presentation address
*
* Returns address info structure, or NULL if an error occurs. Caller
- * must free the returned structure with freeaddrinfo(3).
+ * must free the returned structure with host_freeaddrinfo().
*/
__attribute__((__malloc__))
struct addrinfo *
@@ -155,7 +155,7 @@ host_pton(const char *paddr)
*
* Returns address info structure with ai_canonname filled in, or NULL
* if no information is available for @hostname. Caller must free the
- * returned structure with freeaddrinfo(3).
+ * returned structure with host_freeaddrinfo().
*/
__attribute__((__malloc__))
struct addrinfo *
@@ -192,6 +192,20 @@ host_addrinfo(const char *hostname)
}

/**
+ * host_freeaddrinfo - free addrinfo obtained from host_*() functions
+ * @ai: pointer to addrinfo to free
+ *
+ * The addrinfos returned by host_*() functions may not have been allocated by
+ * a call to getaddrinfo(3). It is not safe to free them directly with
+ * freeaddrinfo(3). Use this function instead.
+ */
+void
+host_freeaddrinfo(struct addrinfo *ai)
+{
+ freeaddrinfo(ai);
+}
+
+/**
* host_canonname - return canonical hostname bound to an address
* @sap: pointer to socket address to look up
*
@@ -268,7 +282,7 @@ host_canonname(const struct sockaddr *sap)
* ai_canonname filled in. If there is a problem with resolution or
* the resolved records don't match up properly then it returns NULL
*
- * Caller must free the returned structure with freeaddrinfo(3).
+ * Caller must free the returned structure with host_freeaddrinfo().
*/
__attribute__((__malloc__))
struct addrinfo *
@@ -290,7 +304,7 @@ host_reliable_addrinfo(const struct sockaddr *sap)
if (nfs_compare_sockaddr(a->ai_addr, sap))
break;

- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
if (!a)
goto out_free_hostname;

@@ -314,7 +328,7 @@ out_free_hostname:
* @sap: pointer to socket address
*
* Returns address info structure, or NULL if an error occurred.
- * Caller must free the returned structure with freeaddrinfo(3).
+ * Caller must free the returned structure with host_freeaddrinfo().
*/
#ifdef HAVE_GETNAMEINFO
__attribute__((__malloc__))
@@ -357,7 +371,7 @@ host_numeric_addrinfo(const struct sockaddr *sap)
free(ai->ai_canonname); /* just in case */
ai->ai_canonname = strdup(buf);
if (ai->ai_canonname == NULL) {
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
ai = NULL;
}
}
@@ -390,7 +404,7 @@ host_numeric_addrinfo(const struct sockaddr *sap)
if (ai != NULL) {
ai->ai_canonname = strdup(buf);
if (ai->ai_canonname == NULL) {
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
ai = NULL;
}
}
diff --git a/utils/exportfs/exportfs.c b/utils/exportfs/exportfs.c
index beed1b3..3ded733 100644
--- a/utils/exportfs/exportfs.c
+++ b/utils/exportfs/exportfs.c
@@ -282,7 +282,7 @@ exportfs_parsed(char *hname, char *path, char *options, int verbose)
validate_export(exp);

out:
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
}

static int exportfs_generic(char *arg, char *options, int verbose)
@@ -395,7 +395,7 @@ unexportfs_parsed(char *hname, char *path, int verbose)
if (!success)
xlog(L_ERROR, "Could not find '%s:%s' to unexport.", hname, path);

- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
}

static int unexportfs_generic(char *arg, int verbose)
@@ -588,7 +588,7 @@ address_list(const char *hostname)
if (ai != NULL) {
/* @hostname was a presentation address */
cname = host_canonname(ai->ai_addr);
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
if (cname != NULL)
goto out;
}
@@ -639,8 +639,8 @@ matchhostname(const char *hostname1, const char *hostname2)
}

out:
- freeaddrinfo(results1);
- freeaddrinfo(results2);
+ host_freeaddrinfo(results1);
+ host_freeaddrinfo(results2);
return result;
}

diff --git a/utils/mountd/auth.c b/utils/mountd/auth.c
index 8299256..dee0f3d 100644
--- a/utils/mountd/auth.c
+++ b/utils/mountd/auth.c
@@ -297,7 +297,7 @@ auth_authenticate(const char *what, const struct sockaddr *caller,
path, epath, error);
}

- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
return exp;
}

diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
index e49300d..e062aea 100644
--- a/utils/mountd/cache.c
+++ b/utils/mountd/cache.c
@@ -113,7 +113,7 @@ static void auth_unix_ip(int f)
ai = client_resolve(tmp->ai_addr);
if (ai) {
client = client_compose(ai);
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
}
}
bp = buf; blen = sizeof(buf);
@@ -133,7 +133,7 @@ static void auth_unix_ip(int f)
xlog(D_CALL, "auth_unix_ip: client %p '%s'", client, client?client: "DEFAULT");

free(client);
- freeaddrinfo(tmp);
+ host_freeaddrinfo(tmp);

}

@@ -667,7 +667,7 @@ static struct addrinfo *lookup_client_addr(char *dom)
if (tmp == NULL)
return NULL;
ret = client_resolve(tmp->ai_addr);
- freeaddrinfo(tmp);
+ host_freeaddrinfo(tmp);
return ret;
}

@@ -834,7 +834,7 @@ static void nfsd_fh(int f)
out:
if (found_path)
free(found_path);
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
free(dom);
xlog(D_CALL, "nfsd_fh: found %p path %s", found, found ? found->e_path : NULL);
}
@@ -1364,7 +1364,7 @@ static void nfsd_export(int f)
xlog(D_CALL, "nfsd_export: found %p path %s", found, path ? path : NULL);
if (dom) free(dom);
if (path) free(path);
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
}


diff --git a/utils/mountd/mountd.c b/utils/mountd/mountd.c
index 829f803..3193ded 100644
--- a/utils/mountd/mountd.c
+++ b/utils/mountd/mountd.c
@@ -578,10 +578,10 @@ static void prune_clients(nfs_export *exp, struct exportnode *e)
*cp = c->gr_next;
xfree(c->gr_name);
xfree(c);
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
continue;
}
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
}
cp = &(c->gr_next);
}
diff --git a/utils/mountd/rmtab.c b/utils/mountd/rmtab.c
index 3ae0dbb..99f7474 100644
--- a/utils/mountd/rmtab.c
+++ b/utils/mountd/rmtab.c
@@ -226,7 +226,7 @@ mountlist_list(void)
ai = host_pton(rep->r_client);
if (ai != NULL) {
m->ml_hostname = host_canonname(ai->ai_addr);
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
}
}
if (m->ml_hostname == NULL)
--
2.13.5


2017-09-13 10:27:22

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 07/14] getport: recognize "vsock" netid

Neither libtirpc nor getprotobyname(3) know about AF_VSOCK. Translate
"vsock" manually in getport.c.

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
support/nfs/getport.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/support/nfs/getport.c b/support/nfs/getport.c
index 081594c..0b857af 100644
--- a/support/nfs/getport.c
+++ b/support/nfs/getport.c
@@ -217,8 +217,7 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
struct protoent *proto;

/*
- * IANA does not define a protocol number for rdma netids,
- * since "rdma" is not an IP protocol.
+ * IANA does not define protocol numbers for non-IP netids.
*/
if (strcmp(netid, "rdma") == 0) {
*family = AF_INET;
@@ -230,6 +229,11 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
*protocol = NFSPROTO_RDMA;
return 1;
}
+ if (strcmp(netid, "vsock") == 0) {
+ *family = AF_VSOCK;
+ *protocol = 0;
+ return 1;
+ }

nconf = getnetconfigent(netid);
if (nconf == NULL)
@@ -258,14 +262,18 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
struct protoent *proto;

/*
- * IANA does not define a protocol number for rdma netids,
- * since "rdma" is not an IP protocol.
+ * IANA does not define protocol numbers for non-IP netids.
*/
if (strcmp(netid, "rdma") == 0) {
*family = AF_INET;
*protocol = NFSPROTO_RDMA;
return 1;
}
+ if (strcmp(netid, "vsock") == 0) {
+ *family = AF_VSOCK;
+ *protocol = 0;
+ return 1;
+ }

proto = getprotobyname(netid);
if (proto == NULL)
--
2.13.5


2017-09-13 10:27:23

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 08/14] mount: AF_VSOCK address parsing

getaddrinfo(3) does not have AF_VSOCK support. Parse the CID in the
hostname option and build a struct sockaddr_vm.

There is one tricky thing here: struct addrinfo is normally allocated by
getaddrinfo(3) and freed by freeaddrinfo(3). How the struct addrinfo and
struct sockaddr are allocated is an implementation detail of libc.
Therefore we must avoid calling freeaddrinfo(3) when the addrinfo was
filled out by us for AF_VSOCK instead of by getaddrinfo(3).
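
Concretely, the ownership rule boils down to the following (a sketch
equivalent to the cleanup hunk in stropts.c below):

    /* mi.address is either a single calloc() block that we built for
     * AF_VSOCK or the result of getaddrinfo(3); free it accordingly. */
    if (mi.address) {
            if (mi.address->ai_family == AF_VSOCK)
                    free(mi.address);
            else
                    freeaddrinfo(mi.address);
    }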

It is now possible to mount a file system from the host (hypervisor)
over AF_VSOCK like this:

(guest)$ mount.nfs 2:/export /mnt -v -o proto=vsock

The hypervisor is CID 2.

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
utils/mount/stropts.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++-----
utils/mount/nfs.man | 20 ++++++++++++++----
2 files changed, 68 insertions(+), 9 deletions(-)

diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
index 033f254..be72e1e 100644
--- a/utils/mount/stropts.c
+++ b/utils/mount/stropts.c
@@ -908,6 +908,40 @@ fall_back:
return nfs_try_mount_v3v2(mi, FALSE);
}

+/* There are no guarantees on how getaddrinfo(3) allocates struct addrinfo so
+ * be sure to call free(3) on *address instead of freeaddrinfo(3).
+ */
+static int vsock_getaddrinfo(struct nfsmount_info *mi,
+ struct addrinfo **address)
+{
+ struct {
+ struct addrinfo ai;
+ struct sockaddr_vm svm;
+ } *vai;
+ long cid;
+ char *endptr;
+
+ errno = 0;
+ cid = strtol(mi->hostname, &endptr, 10);
+ if (errno != 0 || *endptr != '\0' ||
+ cid < 0 || cid > UINT_MAX)
+ return EAI_NONAME;
+
+ vai = calloc(1, sizeof(*vai));
+ if (!vai)
+ return EAI_MEMORY;
+
+ vai->ai.ai_family = AF_VSOCK;
+ vai->ai.ai_socktype = SOCK_STREAM;
+ vai->ai.ai_addrlen = sizeof(vai->svm);
+ vai->ai.ai_addr = (struct sockaddr *)&vai->svm;
+ vai->svm.svm_family = AF_VSOCK;
+ vai->svm.svm_cid = cid;
+
+ *address = &vai->ai;
+ return 0;
+}
+
/*
* This is a single pass through the fg/bg loop.
*
@@ -919,12 +953,19 @@ static int nfs_try_mount(struct nfsmount_info *mi)
int result = 0;

if (mi->address == NULL) {
- struct addrinfo hint = {};
- int error;
- struct addrinfo *address;
+ int error = 0;
+ struct addrinfo *address = NULL;
+
+ if (mi->family == AF_VSOCK)
+ error = vsock_getaddrinfo(mi, &address);
+
+ if (error == 0 && !address) {
+ struct addrinfo hint = {};
+
+ hint.ai_family = (int)mi->family;
+ error = getaddrinfo(mi->hostname, NULL, &hint, &address);
+ }

- hint.ai_family = (int)mi->family;
- error = getaddrinfo(mi->hostname, NULL, &hint, &address);
if (error != 0) {
if (error == EAI_AGAIN)
errno = EAGAIN;
@@ -1219,6 +1260,12 @@ int nfsmount_string(const char *spec, const char *node, char *type,
} else
nfs_error(_("%s: internal option parsing error"), progname);

+ /* See vsock_getaddrinfo() for why we cannot use freeaddrinfo(3) */
+ if (mi.address && mi.address->ai_family == AF_VSOCK) {
+ free(mi.address);
+ mi.address = NULL;
+ }
+
freeaddrinfo(mi.address);
free(mi.hostname);
return retval;
diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
index cc6e992..4651826 100644
--- a/utils/mount/nfs.man
+++ b/utils/mount/nfs.man
@@ -58,8 +58,10 @@ are separated by blanks or tabs.
.P
The server's hostname can be an unqualified hostname,
a fully qualified domain name,
-a dotted quad IPv4 address, or
-an IPv6 address enclosed in square brackets.
+a dotted quad IPv4 address,
+an IPv6 address enclosed in square brackets,
+or a vsock address prefixed with
+.BR vsock: .
Link-local and site-local IPv6 addresses must be accompanied by an
interface identifier.
See
@@ -769,7 +771,7 @@ The
.I netid
determines the transport that is used to communicate with the NFS
server. Supported options are
-.BR tcp ", " tcp6 ", and " rdma .
+.BR tcp ", " tcp6 ", " rdma ", and " vsock .
.B tcp6
use IPv6 addresses and is only available if support for TI-RPC is
built in. Both others use IPv4 addresses.
@@ -815,8 +817,11 @@ the behavior of this option in more detail.
.BI clientaddr= n.n.n.n
.TP 1.5i
.BI clientaddr= n:n: ... :n
+.TP 1.5i
+.BI clientaddr= n
Specifies a single IPv4 address (in dotted-quad form),
-or a non-link-local IPv6 address,
+a non-link-local IPv6 address,
+or a vsock address,
that the NFS client advertises to allow servers
to perform NFS version 4 callback requests against
files on this mount point. If the server is unable to
@@ -934,6 +939,13 @@ using a raw IPv6 link-local address.
.ta 8n +40n +5n +4n +9n
[fe80::215:c5ff:fb3e:e2b1%eth0]:/export /mnt nfs defaults 0 0
.fi
+.P
+This example shows how to mount using NFS version 4 over vsock.
+.P
+.nf
+.ta 8n +16n +6n +6n +30n
+ vsock:2:/export /mnt nfs4 defaults 0 0
+.fi
.SH "TRANSPORT METHODS"
NFS clients send requests to NFS servers via
Remote Procedure Calls, or
--
2.13.5


2017-09-13 10:27:38

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 13/14] nfsd: add --vsock (-v) option to nfsd

The following command-line serves NFSv4.1 over AF_VSOCK:

nfsd -TU -N3 -V4.1 -v 2049
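
Like the existing IPv4/IPv6 listeners, user space opens, binds, and listens
on the socket and then hands its file descriptor to knfsd via the nfsd ports
file. Roughly (a condensed sketch of nfssvc_set_vsock() below, error handling
omitted and the port hard-coded for illustration):

    struct sockaddr_vm svm = {
            .svm_family = AF_VSOCK,
            .svm_cid    = VMADDR_CID_ANY,  /* listen on any local CID */
            .svm_port   = 2049,
    };
    int sockfd = socket(AF_VSOCK, SOCK_STREAM, 0);
    char buf[20];

    bind(sockfd, (struct sockaddr *)&svm, sizeof(svm));
    listen(sockfd, 64);

    /* hand the listening socket to knfsd via the nfsd ports file */
    int fd = open(NFSD_PORTS_FILE, O_WRONLY);
    snprintf(buf, sizeof(buf), "%d\n", sockfd);
    write(fd, buf, strlen(buf));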

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
utils/nfsd/nfssvc.h | 1 +
utils/nfsd/nfsd.c | 18 +++++++++++++---
utils/nfsd/nfssvc.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
utils/nfsd/nfsd.man | 4 ++++
4 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/utils/nfsd/nfssvc.h b/utils/nfsd/nfssvc.h
index 39ebf37..1d251ca 100644
--- a/utils/nfsd/nfssvc.h
+++ b/utils/nfsd/nfssvc.h
@@ -26,6 +26,7 @@ int nfssvc_set_sockets(const unsigned int protobits,
const char *host, const char *port);
void nfssvc_set_time(const char *type, const int seconds);
int nfssvc_set_rdmaport(const char *port);
+int nfssvc_set_vsock(const char *port);
void nfssvc_setvers(unsigned int ctlbits, unsigned int minorvers4, unsigned int minorvers4set);
int nfssvc_threads(int nrservs);
void nfssvc_get_minormask(unsigned int *mask);
diff --git a/utils/nfsd/nfsd.c b/utils/nfsd/nfsd.c
index f973203..7f527db 100644
--- a/utils/nfsd/nfsd.c
+++ b/utils/nfsd/nfsd.c
@@ -53,6 +53,7 @@ static struct option longopts[] =
{ "rdma", 2, 0, 'R' },
{ "grace-time", 1, 0, 'G'},
{ "lease-time", 1, 0, 'L'},
+ { "vsock", 1, 0, 'v' },
{ NULL, 0, 0, 0 }
};

@@ -61,6 +62,7 @@ main(int argc, char **argv)
{
int count = NFSD_NPROC, c, i, error = 0, portnum, fd, found_one;
char *p, *progname, *port, *rdma_port = NULL;
+ char *vsock_port = NULL;
char **haddr = NULL;
int hcounter = 0;
struct conf_list *hosts;
@@ -145,7 +147,7 @@ main(int argc, char **argv)
}
}

- while ((c = getopt_long(argc, argv, "dH:hN:V:p:P:stTituUrG:L:", longopts, NULL)) != EOF) {
+ while ((c = getopt_long(argc, argv, "dH:hN:V:p:P:stTituUrG:L:v:", longopts, NULL)) != EOF) {
switch(c) {
case 'd':
xlog_config(D_ALL, 1);
@@ -180,7 +182,9 @@ main(int argc, char **argv)
else
rdma_port = "nfsrdma";
break;
-
+ case 'v': /* --vsock */
+ vsock_port = optarg;
+ break;
case 'N':
switch((c = strtol(optarg, &p, 0))) {
case 4:
@@ -309,7 +313,8 @@ main(int argc, char **argv)
}

if (NFSCTL_VERISSET(versbits, 4) &&
- !NFSCTL_TCPISSET(protobits)) {
+ !NFSCTL_TCPISSET(protobits) &&
+ !vsock_port) {
xlog(L_ERROR, "version 4 requires the TCP protocol");
exit(1);
}
@@ -353,6 +358,13 @@ main(int argc, char **argv)
if (!error)
socket_up = 1;
}
+
+ if (vsock_port) {
+ error = nfssvc_set_vsock(vsock_port);
+ if (!error)
+ socket_up = 1;
+ }
+
set_threads:
/* don't start any threads if unable to hand off any sockets */
if (!socket_up) {
diff --git a/utils/nfsd/nfssvc.c b/utils/nfsd/nfssvc.c
index e8609c1..2fbdb48 100644
--- a/utils/nfsd/nfssvc.c
+++ b/utils/nfsd/nfssvc.c
@@ -22,6 +22,7 @@
#include <stdlib.h>
#include <string.h>

+#include "vsock.h"
#include "nfslib.h"
#include "xlog.h"
#include "nfssvc.h"
@@ -304,6 +305,67 @@ nfssvc_set_rdmaport(const char *port)
return ret;
}

+int
+nfssvc_set_vsock(const char *port)
+{
+ struct sockaddr_vm svm;
+ int nport;
+ char buf[20];
+ int rc = 1;
+ int sockfd = -1;
+ int fd = -1;
+ char *ep;
+
+ nport = strtol(port, &ep, 10);
+ if (!*port || *ep) {
+ xlog(L_ERROR, "unable to interpret port name %s",
+ port);
+ goto out;
+ }
+
+ sockfd = socket(AF_VSOCK, SOCK_STREAM, 0);
+ if (sockfd < 0) {
+ xlog(L_ERROR, "unable to create AF_VSOCK socket: "
+ "errno %d (%m)", errno);
+ goto out;
+ }
+
+ svm.svm_family = AF_VSOCK;
+ svm.svm_port = nport;
+ svm.svm_cid = VMADDR_CID_ANY;
+
+ if (bind(sockfd, (struct sockaddr*)&svm, sizeof(svm))) {
+ xlog(L_ERROR, "unable to bind AF_VSOCK socket: "
+ "errno %d (%m)", errno);
+ goto out;
+ }
+
+ if (listen(sockfd, 64)) {
+ xlog(L_ERROR, "unable to create listening socket: "
+ "errno %d (%m)", errno);
+ goto out;
+ }
+
+ fd = open(NFSD_PORTS_FILE, O_WRONLY);
+ if (fd < 0) {
+ xlog(L_ERROR, "couldn't open ports file: errno "
+ "%d (%m)", errno);
+ goto out;
+ }
+ snprintf(buf, sizeof(buf), "%d\n", sockfd);
+ if (write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf)) {
+ xlog(L_ERROR, "unable to request vsock services: %m");
+ goto out;
+ }
+ rc = 0;
+out:
+ if (fd != -1)
+ close(fd);
+ if (sockfd != -1)
+ close(sockfd);
+ return rc;
+}
+
void
nfssvc_set_time(const char *type, const int seconds)
{
diff --git a/utils/nfsd/nfsd.man b/utils/nfsd/nfsd.man
index d83ef86..058a252 100644
--- a/utils/nfsd/nfsd.man
+++ b/utils/nfsd/nfsd.man
@@ -52,6 +52,10 @@ Listen for RDMA requests on an alternate port - may be a number or a
name listed in
.BR /etc/services .
.TP
+.BI \-\-vsock= port
+Listen for vsock requests on a given port number.
+Requires NFS version 4.0 or later.
+.TP
.B \-N " or " \-\-no-nfs-version vers
This option can be used to request that
.B rpc.nfsd
--
2.13.5


2017-09-13 10:27:32

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 11/14] exportfs: add AF_VSOCK support to set_addrlist()

Teach set_addrlist() to store AF_VSOCK client addresses.

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
support/include/exportfs.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/support/include/exportfs.h b/support/include/exportfs.h
index 98d45c5..e73f74e 100644
--- a/support/include/exportfs.h
+++ b/support/include/exportfs.h
@@ -89,6 +89,9 @@ set_addrlist(nfs_client *clp, const int i, const struct sockaddr *sap)
memcpy(&clp->m_addrlist[i].s6, sap, sizeof(struct sockaddr_in6));
break;
#endif
+ case AF_VSOCK:
+ memcpy(&clp->m_addrlist[i].svm, sap, sizeof(struct sockaddr_vm));
+ break;
}
}

--
2.13.5


2017-09-13 10:27:36

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 12/14] exportfs: add support for "vsock:" exports(5) syntax

Allow exports to be restricted to AF_VSOCK clients:

# exportfs vsock:3:/export

and:

# cat /etc/exports
/export vsock:*(rw,no_root_squash,insecure,subtree_check)

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
utils/exportfs/exportfs.c | 32 ++++++++++++++++++++++++++++++++
utils/exportfs/exports.man | 12 ++++++++++--
2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/utils/exportfs/exportfs.c b/utils/exportfs/exportfs.c
index 3ded733..6bf67f1 100644
--- a/utils/exportfs/exportfs.c
+++ b/utils/exportfs/exportfs.c
@@ -299,6 +299,20 @@ static int exportfs_generic(char *arg, char *options, int verbose)
return 0;
}

+static int exportfs_vsock(char *arg, char *options, int verbose)
+{
+ char *path;
+
+ if ((path = strchr(arg + strlen("vsock:"), ':')) != NULL)
+ *path++ = '\0';
+
+ if (!path || *path != '/')
+ return 1;
+
+ exportfs_parsed(arg, path, options, verbose);
+ return 0;
+}
+
static int exportfs_ipv6(char *arg, char *options, int verbose)
{
char *path, *c;
@@ -332,6 +346,8 @@ exportfs(char *arg, char *options, int verbose)

if (*arg == '[')
failed = exportfs_ipv6(arg, options, verbose);
+ else if (strncmp(arg, "vsock:", strlen("vsock:")) == 0)
+ failed = exportfs_vsock(arg, options, verbose);
else
failed = exportfs_generic(arg, options, verbose);
if (failed)
@@ -412,6 +428,20 @@ static int unexportfs_generic(char *arg, int verbose)
return 0;
}

+static int unexportfs_vsock(char *arg, int verbose)
+{
+ char *path;
+
+ if ((path = strchr(arg + strlen("vsock:"), ':')) != NULL)
+ *path++ = '\0';
+
+ if (!path || *path != '/')
+ return 1;
+
+ unexportfs_parsed(arg, path, verbose);
+ return 0;
+}
+
static int unexportfs_ipv6(char *arg, int verbose)
{
char *path, *c;
@@ -445,6 +475,8 @@ unexportfs(char *arg, int verbose)

if (*arg == '[')
failed = unexportfs_ipv6(arg, verbose);
+ else if (strncmp(arg, "vsock:", strlen("vsock:")) == 0)
+ failed = unexportfs_vsock(arg, verbose);
else
failed = unexportfs_generic(arg, verbose);
if (failed)
diff --git a/utils/exportfs/exports.man b/utils/exportfs/exports.man
index d8de6be..35b5612 100644
--- a/utils/exportfs/exports.man
+++ b/utils/exportfs/exports.man
@@ -47,7 +47,9 @@ NFS clients may be specified in a number of ways:
.IP "single host
You may specify a host either by an
abbreviated name recognized be the resolver, the fully qualified domain
-name, an IPv4 address, or an IPv6 address. IPv6 addresses must not be
+name, an IPv4 address, an IPv6 address, or a vsock address prefixed with
+.BR vsock: .
+IPv6 addresses must not be
inside square brackets in /etc/exports lest they be confused with
character-class wildcard matches.
.IP "IP networks
@@ -492,6 +494,12 @@ export entry for
.B /home/joe
in the example section below, which maps all requests to uid 150 (which
is supposedly that of user joe).
+.SS Multiple Address Families
+When machines are specified using IPv4, IPv6, or vsock addresses they have
+access from the given network addresses. The wildcard \fI*\fR by itself
+matches machines of all address families.
+.BR vsock:*
+can be used to match only vsock machines.
.SS Extra Export Tables
After reading
.I /etc/exports
@@ -510,7 +518,7 @@ The format for extra export tables is the same as
.nf
.ta +3i
# sample /etc/exports file
-/ master(rw) trusty(rw,no_root_squash)
+/ master(rw) trusty(rw,no_root_squash) vsock:3(rw)
/projects proj*.local.domain(rw)
/usr *.local.domain(ro) @trusted(rw)
/home/joe pc001(rw,all_squash,anonuid=150,anongid=100)
--
2.13.5


2017-09-13 10:27:39

by Stefan Hajnoczi

Subject: [PATCH nfs-utils v3 14/14] tests: add "vsock:" exports(5) test case

This simple test case checks that the new syntax works for AF_VSOCK.

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
tests/Makefile.am | 3 ++-
tests/t0002-vsock-basic.sh | 53 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 1 deletion(-)
create mode 100755 tests/t0002-vsock-basic.sh

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 1f96264..c4e2792 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -10,5 +10,6 @@ SUBDIRS = nsm_client

MAINTAINERCLEANFILES = Makefile.in

-TESTS = t0001-statd-basic-mon-unmon.sh
+TESTS = t0001-statd-basic-mon-unmon.sh \
+ t0002-vsock-basic.sh
EXTRA_DIST = test-lib.sh $(TESTS)
diff --git a/tests/t0002-vsock-basic.sh b/tests/t0002-vsock-basic.sh
new file mode 100755
index 0000000..21a3884
--- /dev/null
+++ b/tests/t0002-vsock-basic.sh
@@ -0,0 +1,53 @@
+#!/bin/bash
+#
+# t0002-vsock-basic.sh -- test basic NFSv4 over AF_VSOCK functionality
+#
+# Copyright (C) 2017 Red Hat, Stefan Hajnoczi <[email protected]>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+#
+
+. ./test-lib.sh
+
+check_root
+
+test_exportfs() {
+ client_addr="$1"
+ export_spec="$client_addr:$(realpath .)"
+
+ echo "TEST: $client_addr"
+
+ "$srcdir/../utils/exportfs/exportfs" "$export_spec"
+ if [ $? -ne 0 ]; then
+ echo "FAIL: exportfs failed"
+ exit 1
+ fi
+
+ expected_etab="$(realpath .) $client_addr("
+ grep --fixed-strings -q "$expected_etab" /var/lib/nfs/etab
+ if [ $? -ne 0 ]; then
+ echo "FAIL: etab doesn't contain entry"
+ exit 1
+ fi
+
+ "$srcdir/../utils/exportfs/exportfs" -u "$export_spec"
+ if [ $? -ne 0 ]; then
+ echo "FAIL: exportfs -u failed"
+ exit 1
+ fi
+}
+
+test_exportfs "vsock:3"
+test_exportfs "vsock:*"
--
2.13.5


2017-09-13 16:21:13

by Christoph Hellwig

Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

Please get your VSOCK NFS transport into the IETF NFSv4 working group
first before moving forward with Linux support - we should not implement
non-standardized extensions.


2017-09-13 18:22:32

by Chuck Lever III

Subject: Re: [nfsv4] [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 13, 2017, at 11:18 AM, David Noveck <[email protected]> wrote:
>
> > and how to ask IESG to assign it?
>
> The way to get the IESG to assign it would be to write an RFC and get it
> approved as a Proposed Standard but I don't think you need to do that.
> There is a portion of the netid registry that is assigned on a
> first-come-first-served basis (see RFCs 5665 and 5226) and if you are OK
> with that, the IESG doesn't have to be involved. You simply have to ask
> IANA to assign it, providing the information (pretty limited) mentioned
> in those RFCs.

Stefan also needs to define a universal address format.

And somewhere we need to specify how this new RPC transport works.
In hand-waving mode, it's basically TCP (with the same connection
and record-marking semantics) but using a different address family.

So it may not be as simple as a single IANA action.



--
Chuck Lever




2017-09-13 22:40:06

by NeilBrown

Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


Please don't send these patches to me. You know what I think of the
whole project.

NeilBrown





2017-09-14 15:39:12

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

Hello

On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
> v3:
> * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
> * Added clientaddr autodetection in mount.nfs(8)
> * Replaced #ifdefs with a single vsock.h header file
> * Tested nfsd serving both IPv4 and vsock at the same time
Just curious as to the status of the kernel patches... Are
they slated for any particular release?

steved.

>
> Status:
>
> * The last revision was somewhat controversial because it's already possible
> to share files between a hypervisor and virtual machine using TCP/IP, so why
> add AF_VSOCK support to the stack? TCP/IP based solutions require the
> virtual machine administrator to be involved in the configuration and are
> therefore not suitable for automatic management by OpenStack, oVirt, etc.
> Maintainers, is this feature acceptable?
>
> * Need advice on netid: is there agreement to use "tcpv" instead of "vsock" as
> Chuck Lever suggested and how to ask IESG to assign it?
>
> The AF_VSOCK address family allows virtual machines to communicate with the
> hypervisor using a zero-configuration transport. KVM, VMware, and Hyper-V
> hypervisors support AF_VSOCK and it was first introduced in Linux 3.9.
>
> This patch series adds AF_VSOCK support to mount.nfs(8) and rpc.nfsd(8). To
> mount an export from the hypervisor (CID 2):
>
> # mount.nfs 2:/srv/vm01 /mnt -o proto=vsock
>
> To serve exports over vsock port 2049:
>
> # nfsd ... --vsock 2049
>
> This series extends exports(5) syntax to handle vsock:<CID> or vsock:*. For
> example, the guest with CID 3 can be given access using vsock:3.
>
> nfsd can export over IPv4/IPv6 and vsock at the same time. See the changes to
> exports.man, nfs.man, and nfsd.man in the patches for syntax details.
>
> NFSv4 and later are supported.
>
> The code is also available here:
> https://github.com/stefanha/nfs-utils/tree/vsock-nfsd
>
> The latest kernel patches are available here:
> https://github.com/stefanha/linux/tree/vsock-nfsd
>
> Stefan Hajnoczi (14):
> mount: don't use IPPROTO_UDP for address resolution
> nfs-utils: add vsock.h
> nfs-utils: add AF_VSOCK support to sockaddr.h
> mount: present AF_VSOCK addresses
> mount: accept AF_VSOCK in nfs_verify_family()
> mount: generate AF_VSOCK clientaddr
> getport: recognize "vsock" netid
> mount: AF_VSOCK address parsing
> exportfs: introduce host_freeaddrinfo()
> exportfs: add AF_VSOCK address parsing and printing
> exportfs: add AF_VSOCK support to set_addrlist()
> exportfs: add support for "vsock:" exports(5) syntax
> nfsd: add --vsock (-v) option to nfsd
> tests: add "vsock:" exports(5) test case
>
> tests/Makefile.am | 3 +-
> support/include/exportfs.h | 4 ++
> support/include/sockaddr.h | 18 +++++
> support/include/vsock.h | 59 +++++++++++++++++
> utils/nfsd/nfssvc.h | 1 +
> support/export/client.c | 8 +--
> support/export/hostname.c | 161 +++++++++++++++++++++++++++++++++++++++++++--
> support/nfs/getport.c | 16 +++--
> utils/exportfs/exportfs.c | 42 ++++++++++--
> utils/mount/network.c | 37 ++++++++++-
> utils/mount/stropts.c | 61 ++++++++++++++---
> utils/mountd/auth.c | 2 +-
> utils/mountd/cache.c | 10 +--
> utils/mountd/mountd.c | 4 +-
> utils/mountd/rmtab.c | 2 +-
> utils/nfsd/nfsd.c | 18 ++++-
> utils/nfsd/nfssvc.c | 62 +++++++++++++++++
> configure.ac | 3 +
> tests/t0002-vsock-basic.sh | 53 +++++++++++++++
> utils/exportfs/exports.man | 12 +++-
> utils/mount/nfs.man | 20 ++++--
> utils/nfsd/nfsd.man | 4 ++
> 22 files changed, 552 insertions(+), 48 deletions(-)
> create mode 100644 support/include/vsock.h
> create mode 100755 tests/t0002-vsock-basic.sh
>

2017-09-14 15:56:02

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support



On 09/14/2017 11:39 AM, Steve Dickson wrote:
> Hello
>
> On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
>> v3:
>> * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
>> * Added clientaddr autodetection in mount.nfs(8)
>> * Replaced #ifdefs with a single vsock.h header file
>> * Tested nfsd serving both IPv4 and vsock at the same time
> Just curious as to the status of the kernel patches... Are
> they slated for any particular release?
Maybe I should have read the thread before replying ;-)

I now see the status of the patches... not good! 8-)

steved
>
> steved.
>
>>
>> Status:
>>
>> * The last revision was somewhat controversial because it's already possible
>> to share files between a hypervisor and virtual machine using TCP/IP, so why
>> add AF_VSOCK support to the stack? TCP/IP based solutions require the
>> virtual machine administrator to be involved in the configuration and are
>> therefore not suitable for automatic management by OpenStack, oVirt, etc.
>> Maintainers, is this feature acceptable?
>>
>> * Need advice on netid: is there agreement to use "tcpv" instead of "vsock" as
>> Chuck Lever suggested and how to ask IESG to assign it?
>>
>> The AF_VSOCK address family allows virtual machines to communicate with the
>> hypervisor using a zero-configuration transport. KVM, VMware, and Hyper-V
>> hypervisors support AF_VSOCK and it was first introduced in Linux 3.9.
>>
>> This patch series adds AF_VSOCK support to mount.nfs(8) and rpc.nfsd(8). To
>> mount an export from the hypervisor (CID 2):
>>
>> # mount.nfs 2:/srv/vm01 /mnt -o proto=vsock
>>
>> To serve exports over vsock port 2049:
>>
>> # nfsd ... --vsock 2049
>>
>> This series extends exports(5) syntax to handle vsock:<CID> or vsock:*. For
>> example, the guest with CID 3 can be given access using vsock:3.
>>
>> nfsd can export over IPv4/IPv6 and vsock at the same time. See the changes to
>> exports.man, nfs.man, and nfsd.man in the patches for syntax details.
>>
>> NFSv4 and later are supported.
>>
>> The code is also available here:
>> https://github.com/stefanha/nfs-utils/tree/vsock-nfsd
>>
>> The latest kernel patches are available here:
>> https://github.com/stefanha/linux/tree/vsock-nfsd
>>
>> Stefan Hajnoczi (14):
>> mount: don't use IPPROTO_UDP for address resolution
>> nfs-utils: add vsock.h
>> nfs-utils: add AF_VSOCK support to sockaddr.h
>> mount: present AF_VSOCK addresses
>> mount: accept AF_VSOCK in nfs_verify_family()
>> mount: generate AF_VSOCK clientaddr
>> getport: recognize "vsock" netid
>> mount: AF_VSOCK address parsing
>> exportfs: introduce host_freeaddrinfo()
>> exportfs: add AF_VSOCK address parsing and printing
>> exportfs: add AF_VSOCK support to set_addrlist()
>> exportfs: add support for "vsock:" exports(5) syntax
>> nfsd: add --vsock (-v) option to nfsd
>> tests: add "vsock:" exports(5) test case
>>
>> tests/Makefile.am | 3 +-
>> support/include/exportfs.h | 4 ++
>> support/include/sockaddr.h | 18 +++++
>> support/include/vsock.h | 59 +++++++++++++++++
>> utils/nfsd/nfssvc.h | 1 +
>> support/export/client.c | 8 +--
>> support/export/hostname.c | 161 +++++++++++++++++++++++++++++++++++++++++++--
>> support/nfs/getport.c | 16 +++--
>> utils/exportfs/exportfs.c | 42 ++++++++++--
>> utils/mount/network.c | 37 ++++++++++-
>> utils/mount/stropts.c | 61 ++++++++++++++---
>> utils/mountd/auth.c | 2 +-
>> utils/mountd/cache.c | 10 +--
>> utils/mountd/mountd.c | 4 +-
>> utils/mountd/rmtab.c | 2 +-
>> utils/nfsd/nfsd.c | 18 ++++-
>> utils/nfsd/nfssvc.c | 62 +++++++++++++++++
>> configure.ac | 3 +
>> tests/t0002-vsock-basic.sh | 53 +++++++++++++++
>> utils/exportfs/exports.man | 12 +++-
>> utils/mount/nfs.man | 20 ++++--
>> utils/nfsd/nfsd.man | 4 ++
>> 22 files changed, 552 insertions(+), 48 deletions(-)
>> create mode 100644 support/include/vsock.h
>> create mode 100755 tests/t0002-vsock-basic.sh
>>

2017-09-14 17:37:31

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
>
>
> On 09/14/2017 11:39 AM, Steve Dickson wrote:
> > Hello
> >
> > On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
> >> v3:
> >> * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
> >> * Added clientaddr autodetection in mount.nfs(8)
> >> * Replaced #ifdefs with a single vsock.h header file
> >> * Tested nfsd serving both IPv4 and vsock at the same time
> > Just curious as to the status of the kernel patches... Are
> > they slated for any particular release?
> Maybe I should have read the thread before replying ;-)
>
> I now see the status of the patches... not good! 8-)

To be specific, the code itself is probably fine, it's just that nobody
on the NFS side seems convinced that NFS/VSOCK is necessary.

--b.

2017-09-15 11:07:10

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Thu, 2017-09-14 at 13:37 -0400, J . Bruce Fields wrote:
> On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
> >
> >
> > On 09/14/2017 11:39 AM, Steve Dickson wrote:
> > > Hello
> > >
> > > On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
> > > > v3:
> > > > * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
> > > > * Added clientaddr autodetection in mount.nfs(8)
> > > > * Replaced #ifdefs with a single vsock.h header file
> > > > * Tested nfsd serving both IPv4 and vsock at the same time
> > >
> > > Just curious as to the status of the kernel patches... Are
> > > they slated for any particular release?
> >
> > Maybe I should have read the thread before replying ;-)
> >
> > I now see the status of the patches... not good! 8-)
>
> To be specific, the code itself is probably fine, it's just that nobody
> on the NFS side seems convinced that NFS/VSOCK is necessary.
>

...and to be even more clear, the problem you've outlined (having a zero
config network between an HV and guest) is a valid one. The issue here
is that the solution in these patches is horribly invasive and will
create an ongoing maintenance burden.

What would be much cleaner (IMNSHO) is a new type of virtual network
interface driver that has similar communication characteristics (only
allowing HV<->guest communication) and that autoconfigures itself when
plugged in (or only does so with minimal setup).

Then you could achieve the same result without having to completely
rework all of this code. That's also something potentially backportable
to earlier kernels, which is a nice bonus.
--
Jeff Layton <[email protected]>

2017-09-15 11:52:25

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [nfsv4] [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, Sep 13, 2017 at 11:21:36AM -0700, Chuck Lever wrote:
>
> > On Sep 13, 2017, at 11:18 AM, David Noveck <[email protected]> wrote:
> >
> > > and how to ask IESG to assign it?
> >
> > The way to get the IESG to assign it would be to write an RFC and get it approved as a Proposed Standard but I don't think you need to do that. There is a portion of the netid registry that is assigned on a first-come-first-served basis (see RFCs 5665 and 5226) and if you are OK with that, the IESG doesn't have to be involved. You simply have to ask IANA to assign it, providing the information (pretty limited) mentioned in those RFCs.
>
> Stefan also needs to define a universal address format.
>
> And somewhere we need to specify how this new RPC transport works.
> In hand-waving mode, it's basically TCP (with the same connection
> and record-marking semantics) but using a different address family.
>
> So it may not be as simple as a single IANA action.

Thanks Christoph, David, and Chuck. I'll reread the RFCs and try to
make progress on this.

Stefan

2017-09-15 13:12:27

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Thu, Sep 14, 2017 at 01:37:30PM -0400, J . Bruce Fields wrote:
> On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
> > On 09/14/2017 11:39 AM, Steve Dickson wrote:
> > > Hello
> > >
> > > On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
> > >> v3:
> > >> * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
> > >> * Added clientaddr autodetection in mount.nfs(8)
> > >> * Replaced #ifdefs with a single vsock.h header file
> > >> * Tested nfsd serving both IPv4 and vsock at the same time
> > > Just curious as to the status of the kernel patches... Are
> > > they slated for any particular release?
> > Maybe I should have read the thread before replying ;-)
> >
> > I now see the status of the patches... not good! 8-)
>
> To be specific, the code itself is probably fine, it's just that nobody
> on the NFS side seems convinced that NFS/VSOCK is necessary.

Yes, the big question is whether the Linux NFS maintainers can see this
feature being merged. It allows host<->guest file sharing in a way that
management tools can automate.

I have gotten feedback multiple times that NFS over TCP/IP is not an
option for management tools like libvirt to automate. They want
something that is both zero-configuration and not prone to breakage
inside the guest. AF_VSOCK has those qualities.

Can you give a verdict on NFS over AF_VSOCK as Linux NFS maintainers?

If the verdict is yes, then I'll submit the kernel patch series and get
to work on netid registration. If no, then everyone can move on and
I'll figure out what to do next.

The latest kernel code is here:
https://github.com/stefanha/linux/tree/vsock-nfsd

Stefan

PS: I removed Neil Brown from CC because he requested not to be included
on this patch series.

2017-09-15 13:31:45

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, Sep 15, 2017 at 02:12:24PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 14, 2017 at 01:37:30PM -0400, J . Bruce Fields wrote:
> > On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
> > > On 09/14/2017 11:39 AM, Steve Dickson wrote:
> > > > Hello
> > > >
> > > > On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
> > > >> v3:
> > > >> * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
> > > >> * Added clientaddr autodetection in mount.nfs(8)
> > > >> * Replaced #ifdefs with a single vsock.h header file
> > > >> * Tested nfsd serving both IPv4 and vsock at the same time
> > > > Just curious as to the status of the kernel patches... Are
> > > > they slated for any particular release?
> > > Maybe I should have read the thread before replying ;-)
> > >
> > > I now see the status of the patches... not good! 8-)
> >
> > To be specific, the code itself is probably fine, it's just that nobody
> > on the NFS side seems convinced that NFS/VSOCK is necessary.
>
> Yes, the big question is whether the Linux NFS maintainers can see this
> feature being merged. It allows host<->guest file sharing in a way that
> management tools can automate.
>
> I have gotten feedback multiple times that NFS over TCP/IP is not an
> option for management tools like libvirt to automate.

We're having trouble understanding why this is.

Maybe it would help if you could put us directly in touch with the
sources of that feedback?

--b.

> They want
> something that is both zero-configuration and not prone to breakage
> inside the guest. AF_VSOCK has those qualities.
>
> Can you give a verdict on NFS over AF_VSOCK as Linux NFS maintainers?
>
> If the verdict is yes, then I'll submit the kernel patch series and get
> to work on netid registration. If no, then everyone can move on and
> I'll figure out what to do next.
>
> The latest kernel code is here:
> https://github.com/stefanha/linux/tree/vsock-nfsd
>
> Stefan
>
> PS: I removed Neil Brown from CC because he requested not to be included
> on this patch series.

2017-09-15 13:59:55

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 15, 2017, at 6:31 AM, J . Bruce Fields <[email protected]> wrote:
>
> On Fri, Sep 15, 2017 at 02:12:24PM +0100, Stefan Hajnoczi wrote:
>> On Thu, Sep 14, 2017 at 01:37:30PM -0400, J . Bruce Fields wrote:
>>> On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
>>>> On 09/14/2017 11:39 AM, Steve Dickson wrote:
>>>>> Hello
>>>>>
>>>>> On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
>>>>>> v3:
>>>>>> * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
>>>>>> * Added clientaddr autodetection in mount.nfs(8)
>>>>>> * Replaced #ifdefs with a single vsock.h header file
>>>>>> * Tested nfsd serving both IPv4 and vsock at the same time
>>>>> Just curious as to the status of the kernel patches... Are
>>>>> they slated for any particular release?
>>>> Maybe I should have read the thread before replying ;-)
>>>>
>>>> I now see the status of the patches... not good! 8-)
>>>
>>> To be specific, the code itself is probably fine, it's just that nobody
>>> on the NFS side seems convinced that NFS/VSOCK is necessary.
>>
>> Yes, the big question is whether the Linux NFS maintainers can see this
>> feature being merged. It allows host<->guest file sharing in a way that
>> management tools can automate.
>>
>> I have gotten feedback multiple times that NFS over TCP/IP is not an
>> option for management tools like libvirt to automate.
>
> We're having trouble understanding why this is.

I'm also having trouble understanding why NFS is a better solution
in this case than a virtual disk, which does not require any net-
working to be configured. What exactly is expected to be shared
between the hypervisor and each guest?

I do understand the use cases for a full-featured NFS server in
the hypervisor, but not why it needs to be zero-config.


> Maybe it would help if you could put us directly in touch with the
> sources of those feedback?
>
> --b.
>
>> They want
>> something that is both zero-configuration and not prone to breakage
>> inside the guest. AF_VSOCK has those qualities.
>>
>> Can you give a verdict on NFS over AF_VSOCK as Linux NFS maintainers?
>>
>> If the verdict is yes, then I'll submit the kernel patch series and get
>> to work on netid registration. If no, then everyone can move on and
>> I'll figure out what to do next.
>>
>> The latest kernel code is here:
>> https://github.com/stefanha/linux/tree/vsock-nfsd
>>
>> Stefan
>>
>> PS: I removed Neil Brown from CC because he requested not to be included
>> on this patch series.

--
Chuck Lever




2017-09-15 15:17:56

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, Sep 15, 2017 at 07:07:06AM -0400, Jeff Layton wrote:
> On Thu, 2017-09-14 at 13:37 -0400, J . Bruce Fields wrote:
> > On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
> > >
> > >
> > > On 09/14/2017 11:39 AM, Steve Dickson wrote:
> > > > Hello
> > > >
> > > > On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
> > > > > v3:
> > > > > * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
> > > > > * Added clientaddr autodetection in mount.nfs(8)
> > > > > * Replaced #ifdefs with a single vsock.h header file
> > > > > * Tested nfsd serving both IPv4 and vsock at the same time
> > > >
> > > > Just curious as to the status of the kernel patches... Are
> > > > they slated for any particular release?
> > >
> > > Maybe I should have read the thread before replying ;-)
> > >
> > > I now see the status of the patches... not good! 8-)
> >
> > To be specific, the code itself is probably fine, it's just that nobody
> > on the NFS side seems convinced that NFS/VSOCK is necessary.
> >
>
> ...and to be even more clear, the problem you've outlined (having a zero
> config network between an HV and guest) is a valid one. The issue here
> is that the solution in these patches is horribly invasive and will
> create an ongoing maintenance burden.
>
> What would be much cleaner (IMNSHO) is a new type of virtual network
> interface driver that has similar communication characteristics (only
> allowing HV<->guest communication) and that autoconfigures itself when
> plugged in (or only does so with minimal setup).
>
> Then you could achieve the same result without having to completely
> rework all of this code. That's also something potentially backportable
> to earlier kernels, which is a nice bonus.

We're talking about NFS/VSOCK here, but everything you've said would
apply to any protocol over VSOCK.

And yet, we have VSOCK. So I still feel like we must be missing
some perspective.

I wonder if part of the problem is that we're imagining that the typical
VM has a sysadmin. Isn't it more likely that you build the VM
automatically from some root image that you don't even maintain
yourself? So fixing it to not, say, block all network traffic on every
interface, isn't something you can automate--you've no idea where the
iptables configuration lives in the image.

--b.

2017-09-15 16:45:26

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, Sep 15, 2017 at 06:59:45AM -0700, Chuck Lever wrote:
>
> > On Sep 15, 2017, at 6:31 AM, J . Bruce Fields <[email protected]> wrote:
> >
> > On Fri, Sep 15, 2017 at 02:12:24PM +0100, Stefan Hajnoczi wrote:
> >> On Thu, Sep 14, 2017 at 01:37:30PM -0400, J . Bruce Fields wrote:
> >>> On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
> >>>> On 09/14/2017 11:39 AM, Steve Dickson wrote:
> >>>>> Hello
> >>>>>
> >>>>> On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
> >>>>>> v3:
> >>>>>> * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
> >>>>>> * Added clientaddr autodetection in mount.nfs(8)
> >>>>>> * Replaced #ifdefs with a single vsock.h header file
> >>>>>> * Tested nfsd serving both IPv4 and vsock at the same time
> >>>>> Just curious as to the status of the kernel patches... Are
> >>>>> they slated for any particular release?
> >>>> Maybe I should have read the thread before replying ;-)
> >>>>
> >>>> I now see the status of the patches... not good! 8-)
> >>>
> >>> To be specific, the code itself is probably fine, it's just that nobody
> >>> on the NFS side seems convinced that NFS/VSOCK is necessary.
> >>
> >> Yes, the big question is whether the Linux NFS maintainers can see this
> >> feature being merged. It allows host<->guest file sharing in a way that
> >> management tools can automate.
> >>
> >> I have gotten feedback multiple times that NFS over TCP/IP is not an
> >> option for management tools like libvirt to automate.
> >
> > We're having trouble understanding why this is.
>
> I'm also having trouble understanding why NFS is a better solution
> in this case than a virtual disk, which does not require any net-
> working to be configured. What exactly is expected to be shared
> between the hypervisor and each guest?

They have said before there are uses for storage that's actually shared.
(And I assume it would be mainly shared between guests rather than
between guest and hypervisor?)

> I do understand the use cases for a full-featured NFS server in
> the hypervisor, but not why it needs to be zero-config.

"It" in that question refers to the client, not the server, right?

--b.

2017-09-15 23:29:52

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, Sep 15 2017, J . Bruce Fields wrote:

> On Fri, Sep 15, 2017 at 07:07:06AM -0400, Jeff Layton wrote:
>> On Thu, 2017-09-14 at 13:37 -0400, J . Bruce Fields wrote:
>> > On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
>> > >
>> > >
>> > > On 09/14/2017 11:39 AM, Steve Dickson wrote:
>> > > > Hello
>> > > >
>> > > > On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
>> > > > > v3:
>> > > > > * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
>> > > > > * Added clientaddr autodetection in mount.nfs(8)
>> > > > > * Replaced #ifdefs with a single vsock.h header file
>> > > > > * Tested nfsd serving both IPv4 and vsock at the same time
>> > > >
>> > > > Just curious as to the status of the kernel patches... Are
>> > > > they slated for any particular release?
>> > >
>> > > Maybe I should have read the thread before replying ;-)
>> > >
>> > > I now see the status of the patches... not good! 8-)
>> >
>> > To be specific, the code itself is probably fine, it's just that nobody
>> > on the NFS side seems convinced that NFS/VSOCK is necessary.
>> >
>>
>> ...and to be even more clear, the problem you've outlined (having a zero
>> config network between an HV and guest) is a valid one. The issue here
>> is that the solution in these patches is horribly invasive and will
>> create an ongoing maintenance burden.
>>
>> What would be much cleaner (IMNSHO) is a new type of virtual network
>> interface driver that has similar communication characteristics (only
>> allowing HV<->guest communication) and that autoconfigures itself when
>> plugged in (or only does so with minimal setup).
>>
>> Then you could achieve the same result without having to completely
>> rework all of this code. That's also something potentially backportable
>> to earlier kernels, which is a nice bonus.
>
> We're talking about NFS/VSOCK here, but everything you've said would
> apply to any protocol over VSOCK.
>
> And yet, we have VSOCK. So I still feel like we must be missing
> some perspective.

Being in the kernel doesn't prove much. devfs was in the kernel, so was
the Tux HTTP service. configfs is still in the kernel! :-)

Possibly some perspective is missing. A charitable reading of the
situation is that the proponents of VSOCK aren't very good at
communicating their requirements and vision. A less charitable
interpretation is that they have too much invested in VSOCK to be able
to conceive of an alternative.

The thing we hear about is "zero-conf" and that is obviously a good
idea, but can be done without VSOCK. What we hear less about is
"fool-proof" which is hard to define and hard to sell, yet it seems to
be an important part of their agenda.

>
> I wonder if part of the problem is that we're imagining that the typical
> VM has a sysadmin. Isn't it more likely that you build the VM
> automatically from some root image that you don't even maintain
> yourself? So fixing it to not, say, block all network traffic on every
> interface, isn't something you can automate--you've no idea where the
> iptables configuration lives in the image.

You are describing a situation where someone builds an important part of
their infrastructure from something they don't understand and cannot
maintain. Obviously that happens, but when it does I would expect there
to be someone who does understand. A vendor or support organization
probably. If the end result doesn't fit the customer's needs, then that
creates a market opportunity for someone to fill.

It does seem that sidestepping the unpredictability of network filtering is
part of the goal of VSOCK, though creating a new network path that cannot be
filtered doesn't seem to me like the cleverest idea in the long term.
There seems to be more though. There was a suggestion that some people
don't want any network interface at all (but still want to use a
networked file system). This sounds like superstition, but was not
backed up with data so I cannot be sure.

It does seem justifiable to want a simple and reliable way to ensure
that traffic from the NFS client to the host is not filtered. My
feeling is that talking to network/firewall people is the best way to
achieve that. The approach that has been taken looks like an end-run
around exactly the people who are in the best position to help.

How do network namespaces work? Does each namespace get separate
iptables? Could we perform an NFS mount in an unfiltered namespace,
then make everything else run with filters in place?

NeilBrown
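For what it's worth, each network namespace does get its own iptables rule
set (a freshly created namespace starts with empty filter tables), and an NFS
mount's transport stays bound to the namespace it was created in. A rough
sketch of the idea, assuming the guest has a dedicated host-facing NIC (the
interface name eth1, the 192.168.122.x addresses, and the /srv/vm01 export
below are only placeholders):

    # guest: move the host-facing NIC into its own, unfiltered namespace
    ip netns add nfsns
    ip link set eth1 netns nfsns
    ip netns exec nfsns ip addr add 192.168.122.10/24 dev eth1
    ip netns exec nfsns ip link set eth1 up

    # perform the mount inside that namespace; the transport is then never
    # subject to firewall rules applied in the guest's default namespace
    ip netns exec nfsns mount -t nfs 192.168.122.1:/srv/vm01 /mnt

Whether something along these lines is robust enough to count as
"zero-configuration" is exactly the question being debated in this thread.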



2017-09-16 14:55:58

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Sat, Sep 16, 2017 at 09:29:38AM +1000, NeilBrown wrote:
> On Fri, Sep 15 2017, J . Bruce Fields wrote:
>
> > On Fri, Sep 15, 2017 at 07:07:06AM -0400, Jeff Layton wrote:
> >> On Thu, 2017-09-14 at 13:37 -0400, J . Bruce Fields wrote:
> >> > On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
> >> > >
> >> > >
> >> > > On 09/14/2017 11:39 AM, Steve Dickson wrote:
> >> > > > Hello
> >> > > >
> >> > > > On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
> >> > > > > v3:
> >> > > > > * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
> >> > > > > * Added clientaddr autodetection in mount.nfs(8)
> >> > > > > * Replaced #ifdefs with a single vsock.h header file
> >> > > > > * Tested nfsd serving both IPv4 and vsock at the same time
> >> > > >
> >> > > > Just curious as to the status of the kernel patches... Are
> >> > > > they slated for any particular release?
> >> > >
> >> > > Maybe I should have read the thread before replying ;-)
> >> > >
> >> > > I now see the status of the patches... not good! 8-)
> >> >
> >> > To be specific, the code itself is probably fine, it's just that nobody
> >> > on the NFS side seems convinced that NFS/VSOCK is necessary.
> >> >
> >>
> >> ...and to be even more clear, the problem you've outlined (having a zero
> >> config network between an HV and guest) is a valid one. The issue here
> >> is that the solution in these patches is horribly invasive and will
> >> create an ongoing maintenance burden.
> >>
> >> What would be much cleaner (IMNSHO) is a new type of virtual network
> >> interface driver that has similar communication characteristics (only
> >> allowing HV<->guest communication) and that autoconfigures itself when
> >> plugged in (or only does so with minimal setup).
> >>
> >> Then you could achieve the same result without having to completely
> >> rework all of this code. That's also something potentially backportable
> >> to earlier kernels, which is a nice bonus.
> >
> > We're talking about NFS/VSOCK here, but everything you've said would
> > apply to any protocol over VSOCK.
> >
> > And yet, we have VSOCK. So I still feel like we must be missing
> > some perspective.
>
> Being in the kernel doesn't prove much. devfs was in the kernel, so was
> the tux http service. configfs is still in the kernel ! :-)
>
> Possibly some perspective is missing. A charitable reading of the
> situation is that the proponents of VSOCK aren't very good at
> communicating their requirements and vision. A less charitable
> interpretation is that they have too much invested in VSOCK to be able
> to conceive an alternative.
>
> The thing we hear about is "zero-conf" and that is obviously a good
> idea, but can be done without VSOCK. What we hear less about is
> "fool-proof" which is hard to define and hard to sell, yet it seems to
> be an important part of their agenda.
>
> >
> > I wonder if part of the problem is that we're imagining that the typical
> > VM has a sysadmin. Isn't it more likely that you build the VM
> > automatically from some root image that you don't even maintain
> > yourself? So fixing it to not, say, block all network traffic on every
> > interface, isn't something you can automate--you've no idea where the
> > iptables configuration lives in the image.
>
> You are describing a situation where someone builds an important part of
> their infrastructure from something they don't understand and cannot
> maintain.

That describes applications pretty well from the point of view of most
users?

And I thought applications were increasingly getting packaged with their
OS's. I don't know, I haven't been following this stuff so I'm just
speculating.

> Obviously that happens, but when it does I would expect there
> to be someone who does understand. A vendor or support organization
> probably. If the end result doesn't fit the customer's needs, then that
> creates a market opportunity for someone to fill.

So that means getting those vendors to all agree to leave NFS traffic, or
particular classes of network interfaces, or something, unfiltered?

> It does seem that the unpredictability of network filtering is part of
> the goal of VSOCK, though creating a new network path that cannot be
> filtered doesn't seem to me like the cleverest idea in the long term.
> There seems to be more though. There was a suggestion that some people
> don't want any network interface at all (but still want to use a
> networked file system). This sounds like superstition, but was not
> backed up with data so I cannot be sure.
>
> It does seem justifiable to want a simple and reliable way to ensure
> that traffic from the NFS client to the host is not filtered. My
> feeling is that talking to network/firewall people is the best way to
> achieve that. The approach that has been taken looks like an end-run
> around exactly the people who are in the best position to help.
>
> How do network namespaces work? Does each namespace get separate
> iptables? Could we perform an NFS mount in an unfiltered namespace,
> then make everything else run with filters in place?

I don't know. I'm hoping we can figure out who's closest to the source
of these requirements and sit down with some of the ideas and have them
explain to us whether they work or if not, why not.

--b.

2017-09-16 15:55:29

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 15, 2017, at 9:42 AM, J. Bruce Fields <[email protected]> wrote:
>
> On Fri, Sep 15, 2017 at 06:59:45AM -0700, Chuck Lever wrote:
>>
>>> On Sep 15, 2017, at 6:31 AM, J . Bruce Fields <[email protected]> wrote:
>>>
>>> On Fri, Sep 15, 2017 at 02:12:24PM +0100, Stefan Hajnoczi wrote:
>>>> On Thu, Sep 14, 2017 at 01:37:30PM -0400, J . Bruce Fields wrote:
>>>>> On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
>>>>>> On 09/14/2017 11:39 AM, Steve Dickson wrote:
>>>>>>> Hello
>>>>>>>
>>>>>>> On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
>>>>>>>> v3:
>>>>>>>> * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
>>>>>>>> * Added clientaddr autodetection in mount.nfs(8)
>>>>>>>> * Replaced #ifdefs with a single vsock.h header file
>>>>>>>> * Tested nfsd serving both IPv4 and vsock at the same time
>>>>>>> Just curious as to the status of the kernel patches... Are
>>>>>>> they slated for any particular release?
>>>>>> Maybe I should have read the thread before replying ;-)
>>>>>>
>>>>>> I now see the status of the patches... not good! 8-)
>>>>>
>>>>> To be specific, the code itself is probably fine, it's just that nobody
>>>>> on the NFS side seems convinced that NFS/VSOCK is necessary.
>>>>
>>>> Yes, the big question is whether the Linux NFS maintainers can see this
>>>> feature being merged. It allows host<->guest file sharing in a way that
>>>> management tools can automate.
>>>>
>>>> I have gotten feedback multiple times that NFS over TCP/IP is not an
>>>> option for management tools like libvirt to automate.
>>>
>>> We're having trouble understanding why this is.
>>
>> I'm also having trouble understanding why NFS is a better solution
>> in this case than a virtual disk, which does not require any net-
>> working to be configured. What exactly is expected to be shared
>> between the hypervisor and each guest?
>
> They have said before there are uses for storage that's actually shared.
> (And I assume it would be mainly shared between guests rather than
> between guest and hypervisor?)

But this works today with IP-based networking. We certainly use
this kind of arrangement with OVM (Oracle's Xen-based hypervisor).
I agree NFS in the hypervisor is useful in interesting cases, but
I'm separating the need for a local NFS service from the need for
it to be zero-configuration.

The other use case that's been presented for NFS/VSOCK is an NFS
share that contains configuration information for each guest (in
particular, network configuration information). This is the case
I refer to above when I ask whether this can be done with a
virtual disk.

I don't see any need for concurrent access by the hypervisor and
guest, and one presumably should not share a guest's specific
configuration information with other guests. There would be no
sharing requirement, and therefore I would expect a virtual disk
filesystem would be adequate in this case and perhaps even
preferred, being more secure and less complex.


>> I do understand the use cases for a full-featured NFS server in
>> the hypervisor, but not why it needs to be zero-config.
>
> "It" in that question refers to the client, not the server, right?

The hypervisor gets a VSOCK address too, which makes it zero-
configuration for any access via VSOCK transports from its guests.
I probably don't understand your question.

Note that an NFS server could also run in one of the guests, but
I assume that the VSOCK use cases are in particular about an NFS
service in the hypervisor that can be made available very early
in the life of a guest instance. I make that guess because all
the guests have the same VSOCK address (as I understand it), so
that would make it difficult to discover and access an NFS/VSOCK
service in another guest.


--
Chuck Lever




2017-09-19 09:13:15

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Sat, Sep 16, 2017 at 08:55:21AM -0700, Chuck Lever wrote:
>
> > On Sep 15, 2017, at 9:42 AM, J. Bruce Fields <[email protected]> wrote:
> >
> > On Fri, Sep 15, 2017 at 06:59:45AM -0700, Chuck Lever wrote:
> >>
> >>> On Sep 15, 2017, at 6:31 AM, J . Bruce Fields <[email protected]> wrote:
> >>>
> >>> On Fri, Sep 15, 2017 at 02:12:24PM +0100, Stefan Hajnoczi wrote:
> >>>> On Thu, Sep 14, 2017 at 01:37:30PM -0400, J . Bruce Fields wrote:
> >>>>> On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
> >>>>>> On 09/14/2017 11:39 AM, Steve Dickson wrote:
> >>>>>>> Hello
> >>>>>>>
> >>>>>>> On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
> >>>>>>>> v3:
> >>>>>>>> * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
> >>>>>>>> * Added clientaddr autodetection in mount.nfs(8)
> >>>>>>>> * Replaced #ifdefs with a single vsock.h header file
> >>>>>>>> * Tested nfsd serving both IPv4 and vsock at the same time
> >>>>>>> Just curious as to the status of the kernel patches... Are
> >>>>>>> they slated for any particular release?
> >>>>>> Maybe I should have read the thread before replying ;-)
> >>>>>>
> >>>>>> I now see the status of the patches... not good! 8-)
> >>>>>
> >>>>> To be specific, the code itself is probably fine, it's just that nobody
> >>>>> on the NFS side seems convinced that NFS/VSOCK is necessary.
> >>>>
> >>>> Yes, the big question is whether the Linux NFS maintainers can see this
> >>>> feature being merged. It allows host<->guest file sharing in a way that
> >>>> management tools can automate.
> >>>>
> >>>> I have gotten feedback multiple times that NFS over TCP/IP is not an
> >>>> option for management tools like libvirt to automate.
> >>>
> >>> We're having trouble understanding why this is.
> >>
> >> I'm also having trouble understanding why NFS is a better solution
> >> in this case than a virtual disk, which does not require any net-
> >> working to be configured. What exactly is expected to be shared
> >> between the hypervisor and each guest?
> >
> > They have said before there are uses for storage that's actually shared.
> > (And I assume it would be mainly shared between guests rather than
> > between guest and hypervisor?)
>
> But this works today with IP-based networking. We certainly use
> this kind of arrangement with OVM (Oracle's Xen-based hypervisor).
> I agree NFS in the hypervisor is useful in interesting cases, but
> I'm separating the need for a local NFS service with the need for
> it to be zero-configuration.
>
> The other use case that's been presented for NFS/VSOCK is an NFS
> share that contains configuration information for each guest (in
> particular, network configuration information). This is the case
> I refer to above when I ask whether this can be done with a
> virtual disk.
>
> I don't see any need for concurrent access by the hypervisor and
> guest, and one presumably should not share a guest's specific
> configuration information with other guests. There would be no
> sharing requirement, and therefore I would expect a virtual disk
> filesystem would be adequate in this case and perhaps even
> preferred, being more secure and less complex.

There are 2 main use cases:

1. Easy file sharing between host & guest

It's true that a disk image can be used but that's often inconvenient
when the data comes in individual files. Making throwaway ISO or
disk image from those files requires extra disk space, is slow, etc.

From a user perspective it's much nicer to point to a directory and
have it shared with the guest.

2. Using NFS over AF_VSOCK as an interface for a distributed file system
like Ceph or Gluster.

Hosting providers don't necessarily want to expose their distributed
file system directly to the guest. An NFS frontend presents an NFS
file system to the guest. The guest doesn't have access to the
distributed file system configuration details or network access. The
hosting provider can even switch backend file systems without
requiring guest configuration changes.

> >> I do understand the use cases for a full-featured NFS server in
> >> the hypervisor, but not why it needs to be zero-config.
> >
> > "It" in that question refers to the client, not the server, right?
>
> The hypervisor gets a VSOCK address too, which makes it zero-
> configuration for any access via VSOCK transports from its guests.
> I probably don't understand your question.
>
> Note that an NFS server could also run in one of the guests, but
> I assume that the VSOCK use cases are in particular about an NFS
> service in the hypervisor that can be made available very early
> in the life of a guest instance. I make that guess because all
> the guests have the same VSOCK address (as I understand it), so
> that would make it difficult to discover and access an NFS/VSOCK
> service in another guest.

Guests cannot communicate with each other. AF_VSOCK is host<->guest
only.

The host always uses the well-known CID (address) 2.

Guests have host-wide unique addresses (from 3 onwards).

Stefan
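To make that addressing concrete, here is how the pieces from the cover
letter fit together (the export path and export options are only
illustrative, and "..." stands for whatever other nfsd arguments are in use):

    # hypervisor (CID 2): grant access to the guest with CID 3 and
    # serve NFS over vsock port 2049
    echo '/srv/vm01 vsock:3(rw,sync)' >> /etc/exports
    exportfs -ra
    nfsd ... --vsock 2049

    # guest (CID 3): mount from the host's well-known CID 2
    mount.nfs 2:/srv/vm01 /mnt -o proto=vsock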

2017-09-19 09:31:46

by Daniel P. Berrangé

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Mon, Sep 18, 2017 at 07:09:27PM +0100, Stefan Hajnoczi wrote:
> On Sat, Sep 16, 2017 at 08:55:21AM -0700, Chuck Lever wrote:
> >
> > > On Sep 15, 2017, at 9:42 AM, J. Bruce Fields <[email protected]> wrote:
> > >
> > > On Fri, Sep 15, 2017 at 06:59:45AM -0700, Chuck Lever wrote:
> > >>
> > >>> On Sep 15, 2017, at 6:31 AM, J . Bruce Fields <[email protected]> wrote:
> > >>>
> > >>> On Fri, Sep 15, 2017 at 02:12:24PM +0100, Stefan Hajnoczi wrote:
> > >>>> On Thu, Sep 14, 2017 at 01:37:30PM -0400, J . Bruce Fields wrote:
> > >>>>> On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
> > >>>>>> On 09/14/2017 11:39 AM, Steve Dickson wrote:
> > >>>>>>> Hello
> > >>>>>>>
> > >>>>>>> On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
> > >>>>>>>> v3:
> > >>>>>>>> * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
> > >>>>>>>> * Added clientaddr autodetection in mount.nfs(8)
> > >>>>>>>> * Replaced #ifdefs with a single vsock.h header file
> > >>>>>>>> * Tested nfsd serving both IPv4 and vsock at the same time
> > >>>>>>> Just curious as to the status of the kernel patches... Are
> > >>>>>>> they slated for any particular release?
> > >>>>>> Maybe I should have read the thread before replying ;-)
> > >>>>>>
> > >>>>>> I now see the status of the patches... not good! 8-)
> > >>>>>
> > >>>>> To be specific, the code itself is probably fine, it's just that nobody
> > >>>>> on the NFS side seems convinced that NFS/VSOCK is necessary.
> > >>>>
> > >>>> Yes, the big question is whether the Linux NFS maintainers can see this
> > >>>> feature being merged. It allows host<->guest file sharing in a way that
> > >>>> management tools can automate.
> > >>>>
> > >>>> I have gotten feedback multiple times that NFS over TCP/IP is not an
> > >>>> option for management tools like libvirt to automate.
> > >>>
> > >>> We're having trouble understanding why this is.
> > >>
> > >> I'm also having trouble understanding why NFS is a better solution
> > >> in this case than a virtual disk, which does not require any net-
> > >> working to be configured. What exactly is expected to be shared
> > >> between the hypervisor and each guest?
> > >
> > > They have said before there are uses for storage that's actually shared.
> > > (And I assume it would be mainly shared between guests rather than
> > > between guest and hypervisor?)
> >
> > But this works today with IP-based networking. We certainly use
> > this kind of arrangement with OVM (Oracle's Xen-based hypervisor).
> > I agree NFS in the hypervisor is useful in interesting cases, but
> > I'm separating the need for a local NFS service with the need for
> > it to be zero-configuration.
> >
> > The other use case that's been presented for NFS/VSOCK is an NFS
> > share that contains configuration information for each guest (in
> > particular, network configuration information). This is the case
> > I refer to above when I ask whether this can be done with a
> > virtual disk.
> >
> > I don't see any need for concurrent access by the hypervisor and
> > guest, and one presumably should not share a guest's specific
> > configuration information with other guests. There would be no
> > sharing requirement, and therefore I would expect a virtual disk
> > filesystem would be adequate in this case and perhaps even
> > preferred, being more secure and less complex.
>
> There are 2 main use cases:
>
> 1. Easy file sharing between host & guest
>
> It's true that a disk image can be used but that's often inconvenient
> when the data comes in individual files. Making throwaway ISO or
> disk image from those files requires extra disk space, is slow, etc.

More critically, it cannot be easily live-updated for a running guest.
Not all of the setup data that the hypervisor wants to share with the
guest is boot-time only - some may be accessed repeatedly post boot &
need to be updated dynamically. Currently OpenStack can only
satisfy this using its network-based metadata REST service, but
many cloud operators refuse to deploy this because they are not happy
with the guest and host sharing a LAN, leaving only the virtual disk
option, which cannot support dynamic updates.

If the admin takes any live snapshots of the guest, then this throwaway
disk image has to be kept around for the lifetime of the snapshot too.
We cannot just throw it away & re-generate it later when restoring the
snapshot, because we canot guarantee the newly generated image would be
byte-for-byte identical to the original one we generated due to possible
changes in mkfs related tools.

> From a user perspective it's much nicer to point to a directory and
> have it shared with the guest.
>
> 2. Using NFS over AF_VSOCK as an interface for a distributed file system
> like Ceph or Gluster.
>
> Hosting providers don't necessarily want to expose their distributed
> file system directly to the guest. An NFS frontend presents an NFS
> file system to the guest. The guest doesn't have access to the
> distributed file system configuration details or network access. The
> hosting provider can even switch backend file systems without
> requiring guest configuration changes.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

2017-09-19 14:36:00

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 19, 2017, at 5:31 AM, Daniel P. Berrange <[email protected]> wrote:
>
> On Mon, Sep 18, 2017 at 07:09:27PM +0100, Stefan Hajnoczi wrote:
>> On Sat, Sep 16, 2017 at 08:55:21AM -0700, Chuck Lever wrote:
>>>
>>>> On Sep 15, 2017, at 9:42 AM, J. Bruce Fields <[email protected]> wrote:
>>>>
>>>> On Fri, Sep 15, 2017 at 06:59:45AM -0700, Chuck Lever wrote:
>>>>>
>>>>>> On Sep 15, 2017, at 6:31 AM, J . Bruce Fields <[email protected]> wrote:
>>>>>>
>>>>>> On Fri, Sep 15, 2017 at 02:12:24PM +0100, Stefan Hajnoczi wrote:
>>>>>>> On Thu, Sep 14, 2017 at 01:37:30PM -0400, J . Bruce Fields wrote:
>>>>>>>> On Thu, Sep 14, 2017 at 11:55:51AM -0400, Steve Dickson wrote:
>>>>>>>>> On 09/14/2017 11:39 AM, Steve Dickson wrote:
>>>>>>>>>> Hello
>>>>>>>>>>
>>>>>>>>>> On 09/13/2017 06:26 AM, Stefan Hajnoczi wrote:
>>>>>>>>>>> v3:
>>>>>>>>>>> * Documented vsock syntax in exports.man, nfs.man, and nfsd.man
>>>>>>>>>>> * Added clientaddr autodetection in mount.nfs(8)
>>>>>>>>>>> * Replaced #ifdefs with a single vsock.h header file
>>>>>>>>>>> * Tested nfsd serving both IPv4 and vsock at the same time
>>>>>>>>>> Just curious as to the status of the kernel patches... Are
>>>>>>>>>> they slated for any particular release?
>>>>>>>>> Maybe I should have read the thread before replying ;-)
>>>>>>>>>
>>>>>>>>> I now see the status of the patches... not good! 8-)
>>>>>>>>
>>>>>>>> To be specific, the code itself is probably fine, it's just that nobody
>>>>>>>> on the NFS side seems convinced that NFS/VSOCK is necessary.
>>>>>>>
>>>>>>> Yes, the big question is whether the Linux NFS maintainers can see this
>>>>>>> feature being merged. It allows host<->guest file sharing in a way that
>>>>>>> management tools can automate.
>>>>>>>
>>>>>>> I have gotten feedback multiple times that NFS over TCP/IP is not an
>>>>>>> option for management tools like libvirt to automate.
>>>>>>
>>>>>> We're having trouble understanding why this is.
>>>>>
>>>>> I'm also having trouble understanding why NFS is a better solution
>>>>> in this case than a virtual disk, which does not require any net-
>>>>> working to be configured. What exactly is expected to be shared
>>>>> between the hypervisor and each guest?
>>>>
>>>> They have said before there are uses for storage that's actually shared.
>>>> (And I assume it would be mainly shared between guests rather than
>>>> between guest and hypervisor?)
>>>
>>> But this works today with IP-based networking. We certainly use
>>> this kind of arrangement with OVM (Oracle's Xen-based hypervisor).
>>> I agree NFS in the hypervisor is useful in interesting cases, but
>>> I'm separating the need for a local NFS service with the need for
>>> it to be zero-configuration.
>>>
>>> The other use case that's been presented for NFS/VSOCK is an NFS
>>> share that contains configuration information for each guest (in
>>> particular, network configuration information). This is the case
>>> I refer to above when I ask whether this can be done with a
>>> virtual disk.
>>>
>>> I don't see any need for concurrent access by the hypervisor and
>>> guest, and one presumably should not share a guest's specific
>>> configuration information with other guests. There would be no
>>> sharing requirement, and therefore I would expect a virtual disk
>>> filesystem would be adequate in this case and perhaps even
>>> preferred, being more secure and less complex.
>>
>> There are 2 main use cases:
>>
>> 1. Easy file sharing between host & guest
>>
>> It's true that a disk image can be used but that's often inconvenient
>> when the data comes in individual files. Making throwaway ISO or
>> disk image from those files requires extra disk space, is slow, etc.
>
> More critically, it cannot be easily live-updated for a running guest.
> Not all of the setup data that the hypervisor wants to share with the
> guest is boot-time only - some may be access repeatedly post boot &
> have a need to update it dynamically. Currently OpenStack can only
> satisfy this if using its network based metadata REST service, but
> many cloud operators refuse to deploy this because they are not happy
> with the guest and host sharing a LAN, leaving only the virtual disk
> option which can not support dynamic update.

Hi Daniel-

OK, but why can't the REST service run on VSOCK, for instance?

How is VSOCK different than guests and hypervisor sharing a LAN?
Would it be OK if the hypervisor and each guest shared a virtual
point-to-point IP network?

Can you elaborate on "they are not happy with the guests and host
sharing a LAN" ?


> If the admin takes any live snapshots of the guest, then this throwaway
> disk image has to be kept around for the lifetime of the snapshot too.
> We cannot just throw it away & re-generate it later when restoring the
> snapshot, because we canot guarantee the newly generated image would be
> byte-for-byte identical to the original one we generated due to possible
> changes in mkfs related tools.

Seems like you could create a loopback mount of a small file to
store configuration data. That would consume very little local
storage. I've done this already in the fedfs-utils-server package,
which creates small loopback mounted filesystems to contain FedFS
domain root directories, for example.

Sharing the disk serially is a little awkward, but not difficult.
You could use an automounter in the guest to grab that filesystem
when needed, then release it after a period of not being used.
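As a rough sketch of that approach (the image path, size, and mount points
below are arbitrary):

    # hypervisor: build a small filesystem image holding one guest's config data
    dd if=/dev/zero of=/var/lib/guest01-config.img bs=1M count=16
    mkfs.ext4 -F /var/lib/guest01-config.img
    mount -o loop /var/lib/guest01-config.img /mnt/guest01-config
    cp guest01-network.conf /mnt/guest01-config/
    umount /mnt/guest01-config

    # attach the image to the guest as an extra virtual disk; inside the guest
    # an automounter (e.g. autofs) can mount it on demand and release it when idle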


>> From a user perspective it's much nicer to point to a directory and
>> have it shared with the guest.
>>
>> 2. Using NFS over AF_VSOCK as an interface for a distributed file system
>> like Ceph or Gluster.
>>
>> Hosting providers don't necessarily want to expose their distributed
>> file system directly to the guest. An NFS frontend presents an NFS
>> file system to the guest. The guest doesn't have access to the
>> distributed file system configuration details or network access. The
>> hosting provider can even switch backend file systems without
>> requiring guest configuration changes.

Notably, NFS can already support hypervisor file sharing and
gatewaying to Ceph and Gluster. We agree that those are useful.
However, VSOCK is not a prerequisite for either of those use
cases.


--
Chuck Lever




2017-09-19 15:11:02

by Daniel P. Berrangé

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Tue, Sep 19, 2017 at 10:35:49AM -0400, Chuck Lever wrote:
>
> > On Sep 19, 2017, at 5:31 AM, Daniel P. Berrange <[email protected]> wrote:
> >
> > On Mon, Sep 18, 2017 at 07:09:27PM +0100, Stefan Hajnoczi wrote:
> >> There are 2 main use cases:
> >>
> >> 1. Easy file sharing between host & guest
> >>
> >> It's true that a disk image can be used but that's often inconvenient
> >> when the data comes in individual files. Making throwaway ISO or
> >> disk image from those files requires extra disk space, is slow, etc.
> >
> > More critically, it cannot be easily live-updated for a running guest.
> > Not all of the setup data that the hypervisor wants to share with the
> > guest is boot-time only - some may be access repeatedly post boot &
> > have a need to update it dynamically. Currently OpenStack can only
> > satisfy this if using its network based metadata REST service, but
> > many cloud operators refuse to deploy this because they are not happy
> > with the guest and host sharing a LAN, leaving only the virtual disk
> > option which can not support dynamic update.
>
> Hi Daniel-
>
> OK, but why can't the REST service run on VSOCK, for instance?

That is a possibility, though cloud-init/OpenStack maintainers are
reluctant to add support for new features for the metadata REST
service, because the spec being followed is defined by Amazon (as
part of EC2), not by OpenStack. So adding new features would be
effectively forking the spec by adding stuff Amazon doesn't (yet)
support - this is why it's IPv4 only, with no IPv6 support,
as Amazon has not defined a standardized IPv6 address for the
metadata service at this time.

> How is VSOCK different than guests and hypervisor sharing a LAN?

VSOCK requires no guest configuration: it won't be broken accidentally
by NetworkManager (or equivalent), and it won't be mistakenly blocked by
a guest admin/OS adding a "deny all" default firewall policy. The same
applies on the host side, and since there's separation from IP networking,
there is no possibility of the guest ever getting a channel out to the
LAN, even if the host is mis-configured.

> Would it be OK if the hypervisor and each guest shared a virtual
> point-to-point IP network?

No - per above / below text

> Can you elaborate on "they are not happy with the guests and host
> sharing a LAN" ?

The security of the host management LAN is so critical to the cloud
that they're not willing to allow any guest network interface to have
an IP visible to/from the host, even if it were locked down with
firewall rules. It is just one administrative mis-configuration
away from disaster.

> > If the admin takes any live snapshots of the guest, then this throwaway
> > disk image has to be kept around for the lifetime of the snapshot too.
> > We cannot just throw it away & re-generate it later when restoring the
> > snapshot, because we canot guarantee the newly generated image would be
> > byte-for-byte identical to the original one we generated due to possible
> > changes in mkfs related tools.
>
> Seems like you could create a loopback mount of a small file to
> store configuration data. That would consume very little local
> storage. I've done this already in the fedfs-utils-server package,
> which creates small loopback mounted filesystems to contain FedFS
> domain root directories, for example.
>
> Sharing the disk serially is a little awkward, but not difficult.
> You could use an automounter in the guest to grab that filesystem
> when needed, then release it after a period of not being used.


With QEMU's previous 9p-over-virtio filesystem support, people have
built tools which run virtual machines where the root FS is directly
running against a 9p share from the host filesystem. It isn't possible
to share the host filesystem's /dev/sda (or whatever) with the guest
because it holds a non-cluster filesystem and so can't be mounted
twice. Likewise, you don't want to copy the host filesystem's entire
contents into a block device and mount that, as it's simply impractical.

With 9p-over-virtio, or NFS-over-VSOCK, we can execute commands
present in the host's filesystem, sandboxed inside a QEMU guest
by simply sharing the host's '/' FS to the guest and have the
guest mount that as its own / (typically it would be read-only,
and then a further FS share would be added for writeable areas).
For this to be reliable we can't use host IP networking because
there are too many ways for that to fail, and if spawning the sandbox
as non-root we can't influence the host networking setup at all.
Currently it uses 9p-over-virtio for this reason, which works
great, except that distros hate the idea of supporting a 9p
filesystem driver in the kernel - an NFS driver capable of
running over virtio is a much smaller incremental support
burden.
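
As a rough sketch, the guest side of that could look like the
following with the proto=vsock syntax proposed in this series (CID 2
is the host; the export paths and mount points are only illustrative):

  # mount.nfs 2:/ /sysroot -o ro,proto=vsock
  # mount.nfs 2:/scratch/vm01 /sysroot/var -o rw,proto=vsock

i.e. a read-only root plus a separate writeable share, with no IP
configuration inside the sandbox at all.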

> >> From a user perspective it's much nicer to point to a directory and
> >> have it shared with the guest.
> >>
> >> 2. Using NFS over AF_VSOCK as an interface for a distributed file system
> >> like Ceph or Gluster.
> >>
> >> Hosting providers don't necessarily want to expose their distributed
> >> file system directly to the guest. An NFS frontend presents an NFS
> >> file system to the guest. The guest doesn't have access to the
> >> distributed file system configuration details or network access. The
> >> hosting provider can even switch backend file systems without
> >> requiring guest configuration changes.
>
> Notably, NFS can already support hypervisor file sharing and
> gateway-ing to Ceph and Gluster. We agree that those are useful.
> However VSOCK is not a pre-requisite for either of those use
> cases.

This again requires that the NFS server which runs on the management LAN
be visible to the guest network. So this hits the same problem above with
cloud providers wanting those networks completely separate.

The desire from OpenStack is to have an NFS server on the compute host,
which exposes the Ceph filesystem to the guest over VSOCK.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

2017-09-19 15:48:20

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
>
> On Tue, Sep 19, 2017 at 10:35:49AM -0400, Chuck Lever wrote:
>>
>>> On Sep 19, 2017, at 5:31 AM, Daniel P. Berrange <[email protected]> wrote:
>>>
>>> On Mon, Sep 18, 2017 at 07:09:27PM +0100, Stefan Hajnoczi wrote:
>>>> There are 2 main use cases:
>>>>
>>>> 1. Easy file sharing between host & guest
>>>>
>>>> It's true that a disk image can be used but that's often inconvenient
>>>> when the data comes in individual files. Making throwaway ISO or
>>>> disk image from those files requires extra disk space, is slow, etc.
>>>
>>> More critically, it cannot be easily live-updated for a running guest.
>>> Not all of the setup data that the hypervisor wants to share with the
>>> guest is boot-time only - some may be access repeatedly post boot &
>>> have a need to update it dynamically. Currently OpenStack can only
>>> satisfy this if using its network based metadata REST service, but
>>> many cloud operators refuse to deploy this because they are not happy
>>> with the guest and host sharing a LAN, leaving only the virtual disk
>>> option which can not support dynamic update.
>>
>> Hi Daniel-
>>
>> OK, but why can't the REST service run on VSOCK, for instance?
>
> That is a possibility, though cloud-init/OpenStack maintainers are
> reluctant to add support for new features for the metadata REST
> service, because the spec being followed is defined by Amazon (as
> part of EC2), not by OpenStack. So adding new features would be
> effectively forking the spec by adding stuff Amazon doesn't (yet)
> support - this is why its IPv4 only, with no IPv6 support too,
> as Amazon has not defined a standardized IPv6 address for the
> metadata service at this time.

You guys are asking the NFS community for a similar kind of
specification change here. We would prefer that you seek that change
with the relevant authority (the IETF) first before trying to merge
an implementation of it.

As a first step we have to define RPC operation on VSOCK transports.
That's the smallest piece of it. Dealing with some of the NFS issues
(like, what happens to filehandles and lock state during a live
guest migration) is an even larger challenge.

Sorry, but you can't avoid one interoperability problem (Amazon)
by introducing another (NFS).


>> How is VSOCK different than guests and hypervisor sharing a LAN?
>
> VSOCK requires no guest configuration, it won't be broken accidentally
> by NetworkManager (or equivalent), it won't be mistakenly blocked by
> guest admin/OS adding "deny all" default firewall policy. Similar
> applies on the host side, and since there's separation from IP networking,
> there is no possibility of the guest ever getting a channel out to the
> LAN, even if the host is mis-configurated.

We don't seem to have configuration fragility problems with other
deployments that scale horizontally.

IMO you should focus on making IP reliable rather than trying to
move familiar IP-based services to other network fabrics.

Or, look at existing transports:

- We have an RPC-over-named-pipes transport that could be used

- NFS/RDMA with a virtual RoCE adapter would also work, and
could perform better than something based on TCP streams


>> Would it be OK if the hypervisor and each guest shared a virtual
>> point-to-point IP network?
>
> No - per above / below text
>
>> Can you elaborate on "they are not happy with the guests and host
>> sharing a LAN" ?
>
> The security of the host management LAN is so critical to the cloud,
> that they're not willing to allow any guest network interface to have
> an IP visible to/from the host, even if it were locked down with
> firewall rules. It is just one administrative mis-configuration
> away from disaster.

For the general case of sharing NFS files, I'm not suggesting that
the guest and hypervisor share a management LAN for NFS. Rather,
the suggestion is to have a separate point-to-point storage area
network that has narrow trust relationships.
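
For illustration, that alternative amounts to a conventional
exports(5) entry scoped to the guest-facing address of such a
network (the path and address below are made up):

  # echo '/srv/vm01  192.168.122.10(rw,sync,no_subtree_check)' >> /etc/exports
  # exportfs -ra

so the guest mounts over the dedicated link rather than anything
reachable from the management LAN.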


>>> If the admin takes any live snapshots of the guest, then this throwaway
>>> disk image has to be kept around for the lifetime of the snapshot too.
>>> We cannot just throw it away & re-generate it later when restoring the
>>> snapshot, because we canot guarantee the newly generated image would be
>>> byte-for-byte identical to the original one we generated due to possible
>>> changes in mkfs related tools.
>>
>> Seems like you could create a loopback mount of a small file to
>> store configuration data. That would consume very little local
>> storage. I've done this already in the fedfs-utils-server package,
>> which creates small loopback mounted filesystems to contain FedFS
>> domain root directories, for example.
>>
>> Sharing the disk serially is a little awkward, but not difficult.
>> You could use an automounter in the guest to grab that filesystem
>> when needed, then release it after a period of not being used.
>
>
> With QEMU's previous 9p-over-virtio filesystem support people have
> built tools which run virtual machines where the root FS is directly
> running against a 9p share from the host filesystem. It isn't possible
> to share the host filesystem's /dev/sda (or whatever) to the guest
> because its a holding a non-cluster filesystem so can't be mounted
> twice. Likewise you don't want to copy the host filesystems entire
> contents into a block device and mount that, as its simply impratical
>
> With 9p-over-virtio, or NFS-over-VSOCK, we can execute commands
> present in the host's filesystem, sandboxed inside a QEMU guest
> by simply sharing the host's '/' FS to the guest and have the
> guest mount that as its own / (typically it would be read-only,
> and then a further FS share would be added for writeable areas).
> For this to be reliable we can't use host IP networking because
> there's too many ways for that to fail, and if spawning the sandbox
> as non-root we can't influence the host networking setup at all.
> Currently it uses 9p-over-virtio for this reason, which works
> great, except that distros hate the idea of supporting a 9p
> filesystem driver in the kernel - a NFS driver capable of
> running over virtio is a much smaller incremental support
> burden.

OK, so this is not the use case that has been described before.
This is talking about more than just configuration information.

Let's call this use case "NFSROOT without IP networking".

Still, the practice I was aware of is that OpenStack would
provide a copy of a golden root filesystem to the hypervisor,
who would then hydrate that on its local storage, and provide
it to a new guest as a virtual disk.

In terms of storage management, I still don't see much benefit
in using local NFS storage for this use case (except for the
real-time sharing of configuration changes).


>>>> From a user perspective it's much nicer to point to a directory and
>>>> have it shared with the guest.
>>>>
>>>> 2. Using NFS over AF_VSOCK as an interface for a distributed file system
>>>> like Ceph or Gluster.
>>>>
>>>> Hosting providers don't necessarily want to expose their distributed
>>>> file system directly to the guest. An NFS frontend presents an NFS
>>>> file system to the guest. The guest doesn't have access to the
>>>> distributed file system configuration details or network access. The
>>>> hosting provider can even switch backend file systems without
>>>> requiring guest configuration changes.
>>
>> Notably, NFS can already support hypervisor file sharing and
>> gateway-ing to Ceph and Gluster. We agree that those are useful.
>> However VSOCK is not a pre-requisite for either of those use
>> cases.
>
> This again requires that the NFS server which runs on the management LAN
> be visible to the guest network. So this hits the same problem above with
> cloud providers wanting those networks completely separate.

No, it doesn't require using the management LAN at all.


> The desire from OpenStack is to have an NFS server on the compute host,
> which exposes the Ceph filesystem to the guest over VSOCK

The use of VSOCK is entirely separate from the existence of an
NFS service on the hypervisor, since NFS can already gateway
Ceph. Let's just say that the requirement is access to an NFS
service without IP networking.

So, thanks for responding, Dan, this has cleared up some
troublesome details. But I think more discussion is needed.



--
Chuck Lever




2017-09-19 16:44:35

by Daniel P. Berrangé

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
>
> > On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
> >
> > On Tue, Sep 19, 2017 at 10:35:49AM -0400, Chuck Lever wrote:
> >>
> >>> On Sep 19, 2017, at 5:31 AM, Daniel P. Berrange <[email protected]> wrote:
> >>>
> >>> On Mon, Sep 18, 2017 at 07:09:27PM +0100, Stefan Hajnoczi wrote:
> >>>> There are 2 main use cases:
> >>>>
> >>>> 1. Easy file sharing between host & guest
> >>>>
> >>>> It's true that a disk image can be used but that's often inconvenient
> >>>> when the data comes in individual files. Making throwaway ISO or
> >>>> disk image from those files requires extra disk space, is slow, etc.
> >>>
> >>> More critically, it cannot be easily live-updated for a running guest.
> >>> Not all of the setup data that the hypervisor wants to share with the
> >>> guest is boot-time only - some may be access repeatedly post boot &
> >>> have a need to update it dynamically. Currently OpenStack can only
> >>> satisfy this if using its network based metadata REST service, but
> >>> many cloud operators refuse to deploy this because they are not happy
> >>> with the guest and host sharing a LAN, leaving only the virtual disk
> >>> option which can not support dynamic update.
> >>
> >> Hi Daniel-
> >>
> >> OK, but why can't the REST service run on VSOCK, for instance?
> >
> > That is a possibility, though cloud-init/OpenStack maintainers are
> > reluctant to add support for new features for the metadata REST
> > service, because the spec being followed is defined by Amazon (as
> > part of EC2), not by OpenStack. So adding new features would be
> > effectively forking the spec by adding stuff Amazon doesn't (yet)
> > support - this is why its IPv4 only, with no IPv6 support too,
> > as Amazon has not defined a standardized IPv6 address for the
> > metadata service at this time.
>
> You guys are asking the NFS community for a similar kind of
> specification change here. We would prefer that you seek that change
> with the relevant authority (the IETF) first before trying to merge
> an implementation of it.
>
> As a first step we have to define RPC operation on VSOCK transports.
> That's the smallest piece of it. Dealing with some of the NFS issues
> (like, what happens to filehandles and lock state during a live
> guest migration) is an even larger challenge.
>
> Sorry, but you can't avoid one interoperability problem (Amazon)
> by introducing another (NFS).

Agreed, I can't argue with that. It does feel overdue to get NFS-over-VSOCK
defined as a formal spec, especially since it was already implemented in
the NFS-ganesha userspace server.

> >> How is VSOCK different than guests and hypervisor sharing a LAN?
> >
> > VSOCK requires no guest configuration, it won't be broken accidentally
> > by NetworkManager (or equivalent), it won't be mistakenly blocked by
> > guest admin/OS adding "deny all" default firewall policy. Similar
> > applies on the host side, and since there's separation from IP networking,
> > there is no possibility of the guest ever getting a channel out to the
> > LAN, even if the host is mis-configurated.
>
> We don't seem to have configuration fragility problems with other
> deployments that scale horizontally.
>
> IMO you should focus on making IP reliable rather than trying to
> move familiar IP-based services to other network fabrics.

I don't see that ever happening, except in a scenario where a single
org is in tight control of the whole stack (host & guest), which is
not the case for cloud in general - only some on-site clouds.

> Or, look at existing transports:
>
> - We have an RPC-over-named-pipes transport that could be used

Could that transport be used with a serial port rather than a
named pipe, I wonder? If so, it could potentially be made to
work with the virtio-serial device model. Is this named pipe
transport already working with the NFS server, or is this just
describing a possible strategy yet to be implemented?

> - NFS/RDMA with a virtual RoCE adapter would also work, and
> could perform better than something based on TCP streams

I don't know enough about RDMA to answer this, but AFAIK there is
no RDMA device emulation for KVM yet, so that would be another
device model to be created & supported.

> >> Would it be OK if the hypervisor and each guest shared a virtual
> >> point-to-point IP network?
> >
> > No - per above / below text
> >
> >> Can you elaborate on "they are not happy with the guests and host
> >> sharing a LAN" ?
> >
> > The security of the host management LAN is so critical to the cloud,
> > that they're not willing to allow any guest network interface to have
> > an IP visible to/from the host, even if it were locked down with
> > firewall rules. It is just one administrative mis-configuration
> > away from disaster.
>
> For the general case of sharing NFS files, I'm not suggesting that
> the guest and hypervisor share a management LAN for NFS. Rather,
> the suggestion is to have a separate point-to-point storage area
> network that has narrow trust relationships
>
> >>> If the admin takes any live snapshots of the guest, then this throwaway
> >>> disk image has to be kept around for the lifetime of the snapshot too.
> >>> We cannot just throw it away & re-generate it later when restoring the
> >>> snapshot, because we canot guarantee the newly generated image would be
> >>> byte-for-byte identical to the original one we generated due to possible
> >>> changes in mkfs related tools.
> >>
> >> Seems like you could create a loopback mount of a small file to
> >> store configuration data. That would consume very little local
> >> storage. I've done this already in the fedfs-utils-server package,
> >> which creates small loopback mounted filesystems to contain FedFS
> >> domain root directories, for example.
> >>
> >> Sharing the disk serially is a little awkward, but not difficult.
> >> You could use an automounter in the guest to grab that filesystem
> >> when needed, then release it after a period of not being used.
> >
> >
> > With QEMU's previous 9p-over-virtio filesystem support people have
> > built tools which run virtual machines where the root FS is directly
> > running against a 9p share from the host filesystem. It isn't possible
> > to share the host filesystem's /dev/sda (or whatever) to the guest
> > because its a holding a non-cluster filesystem so can't be mounted
> > twice. Likewise you don't want to copy the host filesystems entire
> > contents into a block device and mount that, as its simply impratical
> >
> > With 9p-over-virtio, or NFS-over-VSOCK, we can execute commands
> > present in the host's filesystem, sandboxed inside a QEMU guest
> > by simply sharing the host's '/' FS to the guest and have the
> > guest mount that as its own / (typically it would be read-only,
> > and then a further FS share would be added for writeable areas).
> > For this to be reliable we can't use host IP networking because
> > there's too many ways for that to fail, and if spawning the sandbox
> > as non-root we can't influence the host networking setup at all.
> > Currently it uses 9p-over-virtio for this reason, which works
> > great, except that distros hate the idea of supporting a 9p
> > filesystem driver in the kernel - a NFS driver capable of
> > running over virtio is a much smaller incremental support
> > burden.
>
> OK, so this is not the use case that has been described before.
> This is talking about more than just configuration information.
>
> Let's call this use case "NFSROOT without IP networking".
>
> Still, the practice I was aware of is that OpenStack would
> provide a copy of a golden root filesystem to the hypervisor,
> who would then hydrate that on its local storage, and provide
> it to a new guest as a virtual disk.

The original OpenStack guest storage model is disk based, and in
a simple deployment, you would indeed copy the root filesystem disk
image from the image repository, over to the local virt host and
expose that to the guest. These days though, the more popular option
avoids using virt host local storage, and instead has the root disk
image in an RBD volume that QEMU connects to directly.

There's a further OpenStack project though called Manila whose
goal is to expose filesystem shares to guests, rather than block
storage. This would typically be used in addition to the block
based storage, e.g. the guest would have a block dev for its root
filesystem, and then a Manila filesystem share for application
data storage. This is where the interest in NFS-over-VSOCK is
coming from in the context of OpenStack.

> >>>> From a user perspective it's much nicer to point to a directory and
> >>>> have it shared with the guest.
> >>>>
> >>>> 2. Using NFS over AF_VSOCK as an interface for a distributed file system
> >>>> like Ceph or Gluster.
> >>>>
> >>>> Hosting providers don't necessarily want to expose their distributed
> >>>> file system directly to the guest. An NFS frontend presents an NFS
> >>>> file system to the guest. The guest doesn't have access to the
> >>>> distributed file system configuration details or network access. The
> >>>> hosting provider can even switch backend file systems without
> >>>> requiring guest configuration changes.
> >>
> >> Notably, NFS can already support hypervisor file sharing and
> >> gateway-ing to Ceph and Gluster. We agree that those are useful.
> >> However VSOCK is not a pre-requisite for either of those use
> >> cases.
> >
> > This again requires that the NFS server which runs on the management LAN
> > be visible to the guest network. So this hits the same problem above with
> > cloud providers wanting those networks completely separate.
>
> No, it doesn't require using the management LAN at all.

The Ceph server is on the management LAN, and the guest which has to perform
the NFS client mount is on the guest LAN. So some component must be
visible on both LANs: either the Ceph server or the NFS server,
neither of which is desired.

> > The desire from OpenStack is to have an NFS server on the compute host,
> > which exposes the Ceph filesystem to the guest over VSOCK
>
> The use of VSOCK is entirely separate from the existence of an
> NFS service on the hypervisor, since NFS can already gateway
> Ceph. Let's just say that the requirement is access to an NFS
> service without IP networking.

FWIW, the Ceph devs have already done a proof of concept where they
run an NFS server on the host that exports the volume over
VSOCK. They used the NFS-ganesha userspace server, which has already
merged patches to support the VSOCK protocol with NFS.


Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

2017-09-19 17:24:53

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Tue, Sep 19, 2017 at 05:44:27PM +0100, Daniel P. Berrange wrote:
> On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
> >
> > > On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
> > > VSOCK requires no guest configuration, it won't be broken accidentally
> > > by NetworkManager (or equivalent), it won't be mistakenly blocked by
> > > guest admin/OS adding "deny all" default firewall policy. Similar
> > > applies on the host side, and since there's separation from IP networking,
> > > there is no possibility of the guest ever getting a channel out to the
> > > LAN, even if the host is mis-configurated.
> >
> > We don't seem to have configuration fragility problems with other
> > deployments that scale horizontally.
> >
> > IMO you should focus on making IP reliable rather than trying to
> > move familiar IP-based services to other network fabrics.
>
> I don't see that ever happening, except in a scenario where a single
> org is in tight control of the whole stack (host & guest), which is
> not the case for cloud in general - only some on-site clouds.

Can you elaborate?

I think we're having trouble understanding why you can't just say "don't
do that" to someone whose guest configuration is interfering with the
network interface they need for NFS.

--b.

2017-09-19 17:37:06

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Tue, Sep 19, 2017 at 05:44:27PM +0100, Daniel P. Berrange wrote:
> On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
> >
> > > On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
> > >
> > > On Tue, Sep 19, 2017 at 10:35:49AM -0400, Chuck Lever wrote:
> > >>
> > >>> On Sep 19, 2017, at 5:31 AM, Daniel P. Berrange <[email protected]> wrote:
> > >>>
> > >>> On Mon, Sep 18, 2017 at 07:09:27PM +0100, Stefan Hajnoczi wrote:
> > >>>> There are 2 main use cases:
> > >>>>
> > >>>> 1. Easy file sharing between host & guest
> > >>>>
> > >>>> It's true that a disk image can be used but that's often inconvenient
> > >>>> when the data comes in individual files. Making throwaway ISO or
> > >>>> disk image from those files requires extra disk space, is slow, etc.
> > >>>
> > >>> More critically, it cannot be easily live-updated for a running guest.
> > >>> Not all of the setup data that the hypervisor wants to share with the
> > >>> guest is boot-time only - some may be access repeatedly post boot &
> > >>> have a need to update it dynamically. Currently OpenStack can only
> > >>> satisfy this if using its network based metadata REST service, but
> > >>> many cloud operators refuse to deploy this because they are not happy
> > >>> with the guest and host sharing a LAN, leaving only the virtual disk
> > >>> option which can not support dynamic update.
> > >>
> > >> Hi Daniel-
> > >>
> > >> OK, but why can't the REST service run on VSOCK, for instance?
> > >
> > > That is a possibility, though cloud-init/OpenStack maintainers are
> > > reluctant to add support for new features for the metadata REST
> > > service, because the spec being followed is defined by Amazon (as
> > > part of EC2), not by OpenStack. So adding new features would be
> > > effectively forking the spec by adding stuff Amazon doesn't (yet)
> > > support - this is why its IPv4 only, with no IPv6 support too,
> > > as Amazon has not defined a standardized IPv6 address for the
> > > metadata service at this time.
> >
> > You guys are asking the NFS community for a similar kind of
> > specification change here. We would prefer that you seek that change
> > with the relevant authority (the IETF) first before trying to merge
> > an implementation of it.
> >
> > As a first step we have to define RPC operation on VSOCK transports.
> > That's the smallest piece of it. Dealing with some of the NFS issues
> > (like, what happens to filehandles and lock state during a live
> > guest migration) is an even larger challenge.
> >
> > Sorry, but you can't avoid one interoperability problem (Amazon)
> > by introducing another (NFS).
>
> Agreed, I can't argue with that. It does feel overdue to get NFS-over-VSOCK
> defined as a formal spec, especially since it was already implemented in
> the NFS-ganesha userspace server.

Getting the RPC over AF_VSOCK details through the IETF process is my
next goal.

Stefan

2017-09-19 19:56:56

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 19, 2017, at 12:44 PM, Daniel P. Berrange <[email protected]> wrote:
>
> On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
>>
>>> On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
>>>
>>> On Tue, Sep 19, 2017 at 10:35:49AM -0400, Chuck Lever wrote:
>>>>
>>>>> On Sep 19, 2017, at 5:31 AM, Daniel P. Berrange <[email protected]> wrote:
>>>>>
>>>>> On Mon, Sep 18, 2017 at 07:09:27PM +0100, Stefan Hajnoczi wrote:
>>>>>> There are 2 main use cases:
>>>>>>
>>>>>> 1. Easy file sharing between host & guest
>>>>>>
>>>>>> It's true that a disk image can be used but that's often inconvenient
>>>>>> when the data comes in individual files. Making throwaway ISO or
>>>>>> disk image from those files requires extra disk space, is slow, etc.
>>>>>
>>>>> More critically, it cannot be easily live-updated for a running guest.
>>>>> Not all of the setup data that the hypervisor wants to share with the
>>>>> guest is boot-time only - some may be access repeatedly post boot &
>>>>> have a need to update it dynamically. Currently OpenStack can only
>>>>> satisfy this if using its network based metadata REST service, but
>>>>> many cloud operators refuse to deploy this because they are not happy
>>>>> with the guest and host sharing a LAN, leaving only the virtual disk
>>>>> option which can not support dynamic update.
>>>>
>>>> Hi Daniel-
>>>>
>>>> OK, but why can't the REST service run on VSOCK, for instance?
>>>
>>> That is a possibility, though cloud-init/OpenStack maintainers are
>>> reluctant to add support for new features for the metadata REST
>>> service, because the spec being followed is defined by Amazon (as
>>> part of EC2), not by OpenStack. So adding new features would be
>>> effectively forking the spec by adding stuff Amazon doesn't (yet)
>>> support - this is why its IPv4 only, with no IPv6 support too,
>>> as Amazon has not defined a standardized IPv6 address for the
>>> metadata service at this time.
>>
>> You guys are asking the NFS community for a similar kind of
>> specification change here. We would prefer that you seek that change
>> with the relevant authority (the IETF) first before trying to merge
>> an implementation of it.
>>
>> As a first step we have to define RPC operation on VSOCK transports.
>> That's the smallest piece of it. Dealing with some of the NFS issues
>> (like, what happens to filehandles and lock state during a live
>> guest migration) is an even larger challenge.
>>
>> Sorry, but you can't avoid one interoperability problem (Amazon)
>> by introducing another (NFS).
>
> Agreed, I can't argue with that. It does feel overdue to get NFS-over-VSOCK
> defined as a formal spec, especially since it was already implemented in
> the NFS-ganesha userspace server.
>
>>>> How is VSOCK different than guests and hypervisor sharing a LAN?
>>>
>>> VSOCK requires no guest configuration, it won't be broken accidentally
>>> by NetworkManager (or equivalent), it won't be mistakenly blocked by
>>> guest admin/OS adding "deny all" default firewall policy. Similar
>>> applies on the host side, and since there's separation from IP networking,
>>> there is no possibility of the guest ever getting a channel out to the
>>> LAN, even if the host is mis-configurated.
>>
>> We don't seem to have configuration fragility problems with other
>> deployments that scale horizontally.
>>
>> IMO you should focus on making IP reliable rather than trying to
>> move familiar IP-based services to other network fabrics.
>
> I don't see that ever happening, except in a scenario where a single
> org is in tight control of the whole stack (host & guest), which is
> not the case for cloud in general - only some on-site clouds.
>
>> Or, look at existing transports:
>>
>> - We have an RPC-over-named-pipes transport that could be used
>
> Could that transport be used with a serial port rather than a
> named pipe I wonder ? If so, it could potentially be made to
> work with the virtio-serial device model. Is this named pipe
> transport already working with the NFS server, or is this just
> describing a possible strategy yet to be implemented ?

TBH, neither the NFS server nor the NFS client in Linux
supports AF_LOCAL, probably for all the same reasons we are hemming
and hawing about NFS support for VSOCK. NFS has some very
deep dependencies on IP addressing.

It took many years to get NFS on IPv6 working, for example,
simply because there were so many places where AF_INET was
hardened into implementations and protocols. (I did the Linux
NFS/IPv6 implementation years ago).

However AF_LOCAL avoids the need to provide an RPC transport
specification.


>> - NFS/RDMA with a virtual RoCE adapter would also work, and
>> could perform better than something based on TCP streams
>
> I don't know enough about RDMA to answer this, but AFAIK there is
> no RDMA device emulation for KVM yet, so that would be another
> device model to be created & supported.

As far as I'm aware, there is an ongoing effort to implement
a pvrdma device.

Even without a virtual RDMA device, RoCEv1 can be used on
standard Ethernet via the kernel's rxe provider, and is not
route-able. That could be an effective starting place.
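
A rough sketch of that starting point, assuming rdma-core's rxe_cfg
tooling and an NFS/RDMA-capable server (the interface, server, and
export names are only illustrative):

  # rxe_cfg start
  # rxe_cfg add eth0
  # mount -t nfs -o vers=4.1,proto=rdma,port=20049 server:/export /mnt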

One of the best-known shortcomings of using NFS in a guest
is that virtual network devices are expensive in terms of
real CPU utilization and often have higher latency than a
physical adapter by itself because of context switching
overhead. Both issues have immediate impact on NFS
performance.

In terms of creating a solution that performs decently, I'd
think an RDMA-based solution is the best bet. The guest and
hypervisor can accelerate data transfer between themselves
using memory flipping techniques (in the long run).


>>>> Would it be OK if the hypervisor and each guest shared a virtual
>>>> point-to-point IP network?
>>>
>>> No - per above / below text
>>>
>>>> Can you elaborate on "they are not happy with the guests and host
>>>> sharing a LAN" ?
>>>
>>> The security of the host management LAN is so critical to the cloud,
>>> that they're not willing to allow any guest network interface to have
>>> an IP visible to/from the host, even if it were locked down with
>>> firewall rules. It is just one administrative mis-configuration
>>> away from disaster.
>>
>> For the general case of sharing NFS files, I'm not suggesting that
>> the guest and hypervisor share a management LAN for NFS. Rather,
>> the suggestion is to have a separate point-to-point storage area
>> network that has narrow trust relationships
>>
>>>>> If the admin takes any live snapshots of the guest, then this throwaway
>>>>> disk image has to be kept around for the lifetime of the snapshot too.
>>>>> We cannot just throw it away & re-generate it later when restoring the
>>>>> snapshot, because we canot guarantee the newly generated image would be
>>>>> byte-for-byte identical to the original one we generated due to possible
>>>>> changes in mkfs related tools.
>>>>
>>>> Seems like you could create a loopback mount of a small file to
>>>> store configuration data. That would consume very little local
>>>> storage. I've done this already in the fedfs-utils-server package,
>>>> which creates small loopback mounted filesystems to contain FedFS
>>>> domain root directories, for example.
>>>>
>>>> Sharing the disk serially is a little awkward, but not difficult.
>>>> You could use an automounter in the guest to grab that filesystem
>>>> when needed, then release it after a period of not being used.
>>>
>>>
>>> With QEMU's previous 9p-over-virtio filesystem support people have
>>> built tools which run virtual machines where the root FS is directly
>>> running against a 9p share from the host filesystem. It isn't possible
>>> to share the host filesystem's /dev/sda (or whatever) to the guest
>>> because its a holding a non-cluster filesystem so can't be mounted
>>> twice. Likewise you don't want to copy the host filesystems entire
>>> contents into a block device and mount that, as its simply impratical
>>>
>>> With 9p-over-virtio, or NFS-over-VSOCK, we can execute commands
>>> present in the host's filesystem, sandboxed inside a QEMU guest
>>> by simply sharing the host's '/' FS to the guest and have the
>>> guest mount that as its own / (typically it would be read-only,
>>> and then a further FS share would be added for writeable areas).
>>> For this to be reliable we can't use host IP networking because
>>> there's too many ways for that to fail, and if spawning the sandbox
>>> as non-root we can't influence the host networking setup at all.
>>> Currently it uses 9p-over-virtio for this reason, which works
>>> great, except that distros hate the idea of supporting a 9p
>>> filesystem driver in the kernel - a NFS driver capable of
>>> running over virtio is a much smaller incremental support
>>> burden.
>>
>> OK, so this is not the use case that has been described before.
>> This is talking about more than just configuration information.
>>
>> Let's call this use case "NFSROOT without IP networking".
>>
>> Still, the practice I was aware of is that OpenStack would
>> provide a copy of a golden root filesystem to the hypervisor,
>> who would then hydrate that on its local storage, and provide
>> it to a new guest as a virtual disk.
>
> The original OpenStack guest storage model is disk based, and in
> a simple deployment, you would indeed copy the root filesystem disk
> image from the image repository, over to the local virt host and
> expose that to the guest. These days though, the more popular option
> avoids using virt host local storage, and instead has the root disk
> image in an RBD volume that QEMU connects to directly.
>
> There's a further OpenStack project though called Manilla whose
> goal is to expose filesystem shares to guests, rather than block
> storage. This would typically be used in addition to the block
> based storage. eg the guest would have block dev for its root
> filesystem, and then a manilla filesystem share for application
> data storage. This is where the interest for NFS-over-VSOCK is
> coming from in the context of OpenStack.

Thanks for the further detail. I do understand and appreciate
the interest in secure ways to access NFS storage from guests in
cloud environments. I would like to see NFS play a larger part
in the cloud storage space too. File-based storage management
does have a place here.

However my understanding of the comments so far is that as much
as we share the desire to make this model of file sharing work,
we would like to be sure that the use cases you've described
cannot be accommodated using existing mechanisms.


>>>>>> From a user perspective it's much nicer to point to a directory and
>>>>>> have it shared with the guest.
>>>>>>
>>>>>> 2. Using NFS over AF_VSOCK as an interface for a distributed file system
>>>>>> like Ceph or Gluster.
>>>>>>
>>>>>> Hosting providers don't necessarily want to expose their distributed
>>>>>> file system directly to the guest. An NFS frontend presents an NFS
>>>>>> file system to the guest. The guest doesn't have access to the
>>>>>> distributed file system configuration details or network access. The
>>>>>> hosting provider can even switch backend file systems without
>>>>>> requiring guest configuration changes.
>>>>
>>>> Notably, NFS can already support hypervisor file sharing and
>>>> gateway-ing to Ceph and Gluster. We agree that those are useful.
>>>> However VSOCK is not a pre-requisite for either of those use
>>>> cases.
>>>
>>> This again requires that the NFS server which runs on the management LAN
>>> be visible to the guest network. So this hits the same problem above with
>>> cloud providers wanting those networks completely separate.
>>
>> No, it doesn't require using the management LAN at all.
>
> The Ceph server is on the management LAN, and guest which has to perform
> the NFS client mount is on the guest LAN. So some component must be
> visible on both LANs - either its the Ceph server or the NFS server,
> neither of which is desired.

The hypervisor accesses Ceph storage via the management LAN.
The NFS server is exposed on private (possibly virtual) storage
area networks that are shared with each guest. This is a common
way of providing better security and QoS for NFS traffic. IMO
the Ceph and management services could be adequately hidden
from guests in this way.


>>> The desire from OpenStack is to have an NFS server on the compute host,
>>> which exposes the Ceph filesystem to the guest over VSOCK
>>
>> The use of VSOCK is entirely separate from the existence of an
>> NFS service on the hypervisor, since NFS can already gateway
>> Ceph. Let's just say that the requirement is access to an NFS
>> service without IP networking.
>
> FWIW, the Ceph devs have done a proof of concept already where they
> use an NFS server running in the host exporting the volume over
> VSOCK. They used NFS-ganesha userspace server which already merged
> patches to support the VSOCK protocol with NFS.

A proof of concept is nice, but it isn't sufficient for merging
NFS/VSOCK into upstream Linux. Unlike Ceph, NFS is an Internet
standard. We can't introduce changes as we please and expect
the rest of the world to follow us.

I know the Ganesha folks chafe at this requirement, because
standardization progress can sometimes be measured in geological
time units.


--
Chuck Lever




2017-09-19 20:42:24

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Tue, Sep 19, 2017 at 03:56:50PM -0400, Chuck Lever wrote:
> A proof of concept is nice, but it isn't sufficient for merging
> NFS/VSOCK into upstream Linux. Unlike Ceph, NFS is an Internet
> standard. We can't introduce changes as we please and expect
> the rest of the world to follow us.
>
> I know the Ganesha folks chafe at this requirement, because
> standardization progress can sometimes be measured in geological
> time units.

It doesn't need to be--I think we're only asking for a few pages here,
and nothing especially complicated (at the protocol level). That
shouldn't take so long. (Not to be published as an RFC, necessarily,
but to get far enough along that we can be pretty certain it won't need
incompatible changes.)

--b.

2017-09-19 21:09:31

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 19, 2017, at 4:42 PM, J. Bruce Fields <[email protected]> wrote:
>
> On Tue, Sep 19, 2017 at 03:56:50PM -0400, Chuck Lever wrote:
>> A proof of concept is nice, but it isn't sufficient for merging
>> NFS/VSOCK into upstream Linux. Unlike Ceph, NFS is an Internet
>> standard. We can't introduce changes as we please and expect
>> the rest of the world to follow us.
>>
>> I know the Ganesha folks chafe at this requirement, because
>> standardization progress can sometimes be measured in geological
>> time units.
>
> It doesn't need to be--I think we're only asking for a few pages here,
> and nothing especially complicated (at the protocol level).

That would define RPC over VSOCK. I would like to see a problem
statement here, and we'd want to find a normative reference defining
VSOCK addressing. Does one exist?

My sense is that NFS on VSOCK would need more. The proof of concept
I'm aware of drops a lot of functionality (for example, NFSv2/3 is
excluded, and so is RPCSEC GSS and NFSv4.0 backchannel) to make NFS
work on this transport. Interoperability would require that list
be written somewhere.

We also need to deal with filehandles and lock state during guest
live migration.

That feels like more than a few pages to me.


> That
> shouldn't take so long. (Not to be published as an RFC, necessarily,
> but to get far enough along that we can be pretty certain it won't need
> incompatible changes.)


--
Chuck Lever


2017-09-20 13:16:41

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Tue, Sep 19, 2017 at 05:09:25PM -0400, Chuck Lever wrote:
>
> > On Sep 19, 2017, at 4:42 PM, J. Bruce Fields <[email protected]> wrote:
> >
> > On Tue, Sep 19, 2017 at 03:56:50PM -0400, Chuck Lever wrote:
> >> A proof of concept is nice, but it isn't sufficient for merging
> >> NFS/VSOCK into upstream Linux. Unlike Ceph, NFS is an Internet
> >> standard. We can't introduce changes as we please and expect
> >> the rest of the world to follow us.
> >>
> >> I know the Ganesha folks chafe at this requirement, because
> >> standardization progress can sometimes be measured in geological
> >> time units.
> >
> > It doesn't need to be--I think we're only asking for a few pages here,
> > and nothing especially complicated (at the protocol level).
>
> That would define RPC over VSOCK. I would like to see a problem
> statement here, and we'd want to find a normative reference defining
> VSOCK addressing. Does one exist?
>
> My sense is that NFS on VSOCK would need more. The proof of concept
> I'm aware of drops a lot of functionality (for example, NFSv2/3 is
> excluded, and so is RPCSEC GSS and NFSv4.0 backchannel) to make NFS
> work on this transport. Interoperability would require that list
> be written somewhere.

I don't think they need to support NFSv2/3 or RPCSEC_GSS, but it could
be worth a little text to explain why not, if they don't.

> We also need to deal with filehandles and lock state during guest
> live migration.

That sounds like a separate issue independent of transport.

I've been assuming there's still some use to them in an implementation
that doesn't support migration at first. If not it's a bigger project.

--b.

>
> That feels like more than a few pages to me.
>
>
> > That
> > shouldn't take so long. (Not to be published as an RFC, necessarily,
> > but to get far enough along that we can be pretty certain it won't need
> > incompatible changes.)
>
>
> --
> Chuck Lever

2017-09-20 14:40:53

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 20, 2017, at 9:16 AM, J. Bruce Fields <[email protected]> wrote:
>
> On Tue, Sep 19, 2017 at 05:09:25PM -0400, Chuck Lever wrote:
>>
>>> On Sep 19, 2017, at 4:42 PM, J. Bruce Fields <[email protected]> wrote:
>>>
>>> On Tue, Sep 19, 2017 at 03:56:50PM -0400, Chuck Lever wrote:
>>>> A proof of concept is nice, but it isn't sufficient for merging
>>>> NFS/VSOCK into upstream Linux. Unlike Ceph, NFS is an Internet
>>>> standard. We can't introduce changes as we please and expect
>>>> the rest of the world to follow us.
>>>>
>>>> I know the Ganesha folks chafe at this requirement, because
>>>> standardization progress can sometimes be measured in geological
>>>> time units.
>>>
>>> It doesn't need to be--I think we're only asking for a few pages here,
>>> and nothing especially complicated (at the protocol level).
>>
>> That would define RPC over VSOCK. I would like to see a problem
>> statement here, and we'd want to find a normative reference defining
>> VSOCK addressing. Does one exist?
>>
>> My sense is that NFS on VSOCK would need more. The proof of concept
>> I'm aware of drops a lot of functionality (for example, NFSv2/3 is
>> excluded, and so is RPCSEC GSS and NFSv4.0 backchannel) to make NFS
>> work on this transport. Interoperability would require that list
>> be written somewhere.
>
> I don't think they need to support NFSv2/3 or RPCSEC_GSS, but it could
> be worth a little text to explain why not, if they don't.

Right, I'm not taking a position on whether or not those things
should be supported. But the list of features that don't work on
VSOCK mounts is substantial and needs to be documented, IMO.


>> We also need to deal with filehandles and lock state during guest
>> live migration.
>
> That sounds like a separate issue independent of transport.

Yes, it is separate from transport specifics, but it's significant.


> I've been assuming there's still some use to them in an implementation
> that doesn't support migration at first.

File handles suddenly change and lock state vanishes after a live
migration event, both of which would be catastrophic for hypervisor
mount points. This might be mitigated with some NFS protocol
changes, but some implementation work is definitely required. This
work hasn't been scoped, as far as I am aware.

2017-09-20 14:45:58

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, Sep 20, 2017 at 10:40:45AM -0400, Chuck Lever wrote:
> File handles suddenly change and lock state vanishes after a live
> migration event, both of which would be catastrophic for hypervisor
> mount points.

They're talking about a Ganesha/Ceph backend. It should be able to
preserve filehandles.

Lock migration will require server-side implementation work but not
protocol changes that I'm aware of.

It could be a lot of implementation work, though.

--b.

2017-09-20 14:58:59

by Daniel P. Berrangé

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, Sep 20, 2017 at 09:16:41AM -0400, J. Bruce Fields wrote:
> On Tue, Sep 19, 2017 at 05:09:25PM -0400, Chuck Lever wrote:
> >
> > > On Sep 19, 2017, at 4:42 PM, J. Bruce Fields <[email protected]> wrote:
> > >
> > > On Tue, Sep 19, 2017 at 03:56:50PM -0400, Chuck Lever wrote:
> > >> A proof of concept is nice, but it isn't sufficient for merging
> > >> NFS/VSOCK into upstream Linux. Unlike Ceph, NFS is an Internet
> > >> standard. We can't introduce changes as we please and expect
> > >> the rest of the world to follow us.
> > >>
> > >> I know the Ganesha folks chafe at this requirement, because
> > >> standardization progress can sometimes be measured in geological
> > >> time units.
> > >
> > > It doesn't need to be--I think we're only asking for a few pages here,
> > > and nothing especially complicated (at the protocol level).
> >
> > That would define RPC over VSOCK. I would like to see a problem
> > statement here, and we'd want to find a normative reference defining
> > VSOCK addressing. Does one exist?
> >
> > My sense is that NFS on VSOCK would need more. The proof of concept
> > I'm aware of drops a lot of functionality (for example, NFSv2/3 is
> > excluded, and so is RPCSEC GSS and NFSv4.0 backchannel) to make NFS
> > work on this transport. Interoperability would require that list
> > be written somewhere.
>
> I don't think they need to support NFSv2/3 or RPCSEC_GSS, but it could
> be worth a little text to explain why not, if they don't.
>
> > We also need to deal with filehandles and lock state during guest
> > live migration.
>
> That sounds like a separate issue independent of transport.
>
> I've been assuming there's still some use to them in an implementation
> that doesn't support migration at first. If not it's a bigger project.

FWIW, the current virtio-9p filesystem passthrough does not support
migration and still has found usage in a number of scenarios where
that limitation is not important. ClearContainers / libvirt sandbox
are two which use 9p and don't care about migration. With this in mind,
NFS-over-VSOCK would still be valuable even if the specification
explicitly stated that live migration is not permitted.

If there was a specified way of supporting live migration with good
semantics of the guest FS, then that would open up NFS-over-VSOCK to
a wider range of use cases. Migration would likely be quite important
to OpenStack usage, since live migration is the way cloud providers
typically upgrade virt hosts to newer releases of OpenStack software.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

2017-09-20 14:59:35

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 20, 2017, at 10:45 AM, J. Bruce Fields <[email protected]> wrote:
>
> On Wed, Sep 20, 2017 at 10:40:45AM -0400, Chuck Lever wrote:
>> File handles suddenly change and lock state vanishes after a live
>> migration event, both of which would be catastrophic for hypervisor
>> mount points.
>
> They're talking about a Ganesha/Ceph backend. It should be able to
> preserve filehandles.

That's only one possible implementation. I'm thinking in terms of
what needs to be documented for interoperability purposes.


> Lock migration will require server-side implementation work but not
> protocol changes that I'm aware of.
>
> It could be a lot of implementation work, though.

Agreed.


--
Chuck Lever




2017-09-20 16:39:13

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, Sep 20, 2017 at 03:58:49PM +0100, Daniel P. Berrange wrote:
> FWIW, the current virtio-9p filesystem passthrough does not support
> migration and still has found usage in a number of scenarios where
> that limitation is not important. ClearContainers / libvirt sandbox
> are two which use 9p and don't care about migration. With this in mind,
> NFS-over-VSOCK would still be valuable even if the specification
> explicitly stated that live migration is not permitted.
>
> If there was a specified way of supporting live migration with good
> semantics of the guest FS, then that would open up NFS-over-VSOCK to
> a wider range of use cases. Migration would likely be quite important
> to OpenStack usage, since live migration is the way cloud providers
> typically upgrade virt hosts to newer releases of OpenStack software.

Exactly what I was curious about, thanks!

--b.

2017-09-20 17:39:07

by Frank Filz

[permalink] [raw]
Subject: RE: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

> > On Sep 20, 2017, at 10:45 AM, J. Bruce Fields <[email protected]> wrote:
> >
> > On Wed, Sep 20, 2017 at 10:40:45AM -0400, Chuck Lever wrote:
> >> File handles suddenly change and lock state vanishes after a live
> >> migration event, both of which would be catastrophic for hypervisor
> >> mount points.
> >
> > They're talking about a Ganesha/Ceph backend. It should be able to
> > preserve filehandles.
>
> That's only one possible implementation. I'm thinking in terms of what needs
> to be documented for interoperability purposes.

It seems like live migration pretty much requires a back end that will
preserve file handles.

> > Lock migration will require server-side implementation work but not
> > protocol changes that I'm aware of.
> >
> > It could be a lot of implementation work, though.
>
> Agreed.

I think the lock migration can be handled the way we handle state migration
in an HA environment - where we treat it as a server reboot to the client
(so SM_NOTIFY to v3 clients, or the various errors v4 uses to signal server
reboot; in either case, the client will initiate lock reclaim).

Frank




2017-09-20 18:17:16

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, 2017-09-20 at 08:25 -0700, Frank Filz wrote:
> > > On Sep 20, 2017, at 10:45 AM, J. Bruce Fields <[email protected]> wrote:
> > >
> > > On Wed, Sep 20, 2017 at 10:40:45AM -0400, Chuck Lever wrote:
> > > > File handles suddenly change and lock state vanishes after a live
> > > > migration event, both of which would be catastrophic for hypervisor
> > > > mount points.
> > >
> > > They're talking about a Ganesha/Ceph backend. It should be able to
> > > preserve filehandles.
> >
> > That's only one possible implementation. I'm thinking in terms of what needs
> > to be documented for interoperability purposes.
>
> It seems like live migration pretty much requires a back end that will
> preserve file handles.
>
> > > Lock migration will require server-side implementation work but not
> > > protocol changes that I'm aware of.
> > >
> > > It could be a lot of implementation work, though.
> >
> > Agreed.
>
> I think the lock migration can be handled the way we handle state migration
> in an HA environment - where we treat it as a server reboot to the client
> (so SM_NOTIFY to v3 clients, the various errors v4 uses to signal server
> reboot, in either case, the client will initiate lock reclaim).

Mind showing us an architecture for that? As far as I can see, the
layering is as follows:

VM client
--------------
host knfsd
--------------
host client
--------------
Storage server

So how would you notify the VM client that its locks have been
migrated?

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2017-09-20 18:34:04

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, Sep 20, 2017 at 06:17:07PM +0000, Trond Myklebust wrote:
> On Wed, 2017-09-20 at 08:25 -0700, Frank Filz wrote:
> > > > On Sep 20, 2017, at 10:45 AM, J. Bruce Fields <[email protected]> wrote:
> > > >
> > > > On Wed, Sep 20, 2017 at 10:40:45AM -0400, Chuck Lever wrote:
> > > > > File handles suddenly change and lock state vanishes after a
> > > > > live
> > > > > migration event, both of which would be catastrophic for
> > > > > hypervisor
> > > > > mount points.
> > > >
> > > > They're talking about a Ganesha/Ceph backend. It should be able
> > > > to
> > > > preserve filehandles.
> > >
> > > That's only one possible implementation. I'm thinking in terms of
> > > what
> >
> > needs
> > > to be documented for interoperability purposes.
> >
> > It seems like live migration pretty much requires a back end that
> > will
> > preserve file handles.
> >
> > > > Lock migration will require server-side implementation work but
> > > > not
> > > > protocol changes that I'm aware of.
> > > >
> > > > It could be a lot of implementation work, though.
> > >
> > > Agreed.
> >
> > I think the lock migration can be handled the way we handle state
> > migration
> > in an HA environment - where we treat it as a server reboot to the
> > client
> > (so SM_NOTIFY to v3 clients, the various errors v4 uses to signal
> > server
> > reboot, in either case, the client will initiate lock reclaim).
> >
>
> Mind showing us an architecture for that? As far as I can see, the
> layering is as follows:
>
> VM client
> --------------
> host knfsd
> --------------
> host client
> --------------
> Storage server
>
> So how would you notify the VM client that its locks have been
> migrated?

All I've seen mentioned in this thread is

VM client
---------
host Ganesha
---------
Ceph or Gluster

Did I misunderstand?

NFS proxying would certainly make it all more entertaining.

--b.

2017-09-20 18:38:11

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, 2017-09-20 at 14:34 -0400, [email protected] wrote:
> On Wed, Sep 20, 2017 at 06:17:07PM +0000, Trond Myklebust wrote:
> > On Wed, 2017-09-20 at 08:25 -0700, Frank Filz wrote:
> > > > > On Sep 20, 2017, at 10:45 AM, J. Bruce Fields <[email protected]> wrote:
> > > > >
> > > > > On Wed, Sep 20, 2017 at 10:40:45AM -0400, Chuck Lever wrote:
> > > > > > File handles suddenly change and lock state vanishes after
> > > > > > a live migration event, both of which would be catastrophic
> > > > > > for hypervisor mount points.
> > > > >
> > > > > They're talking about a Ganesha/Ceph backend.  It should be
> > > > > able to preserve filehandles.
> > > >
> > > > That's only one possible implementation. I'm thinking in terms
> > > > of what needs to be documented for interoperability purposes.
> > >
> > > It seems like live migration pretty much requires a back end that
> > > will preserve file handles.
> > >
> > > > > Lock migration will require server-side implementation work
> > > > > but not protocol changes that I'm aware of.
> > > > >
> > > > > It could be a lot of implementation work, though.
> > > >
> > > > Agreed.
> > >
> > > I think the lock migration can be handled the way we handle state
> > > migration in an HA environment - where we treat it as a server
> > > reboot to the client (so SM_NOTIFY to v3 clients, the various
> > > errors v4 uses to signal server reboot, in either case, the
> > > client will initiate lock reclaim).
> >
> > Mind showing us an architecture for that? As far as I can see, the
> > layering is as follows:
> >
> > VM client
> > --------------
> > host knfsd
> > --------------
> > host client
> > --------------
> > Storage server
> >
> > So how would you notify the VM client that its locks have been
> > migrated?
>
> All I've seen mentioned in this thread is
>
>   VM client
>   ---------
>   host Ganesha
>   ---------
>   Ceph or Gluster
>
> Did I misunderstand?
>
> NFS proxying would certainly make it all more entertaining.

Pretty sure I've mentioned it before in these VSOCK threads. I
personally see that as a lot more interesting than re-exporting ceph
and gluster...

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2017-09-21 16:20:24

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, Sep 20, 2017 at 02:34:04PM -0400, [email protected] wrote:
> On Wed, Sep 20, 2017 at 06:17:07PM +0000, Trond Myklebust wrote:
> > On Wed, 2017-09-20 at 08:25 -0700, Frank Filz wrote:
> > > > > On Sep 20, 2017, at 10:45 AM, J. Bruce Fields <[email protected]> wrote:
> > > > >
> > > > > On Wed, Sep 20, 2017 at 10:40:45AM -0400, Chuck Lever wrote:
> > > > > > File handles suddenly change and lock state vanishes after a
> > > > > > live
> > > > > > migration event, both of which would be catastrophic for
> > > > > > hypervisor
> > > > > > mount points.
> > > > >
> > > > > They're talking about a Ganesha/Ceph backend. It should be able
> > > > > to
> > > > > preserve filehandles.
> > > >
> > > > That's only one possible implementation. I'm thinking in terms of
> > > > what
> > >
> > > needs
> > > > to be documented for interoperability purposes.
> > >
> > > It seems like live migration pretty much requires a back end that
> > > will
> > > preserve file handles.
> > >
> > > > > Lock migration will require server-side implementation work but
> > > > > not
> > > > > protocol changes that I'm aware of.
> > > > >
> > > > > It could be a lot of implementation work, though.
> > > >
> > > > Agreed.
> > >
> > > I think the lock migration can be handled the way we handle state
> > > migration
> > > in an HA environment - where we treat it as a server reboot to the
> > > client
> > > (so SM_NOTIFY to v3 clients, the various errors v4 uses to signal
> > > server
> > > reboot, in either case, the client will initiate lock reclaim).
> > >
> >
> > Mind showing us an architecture for that? As far as I can see, the
> > layering is as follows:
> >
> > VM client
> > --------------
> > host knfsd
> > --------------
> > host client
> > --------------
> > Storage server
> >
> > So how would you notify the VM client that its locks have been
> > migrated?
>
> All I've seen mentioned in this thread is
>
> VM client
> ---------
> host Ganesha
> ---------
> Ceph or Gluster
>
> Did I misunderstand?

Yes, this is the Ceph/OpenStack Manila configuration.

> NFS proxying would certainly make it all more entertaining.

I'm not aware of a use case for proxying (re-exporting) but it was
mentioned when discussing the challenge of migration.

Stefan

2017-09-22 08:32:05

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Tue, Sep 19, 2017 at 01:24:52PM -0400, J. Bruce Fields wrote:
> On Tue, Sep 19, 2017 at 05:44:27PM +0100, Daniel P. Berrange wrote:
> > On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
> > >
> > > > On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
> > > > VSOCK requires no guest configuration, it won't be broken accidentally
> > > > by NetworkManager (or equivalent), it won't be mistakenly blocked by
> > > > guest admin/OS adding "deny all" default firewall policy. Similar
> > > > applies on the host side, and since there's separation from IP networking,
> > > > there is no possibility of the guest ever getting a channel out to the
> > > > LAN, even if the host is mis-configurated.
> > >
> > > We don't seem to have configuration fragility problems with other
> > > deployments that scale horizontally.
> > >
> > > IMO you should focus on making IP reliable rather than trying to
> > > move familiar IP-based services to other network fabrics.
> >
> > I don't see that ever happening, except in a scenario where a single
> > org is in tight control of the whole stack (host & guest), which is
> > not the case for cloud in general - only some on-site clouds.
>
> Can you elaborate?
>
> I think we're having trouble understanding why you can't just say "don't
> do that" to someone whose guest configuration is interfering with the
> network interface they need for NFS.

Dan can add more information on the OpenStack use case, but your
question is equally relevant to the other use case I mentioned - easy
file sharing between host and guest.

Management tools like virt-manager (https://virt-manager.org/) should
support a "share directory with VM" feature. The user chooses a
directory on the host, a mount point inside the guest, and then clicks
OK. The directory should appear inside the guest.

VMware, VirtualBox, etc have had file sharing for a long time. It's a
standard feature.

Here is how to implement it using AF_VSOCK:
1. Check presence of virtio-vsock device in VM or hotplug it.
2. Export directory from host NFS server (nfs-ganesha, nfsd, etc).
3. Send qemu-guest-agent command to (optionally) add /etc/fstab entry
and then mount.

The user does not need to take any action inside the guest.
Non-technical users can share files without even knowing what NFS is.
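
For illustration only, steps 2 and 3 might look roughly like this from
the management tool's point of view (the exports syntax follows this
series; the guest-exec invocation, CIDs, and paths are assumptions
about how a tool could drive it, and mount point creation is omitted):

(host)# echo '/srv/share vsock:3(rw,no_subtree_check)' >> /etc/exports
(host)# exportfs -ra
(host)# virsh qemu-agent-command vm01 '{"execute": "guest-exec",
          "arguments": {"path": "/usr/bin/mount",
                        "arg": ["-t", "nfs", "-o", "proto=vsock",
                                "2:/srv/share", "/mnt/share"],
                        "capture-output": true}}'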

There are too many scenarios where guest administrator action is
required with NFS over TCP/IP. We can't tell them "don't do that"
because it makes this feature unreliable.

Today we ask users to set up NFS or CIFS themselves. In many cases that
is inconvenient and an easy file sharing feature would be much better.

Stefan

2017-09-22 09:56:06

by Steven Whitehouse

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

Hi,


On 21/09/17 18:00, Stefan Hajnoczi wrote:
> On Tue, Sep 19, 2017 at 01:24:52PM -0400, J. Bruce Fields wrote:
>> On Tue, Sep 19, 2017 at 05:44:27PM +0100, Daniel P. Berrange wrote:
>>> On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
>>>>> On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
>>>>> VSOCK requires no guest configuration, it won't be broken accidentally
>>>>> by NetworkManager (or equivalent), it won't be mistakenly blocked by
>>>>> guest admin/OS adding "deny all" default firewall policy. Similar
>>>>> applies on the host side, and since there's separation from IP networking,
>>>>> there is no possibility of the guest ever getting a channel out to the
>>>>> LAN, even if the host is mis-configurated.
>>>> We don't seem to have configuration fragility problems with other
>>>> deployments that scale horizontally.
>>>>
>>>> IMO you should focus on making IP reliable rather than trying to
>>>> move familiar IP-based services to other network fabrics.
>>> I don't see that ever happening, except in a scenario where a single
>>> org is in tight control of the whole stack (host & guest), which is
>>> not the case for cloud in general - only some on-site clouds.
>> Can you elaborate?
>>
>> I think we're having trouble understanding why you can't just say "don't
>> do that" to someone whose guest configuration is interfering with the
>> network interface they need for NFS.
> Dan can add more information on the OpenStack use case, but your
> question is equally relevant to the other use case I mentioned - easy
> file sharing between host and guest.
>
> Management tools like virt-manager (https://virt-manager.org/) should
> support a "share directory with VM" feature. The user chooses a
> directory on the host, a mount point inside the guest, and then clicks
> OK. The directory should appear inside the guest.
>
> VMware, VirtualBox, etc have had file sharing for a long time. It's a
> standard feature.
>
> Here is how to implement it using AF_VSOCK:
> 1. Check presence of virtio-vsock device in VM or hotplug it.
> 2. Export directory from host NFS server (nfs-ganesha, nfsd, etc).
> 3. Send qemu-guest-agent command to (optionally) add /etc/fstab entry
> and then mount.
>
> The user does not need to take any action inside the guest.
> Non-technical users can share files without even knowing what NFS is.
>
> There are too many scenarios where guest administrator action is
> required with NFS over TCP/IP. We can't tell them "don't do that"
> because it makes this feature unreliable.
>
> Today we ask users to set up NFS or CIFS themselves. In many cases that
> is inconvenient and an easy file sharing feature would be much better.
>
> Stefan
>

I don't think we should give up on making NFS easy to use with TCP/IP in
such situations. With IPv6 we could have (for example) a device with a
well known link-local address at the host end, and an automatically
allocated link-local address at the guest end. In other words the same
as VSOCK, but with IPv6 rather than VSOCK addresses. At that point the
remainder of the NFS config steps would be identical to those you've
outlined with VSOCK above.
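
As a minimal sketch of the guest side of that idea (assuming the host
always answers on a fixed link-local address, fe80::2 here, reachable
through the paravirtual interface eth1, and that mount.nfs accepts the
scoped literal):

(guest)# mount -t nfs '[fe80::2%eth1]:/srv/share' /mnt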

Creating a (virtual) network device which is restricted to host/guest
communication and automatically configures itself should be a lot less
work than adding a whole new protocol to NFS, I think. It could also be
used for many other use cases, as well as giving the choice between
NFS and CIFS. So it is much more flexible, and should be quicker to
implement too.

Steve.


2017-09-22 11:32:38

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, 2017-09-22 at 10:55 +0100, Steven Whitehouse wrote:
> Hi,
>
>
> On 21/09/17 18:00, Stefan Hajnoczi wrote:
> > On Tue, Sep 19, 2017 at 01:24:52PM -0400, J. Bruce Fields wrote:
> > > On Tue, Sep 19, 2017 at 05:44:27PM +0100, Daniel P. Berrange wrote:
> > > > On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
> > > > > > On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
> > > > > > VSOCK requires no guest configuration, it won't be broken accidentally
> > > > > > by NetworkManager (or equivalent), it won't be mistakenly blocked by
> > > > > > guest admin/OS adding "deny all" default firewall policy. Similar
> > > > > > applies on the host side, and since there's separation from IP networking,
> > > > > > there is no possibility of the guest ever getting a channel out to the
> > > > > > LAN, even if the host is mis-configurated.
> > > > >
> > > > > We don't seem to have configuration fragility problems with other
> > > > > deployments that scale horizontally.
> > > > >
> > > > > IMO you should focus on making IP reliable rather than trying to
> > > > > move familiar IP-based services to other network fabrics.
> > > >
> > > > I don't see that ever happening, except in a scenario where a single
> > > > org is in tight control of the whole stack (host & guest), which is
> > > > not the case for cloud in general - only some on-site clouds.
> > >
> > > Can you elaborate?
> > >
> > > I think we're having trouble understanding why you can't just say "don't
> > > do that" to someone whose guest configuration is interfering with the
> > > network interface they need for NFS.
> >
> > Dan can add more information on the OpenStack use case, but your
> > question is equally relevant to the other use case I mentioned - easy
> > file sharing between host and guest.
> >
> > Management tools like virt-manager (https://virt-manager.org/) should
> > support a "share directory with VM" feature. The user chooses a
> > directory on the host, a mount point inside the guest, and then clicks
> > OK. The directory should appear inside the guest.
> >
> > VMware, VirtualBox, etc have had file sharing for a long time. It's a
> > standard feature.
> >
> > Here is how to implement it using AF_VSOCK:
> > 1. Check presence of virtio-vsock device in VM or hotplug it.
> > 2. Export directory from host NFS server (nfs-ganesha, nfsd, etc).
> > 3. Send qemu-guest-agent command to (optionally) add /etc/fstab entry
> > and then mount.
> >
> > The user does not need to take any action inside the guest.
> > Non-technical users can share files without even knowing what NFS is.
> >
> > There are too many scenarios where guest administrator action is
> > required with NFS over TCP/IP. We can't tell them "don't do that"
> > because it makes this feature unreliable.
> >
> > Today we ask users to set up NFS or CIFS themselves. In many cases that
> > is inconvenient and an easy file sharing feature would be much better.
> >
> > Stefan
> >
>
> I don't think we should give up on making NFS easy to use with TCP/IP in
> such situations. With IPv6 we could have (for example) a device with a
> well known link-local address at the host end, and an automatically
> allocated link-local address at the guest end. In other words the same
> as VSOCK, but with IPv6 rather than VSOCK addresses. At that point the
> remainder of the NFS config steps would be identical to those you've
> outlined with VSOCK above.
>
> Creating a (virtual) network device which is restricted to host/guest
> communication and automatically configures itself should be a lot less
> work than adding a whole new protocol to NFS I think. It could also be
> used for many other use cases too, as well as giving the choice between
> NFS and CIFS. So it is much more flexible, and should be quicker to
> implement too,
>

FWIW, I'm also intrigued by Chuck's AF_LOCAL proposition. What about
this idea:

Make a filesystem (or a pair of filesystems) that could be mounted on
host and guest. Application running on host creates a unix socket in
there, and it shows up on the guest's filesystem. The sockets use a
virtio backend to shuffle data around.

That seems like it could be very useful.
--
Jeff Layton <[email protected]>

2017-09-22 11:43:30

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 22, 2017, at 5:55 AM, Steven Whitehouse <[email protected]> wrote:
>
> Hi,
>
>
> On 21/09/17 18:00, Stefan Hajnoczi wrote:
>> On Tue, Sep 19, 2017 at 01:24:52PM -0400, J. Bruce Fields wrote:
>>> On Tue, Sep 19, 2017 at 05:44:27PM +0100, Daniel P. Berrange wrote:
>>>> On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
>>>>>> On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
>>>>>> VSOCK requires no guest configuration, it won't be broken accidentally
>>>>>> by NetworkManager (or equivalent), it won't be mistakenly blocked by
>>>>>> guest admin/OS adding "deny all" default firewall policy. Similar
>>>>>> applies on the host side, and since there's separation from IP networking,
>>>>>> there is no possibility of the guest ever getting a channel out to the
>>>>>> LAN, even if the host is mis-configurated.
>>>>> We don't seem to have configuration fragility problems with other
>>>>> deployments that scale horizontally.
>>>>>
>>>>> IMO you should focus on making IP reliable rather than trying to
>>>>> move familiar IP-based services to other network fabrics.
>>>> I don't see that ever happening, except in a scenario where a single
>>>> org is in tight control of the whole stack (host & guest), which is
>>>> not the case for cloud in general - only some on-site clouds.
>>> Can you elaborate?
>>>
>>> I think we're having trouble understanding why you can't just say "don't
>>> do that" to someone whose guest configuration is interfering with the
>>> network interface they need for NFS.
>> Dan can add more information on the OpenStack use case, but your
>> question is equally relevant to the other use case I mentioned - easy
>> file sharing between host and guest.
>>
>> Management tools like virt-manager (https://virt-manager.org/) should
>> support a "share directory with VM" feature. The user chooses a
>> directory on the host, a mount point inside the guest, and then clicks
>> OK. The directory should appear inside the guest.
>>
>> VMware, VirtualBox, etc have had file sharing for a long time. It's a
>> standard feature.
>>
>> Here is how to implement it using AF_VSOCK:
>> 1. Check presence of virtio-vsock device in VM or hotplug it.
>> 2. Export directory from host NFS server (nfs-ganesha, nfsd, etc).
>> 3. Send qemu-guest-agent command to (optionally) add /etc/fstab entry
>> and then mount.
>>
>> The user does not need to take any action inside the guest.
>> Non-technical users can share files without even knowing what NFS is.
>>
>> There are too many scenarios where guest administrator action is
>> required with NFS over TCP/IP. We can't tell them "don't do that"
>> because it makes this feature unreliable.
>>
>> Today we ask users to set up NFS or CIFS themselves. In many cases that
>> is inconvenient and an easy file sharing feature would be much better.
>>
>> Stefan
>>
>
> I don't think we should give up on making NFS easy to use with TCP/IP in such situations. With IPv6 we could have (for example) a device with a well known link-local address at the host end, and an automatically allocated link-local address at the guest end. In other words the same as VSOCK, but with IPv6 rather than VSOCK addresses. At that point the remainder of the NFS config steps would be identical to those you've outlined with VSOCK above.
>
> Creating a (virtual) network device which is restricted to host/guest communication and automatically configures itself should be a lot less work than adding a whole new protocol to NFS I think. It could also be used for many other use cases too, as well as giving the choice between NFS and CIFS. So it is much more flexible, and should be quicker to implement too,

I agree. IMO mechanisms already exist to handle a self-configuring
NFS mount. Use IPv6 link-local and the automounter and an entry in
/etc/hosts. Done, and no-one even had to type "mount".
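
A hypothetical autofs sketch of that (assuming a fixed link-local
address for the host, fe80::2 via eth0, and an automounter that
accepts the scoped literal; names are illustrative):

(guest)# cat /etc/auto.master.d/host.autofs
/host  /etc/auto.host
(guest)# cat /etc/auto.host
share  -fstype=nfs4  [fe80::2%eth0]:/srv/share
(guest)# ls /host/share     # first access triggers the mount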

If firewall configuration is a chronic problem, let's address that.
That would have broader benefits than adding VSOCK to NFS.

As for NFSROOT, it occurred to me there might be another interesting
problem: VSOCK supports only NFSv4. Does the Linux NFS client support
NFSROOT on NFSv4 yet?


--
Chuck Lever




2017-09-22 11:55:33

by Daniel P. Berrangé

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, Sep 22, 2017 at 07:43:39AM -0400, Chuck Lever wrote:
>
> > On Sep 22, 2017, at 5:55 AM, Steven Whitehouse <[email protected]> wrote:
> >
> > Hi,
> >
> >
> > On 21/09/17 18:00, Stefan Hajnoczi wrote:
> >> On Tue, Sep 19, 2017 at 01:24:52PM -0400, J. Bruce Fields wrote:
> >>> On Tue, Sep 19, 2017 at 05:44:27PM +0100, Daniel P. Berrange wrote:
> >>>> On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
> >>>>>> On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
> >>>>>> VSOCK requires no guest configuration, it won't be broken accidentally
> >>>>>> by NetworkManager (or equivalent), it won't be mistakenly blocked by
> >>>>>> guest admin/OS adding "deny all" default firewall policy. Similar
> >>>>>> applies on the host side, and since there's separation from IP networking,
> >>>>>> there is no possibility of the guest ever getting a channel out to the
> >>>>>> LAN, even if the host is mis-configurated.
> >>>>> We don't seem to have configuration fragility problems with other
> >>>>> deployments that scale horizontally.
> >>>>>
> >>>>> IMO you should focus on making IP reliable rather than trying to
> >>>>> move familiar IP-based services to other network fabrics.
> >>>> I don't see that ever happening, except in a scenario where a single
> >>>> org is in tight control of the whole stack (host & guest), which is
> >>>> not the case for cloud in general - only some on-site clouds.
> >>> Can you elaborate?
> >>>
> >>> I think we're having trouble understanding why you can't just say "don't
> >>> do that" to someone whose guest configuration is interfering with the
> >>> network interface they need for NFS.
> >> Dan can add more information on the OpenStack use case, but your
> >> question is equally relevant to the other use case I mentioned - easy
> >> file sharing between host and guest.
> >>
> >> Management tools like virt-manager (https://virt-manager.org/) should
> >> support a "share directory with VM" feature. The user chooses a
> >> directory on the host, a mount point inside the guest, and then clicks
> >> OK. The directory should appear inside the guest.
> >>
> >> VMware, VirtualBox, etc have had file sharing for a long time. It's a
> >> standard feature.
> >>
> >> Here is how to implement it using AF_VSOCK:
> >> 1. Check presence of virtio-vsock device in VM or hotplug it.
> >> 2. Export directory from host NFS server (nfs-ganesha, nfsd, etc).
> >> 3. Send qemu-guest-agent command to (optionally) add /etc/fstab entry
> >> and then mount.
> >>
> >> The user does not need to take any action inside the guest.
> >> Non-technical users can share files without even knowing what NFS is.
> >>
> >> There are too many scenarios where guest administrator action is
> >> required with NFS over TCP/IP. We can't tell them "don't do that"
> >> because it makes this feature unreliable.
> >>
> >> Today we ask users to set up NFS or CIFS themselves. In many cases that
> >> is inconvenient and an easy file sharing feature would be much better.
> >>
> >> Stefan
> >>
> >
> > I don't think we should give up on making NFS easy to use with TCP/IP in such situations. With IPv6 we could have (for example) a device with a well known link-local address at the host end, and an automatically allocated link-local address at the guest end. In other words the same as VSOCK, but with IPv6 rather than VSOCK addresses. At that point the remainder of the NFS config steps would be identical to those you've outlined with VSOCK above.
> >
> > Creating a (virtual) network device which is restricted to host/guest communication and automatically configures itself should be a lot less work than adding a whole new protocol to NFS I think. It could also be used for many other use cases too, as well as giving the choice between NFS and CIFS. So it is much more flexible, and should be quicker to implement too,
>
> I agree. IMO mechanisms already exist to handle a self-configuring
> NFS mount. Use IPv6 link-local and the automounter and an entry in
> /etc/hosts. Done, and no-one even had to type "mount".
>
> If firewall configuration is a chronic problem, let's address that.

This just isn't practical in the general case. Even on a single Linux OS
distro there are multiple ways to manage firewalls (Fedora has a static
init script, or firewalld, and many users invent their own personal way
of doing it). There are countless other OSes, many closed source, with
3rd party firewall products in use. And then there are the firewall
policies defined by organizations' IT departments, which mandate
particular ways of doing things with layers of approval to go through
to get changes made.

IOW, while improving firewall configuration is a worthy goal, it isn't
a substitute for host<->guest file system sharing over a non-network
based transport.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

2017-09-22 12:00:18

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 22, 2017, at 7:55 AM, Daniel P. Berrange <[email protected]> wrote:
>
> On Fri, Sep 22, 2017 at 07:43:39AM -0400, Chuck Lever wrote:
>>
>>> On Sep 22, 2017, at 5:55 AM, Steven Whitehouse <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>>
>>> On 21/09/17 18:00, Stefan Hajnoczi wrote:
>>>> On Tue, Sep 19, 2017 at 01:24:52PM -0400, J. Bruce Fields wrote:
>>>>> On Tue, Sep 19, 2017 at 05:44:27PM +0100, Daniel P. Berrange wrote:
>>>>>> On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
>>>>>>>> On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
>>>>>>>> VSOCK requires no guest configuration, it won't be broken accidentally
>>>>>>>> by NetworkManager (or equivalent), it won't be mistakenly blocked by
>>>>>>>> guest admin/OS adding "deny all" default firewall policy. Similar
>>>>>>>> applies on the host side, and since there's separation from IP networking,
>>>>>>>> there is no possibility of the guest ever getting a channel out to the
>>>>>>>> LAN, even if the host is mis-configurated.
>>>>>>> We don't seem to have configuration fragility problems with other
>>>>>>> deployments that scale horizontally.
>>>>>>>
>>>>>>> IMO you should focus on making IP reliable rather than trying to
>>>>>>> move familiar IP-based services to other network fabrics.
>>>>>> I don't see that ever happening, except in a scenario where a single
>>>>>> org is in tight control of the whole stack (host & guest), which is
>>>>>> not the case for cloud in general - only some on-site clouds.
>>>>> Can you elaborate?
>>>>>
>>>>> I think we're having trouble understanding why you can't just say "don't
>>>>> do that" to someone whose guest configuration is interfering with the
>>>>> network interface they need for NFS.
>>>> Dan can add more information on the OpenStack use case, but your
>>>> question is equally relevant to the other use case I mentioned - easy
>>>> file sharing between host and guest.
>>>>
>>>> Management tools like virt-manager (https://virt-manager.org/) should
>>>> support a "share directory with VM" feature. The user chooses a
>>>> directory on the host, a mount point inside the guest, and then clicks
>>>> OK. The directory should appear inside the guest.
>>>>
>>>> VMware, VirtualBox, etc have had file sharing for a long time. It's a
>>>> standard feature.
>>>>
>>>> Here is how to implement it using AF_VSOCK:
>>>> 1. Check presence of virtio-vsock device in VM or hotplug it.
>>>> 2. Export directory from host NFS server (nfs-ganesha, nfsd, etc).
>>>> 3. Send qemu-guest-agent command to (optionally) add /etc/fstab entry
>>>> and then mount.
>>>>
>>>> The user does not need to take any action inside the guest.
>>>> Non-technical users can share files without even knowing what NFS is.
>>>>
>>>> There are too many scenarios where guest administrator action is
>>>> required with NFS over TCP/IP. We can't tell them "don't do that"
>>>> because it makes this feature unreliable.
>>>>
>>>> Today we ask users to set up NFS or CIFS themselves. In many cases that
>>>> is inconvenient and an easy file sharing feature would be much better.
>>>>
>>>> Stefan
>>>>
>>>
>>> I don't think we should give up on making NFS easy to use with TCP/IP in such situations. With IPv6 we could have (for example) a device with a well known link-local address at the host end, and an automatically allocated link-local address at the guest end. In other words the same as VSOCK, but with IPv6 rather than VSOCK addresses. At that point the remainder of the NFS config steps would be identical to those you've outlined with VSOCK above.
>>>
>>> Creating a (virtual) network device which is restricted to host/guest communication and automatically configures itself should be a lot less work than adding a whole new protocol to NFS I think. It could also be used for many other use cases too, as well as giving the choice between NFS and CIFS. So it is much more flexible, and should be quicker to implement too,
>>
>> I agree. IMO mechanisms already exist to handle a self-configuring
>> NFS mount. Use IPv6 link-local and the automounter and an entry in
>> /etc/hosts. Done, and no-one even had to type "mount".
>>
>> If firewall configuration is a chronic problem, let's address that.
>
> This just isn't practical in the general case. Even on a single Linux OS
> distro there are multiple ways to manage firewalls (Fedora as a static
> init script, or firewalld, and many users invent their own personal way
> of doing it). There are countless other OS, many closed source with 3rd
> party firewall products in use. And then there are the firewall policies
> defined by organization's IT departments that mandate particular ways of
> doing things with layers of approval to go through to get changes made.
>
> IOW, while improving firewall configuraiton is a worthy goal, it isn't
> a substitute for host<->guest file system sharing over a non-network
> based transport.

OK, but how do you expect to get support for NFS/VSOCK into all
these 3rd party environments?


--
Chuck Lever




2017-09-22 12:08:03

by Matt Benjamin

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

This version of AF_LOCAL just looks like VSOCK and vhost-vsock, by
another name. E.g., it apparently hard-wires VSOCK's host-guest
communication restriction even more strongly. What are its intrinsic
advantages?

Matt

On Fri, Sep 22, 2017 at 7:32 AM, Jeff Layton <[email protected]> wrote:
> On Fri, 2017-09-22 at 10:55 +0100, Steven Whitehouse wrote:
>> Hi,
>>
>>
>> On 21/09/17 18:00, Stefan Hajnoczi wrote:
>> > On Tue, Sep 19, 2017 at 01:24:52PM -0400, J. Bruce Fields wrote:
>> > > On Tue, Sep 19, 2017 at 05:44:27PM +0100, Daniel P. Berrange wrote:
>> > > > On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
>> > > > > > On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
>> > > > > > VSOCK requires no guest configuration, it won't be broken accidentally
>> > > > > > by NetworkManager (or equivalent), it won't be mistakenly blocked by
>> > > > > > guest admin/OS adding "deny all" default firewall policy. Similar
>> > > > > > applies on the host side, and since there's separation from IP networking,
>> > > > > > there is no possibility of the guest ever getting a channel out to the
>> > > > > > LAN, even if the host is mis-configurated.
>> > > > >
>> > > > > We don't seem to have configuration fragility problems with other
>> > > > > deployments that scale horizontally.
>> > > > >
>> > > > > IMO you should focus on making IP reliable rather than trying to
>> > > > > move familiar IP-based services to other network fabrics.
>> > > >
>> > > > I don't see that ever happening, except in a scenario where a single
>> > > > org is in tight control of the whole stack (host & guest), which is
>> > > > not the case for cloud in general - only some on-site clouds.
>> > >
>> > > Can you elaborate?
>> > >
>> > > I think we're having trouble understanding why you can't just say "don't
>> > > do that" to someone whose guest configuration is interfering with the
>> > > network interface they need for NFS.
>> >
>> > Dan can add more information on the OpenStack use case, but your
>> > question is equally relevant to the other use case I mentioned - easy
>> > file sharing between host and guest.
>> >
>> > Management tools like virt-manager (https://virt-manager.org/) should
>> > support a "share directory with VM" feature. The user chooses a
>> > directory on the host, a mount point inside the guest, and then clicks
>> > OK. The directory should appear inside the guest.
>> >
>> > VMware, VirtualBox, etc have had file sharing for a long time. It's a
>> > standard feature.
>> >
>> > Here is how to implement it using AF_VSOCK:
>> > 1. Check presence of virtio-vsock device in VM or hotplug it.
>> > 2. Export directory from host NFS server (nfs-ganesha, nfsd, etc).
>> > 3. Send qemu-guest-agent command to (optionally) add /etc/fstab entry
>> > and then mount.
>> >
>> > The user does not need to take any action inside the guest.
>> > Non-technical users can share files without even knowing what NFS is.
>> >
>> > There are too many scenarios where guest administrator action is
>> > required with NFS over TCP/IP. We can't tell them "don't do that"
>> > because it makes this feature unreliable.
>> >
>> > Today we ask users to set up NFS or CIFS themselves. In many cases that
>> > is inconvenient and an easy file sharing feature would be much better.
>> >
>> > Stefan
>> >
>>
>> I don't think we should give up on making NFS easy to use with TCP/IP in
>> such situations. With IPv6 we could have (for example) a device with a
>> well known link-local address at the host end, and an automatically
>> allocated link-local address at the guest end. In other words the same
>> as VSOCK, but with IPv6 rather than VSOCK addresses. At that point the
>> remainder of the NFS config steps would be identical to those you've
>> outlined with VSOCK above.
>>
>> Creating a (virtual) network device which is restricted to host/guest
>> communication and automatically configures itself should be a lot less
>> work than adding a whole new protocol to NFS I think. It could also be
>> used for many other use cases too, as well as giving the choice between
>> NFS and CIFS. So it is much more flexible, and should be quicker to
>> implement too,
>>
>
> FWIW, I'm also intrigued by Chuck's AF_LOCAL proposition. What about
> this idea:
>
> Make a filesystem (or a pair of filesystems) that could be mounted on
> host and guest. Application running on host creates a unix socket in
> there, and it shows up on the guest's filesystem. The sockets use a
> virtio backend to shuffle data around.
>
> That seems like it could be very useful.
> --
> Jeff Layton <[email protected]>



--

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309

2017-09-22 12:10:48

by Daniel P. Berrangé

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, Sep 22, 2017 at 08:00:11AM -0400, Chuck Lever wrote:
>
> > On Sep 22, 2017, at 7:55 AM, Daniel P. Berrange <[email protected]> wrote:
> >
> > On Fri, Sep 22, 2017 at 07:43:39AM -0400, Chuck Lever wrote:
> >>
> >>> On Sep 22, 2017, at 5:55 AM, Steven Whitehouse <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>>
> >>> On 21/09/17 18:00, Stefan Hajnoczi wrote:
> >>>> On Tue, Sep 19, 2017 at 01:24:52PM -0400, J. Bruce Fields wrote:
> >>>>> On Tue, Sep 19, 2017 at 05:44:27PM +0100, Daniel P. Berrange wrote:
> >>>>>> On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
> >>>>>>>> On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
> >>>>>>>> VSOCK requires no guest configuration, it won't be broken accidentally
> >>>>>>>> by NetworkManager (or equivalent), it won't be mistakenly blocked by
> >>>>>>>> guest admin/OS adding "deny all" default firewall policy. Similar
> >>>>>>>> applies on the host side, and since there's separation from IP networking,
> >>>>>>>> there is no possibility of the guest ever getting a channel out to the
> >>>>>>>> LAN, even if the host is mis-configurated.
> >>>>>>> We don't seem to have configuration fragility problems with other
> >>>>>>> deployments that scale horizontally.
> >>>>>>>
> >>>>>>> IMO you should focus on making IP reliable rather than trying to
> >>>>>>> move familiar IP-based services to other network fabrics.
> >>>>>> I don't see that ever happening, except in a scenario where a single
> >>>>>> org is in tight control of the whole stack (host & guest), which is
> >>>>>> not the case for cloud in general - only some on-site clouds.
> >>>>> Can you elaborate?
> >>>>>
> >>>>> I think we're having trouble understanding why you can't just say "don't
> >>>>> do that" to someone whose guest configuration is interfering with the
> >>>>> network interface they need for NFS.
> >>>> Dan can add more information on the OpenStack use case, but your
> >>>> question is equally relevant to the other use case I mentioned - easy
> >>>> file sharing between host and guest.
> >>>>
> >>>> Management tools like virt-manager (https://virt-manager.org/) should
> >>>> support a "share directory with VM" feature. The user chooses a
> >>>> directory on the host, a mount point inside the guest, and then clicks
> >>>> OK. The directory should appear inside the guest.
> >>>>
> >>>> VMware, VirtualBox, etc have had file sharing for a long time. It's a
> >>>> standard feature.
> >>>>
> >>>> Here is how to implement it using AF_VSOCK:
> >>>> 1. Check presence of virtio-vsock device in VM or hotplug it.
> >>>> 2. Export directory from host NFS server (nfs-ganesha, nfsd, etc).
> >>>> 3. Send qemu-guest-agent command to (optionally) add /etc/fstab entry
> >>>> and then mount.
> >>>>
> >>>> The user does not need to take any action inside the guest.
> >>>> Non-technical users can share files without even knowing what NFS is.
> >>>>
> >>>> There are too many scenarios where guest administrator action is
> >>>> required with NFS over TCP/IP. We can't tell them "don't do that"
> >>>> because it makes this feature unreliable.
> >>>>
> >>>> Today we ask users to set up NFS or CIFS themselves. In many cases that
> >>>> is inconvenient and an easy file sharing feature would be much better.
> >>>>
> >>>> Stefan
> >>>>
> >>>
> >>> I don't think we should give up on making NFS easy to use with TCP/IP in such situations. With IPv6 we could have (for example) a device with a well known link-local address at the host end, and an automatically allocated link-local address at the guest end. In other words the same as VSOCK, but with IPv6 rather than VSOCK addresses. At that point the remainder of the NFS config steps would be identical to those you've outlined with VSOCK above.
> >>>
> >>> Creating a (virtual) network device which is restricted to host/guest communication and automatically configures itself should be a lot less work than adding a whole new protocol to NFS I think. It could also be used for many other use cases too, as well as giving the choice between NFS and CIFS. So it is much more flexible, and should be quicker to implement too,
> >>
> >> I agree. IMO mechanisms already exist to handle a self-configuring
> >> NFS mount. Use IPv6 link-local and the automounter and an entry in
> >> /etc/hosts. Done, and no-one even had to type "mount".
> >>
> >> If firewall configuration is a chronic problem, let's address that.
> >
> > This just isn't practical in the general case. Even on a single Linux OS
> > distro there are multiple ways to manage firewalls (Fedora as a static
> > init script, or firewalld, and many users invent their own personal way
> > of doing it). There are countless other OS, many closed source with 3rd
> > party firewall products in use. And then there are the firewall policies
> > defined by organization's IT departments that mandate particular ways of
> > doing things with layers of approval to go through to get changes made.
> >
> > IOW, while improving firewall configuraiton is a worthy goal, it isn't
> > a substitute for host<->guest file system sharing over a non-network
> > based transport.
>
> OK, but how do you expect to get support for NFS/VSOCK into all
> these 3rd party environments?

Assuming it is accepted as a feature in the NFS server, support in Linux
would flow into the distros as the vendors either update to a newer
version or backport the feature. There would also need to be support
written for other commonly used OSes (which basically means Windows)
over time, which would probably be made available via the existing KVM
Windows guest support ISO (which provides paravirt drivers for disk,
net, balloon, etc.).

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

2017-09-22 12:26:42

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

I'm not sure there is a strong one. I mostly just thought it sounded like
a possible solution here.

There's already a standard in place for doing RPC over AF_LOCAL, so
there's less work to be done there. We also already have an AF_LOCAL
transport in the kernel (mostly for talking to rpcbind), so that helps
reduce the maintenance burden there.

It utilizes something that looks like a traditional unix socket, which
may make it easier to alter other applications to use it.

There's also a clear way to "firewall" this -- just don't mount hvsockfs
(or whatever), or don't build it into the kernel. No filesystem, no
sockets.

I'm not sure I'd agree about this being more restrictive, necessarily.
If we did this, you could envision eventually building something that
looks like this to a running host, but where the remote end is something
else entirely. Whether that's truly useful, IDK...

-- Jeff

On Fri, 2017-09-22 at 08:08 -0400, Matt Benjamin wrote:
> This version of AF_LOCAL just looks like VSOCK and vhost-vsock, by
> another name. E.g., it apparently hard-wires VSOCK's host-guest
> communication restriction even more strongly. What are its intrinsic
> advantages?
>
> Matt
>
> On Fri, Sep 22, 2017 at 7:32 AM, Jeff Layton <[email protected]> wrote:
> > On Fri, 2017-09-22 at 10:55 +0100, Steven Whitehouse wrote:
> > > Hi,
> > >
> > >
> > > On 21/09/17 18:00, Stefan Hajnoczi wrote:
> > > > On Tue, Sep 19, 2017 at 01:24:52PM -0400, J. Bruce Fields wrote:
> > > > > On Tue, Sep 19, 2017 at 05:44:27PM +0100, Daniel P. Berrange wrote:
> > > > > > On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
> > > > > > > > On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
> > > > > > > > VSOCK requires no guest configuration, it won't be broken accidentally
> > > > > > > > by NetworkManager (or equivalent), it won't be mistakenly blocked by
> > > > > > > > guest admin/OS adding "deny all" default firewall policy. Similar
> > > > > > > > applies on the host side, and since there's separation from IP networking,
> > > > > > > > there is no possibility of the guest ever getting a channel out to the
> > > > > > > > LAN, even if the host is mis-configurated.
> > > > > > >
> > > > > > > We don't seem to have configuration fragility problems with other
> > > > > > > deployments that scale horizontally.
> > > > > > >
> > > > > > > IMO you should focus on making IP reliable rather than trying to
> > > > > > > move familiar IP-based services to other network fabrics.
> > > > > >
> > > > > > I don't see that ever happening, except in a scenario where a single
> > > > > > org is in tight control of the whole stack (host & guest), which is
> > > > > > not the case for cloud in general - only some on-site clouds.
> > > > >
> > > > > Can you elaborate?
> > > > >
> > > > > I think we're having trouble understanding why you can't just say "don't
> > > > > do that" to someone whose guest configuration is interfering with the
> > > > > network interface they need for NFS.
> > > >
> > > > Dan can add more information on the OpenStack use case, but your
> > > > question is equally relevant to the other use case I mentioned - easy
> > > > file sharing between host and guest.
> > > >
> > > > Management tools like virt-manager (https://virt-manager.org/) should
> > > > support a "share directory with VM" feature. The user chooses a
> > > > directory on the host, a mount point inside the guest, and then clicks
> > > > OK. The directory should appear inside the guest.
> > > >
> > > > VMware, VirtualBox, etc have had file sharing for a long time. It's a
> > > > standard feature.
> > > >
> > > > Here is how to implement it using AF_VSOCK:
> > > > 1. Check presence of virtio-vsock device in VM or hotplug it.
> > > > 2. Export directory from host NFS server (nfs-ganesha, nfsd, etc).
> > > > 3. Send qemu-guest-agent command to (optionally) add /etc/fstab entry
> > > > and then mount.
> > > >
> > > > The user does not need to take any action inside the guest.
> > > > Non-technical users can share files without even knowing what NFS is.
> > > >
> > > > There are too many scenarios where guest administrator action is
> > > > required with NFS over TCP/IP. We can't tell them "don't do that"
> > > > because it makes this feature unreliable.
> > > >
> > > > Today we ask users to set up NFS or CIFS themselves. In many cases that
> > > > is inconvenient and an easy file sharing feature would be much better.
> > > >
> > > > Stefan
> > > >
> > >
> > > I don't think we should give up on making NFS easy to use with TCP/IP in
> > > such situations. With IPv6 we could have (for example) a device with a
> > > well known link-local address at the host end, and an automatically
> > > allocated link-local address at the guest end. In other words the same
> > > as VSOCK, but with IPv6 rather than VSOCK addresses. At that point the
> > > remainder of the NFS config steps would be identical to those you've
> > > outlined with VSOCK above.
> > >
> > > Creating a (virtual) network device which is restricted to host/guest
> > > communication and automatically configures itself should be a lot less
> > > work than adding a whole new protocol to NFS I think. It could also be
> > > used for many other use cases too, as well as giving the choice between
> > > NFS and CIFS. So it is much more flexible, and should be quicker to
> > > implement too,
> > >
> >
> > FWIW, I'm also intrigued by Chuck's AF_LOCAL proposition. What about
> > this idea:
> >
> > Make a filesystem (or a pair of filesystems) that could be mounted on
> > host and guest. Application running on host creates a unix socket in
> > there, and it shows up on the guest's filesystem. The sockets use a
> > virtio backend to shuffle data around.
> >
> > That seems like it could be very useful.
> > --
> > Jeff Layton <[email protected]>
>
>
>

--
Jeff Layton <[email protected]>

2017-09-22 15:28:57

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, Sep 22, 2017 at 08:26:39AM -0400, Jeff Layton wrote:
> I'm not sure there is a strong one. I most just thought it sounded like
> a possible solution here.
>
> There's already a standard in place for doing RPC over AF_LOCAL, so
> there's less work to be done there. We also already have AF_LOCAL
> transport in the kernel (mostly for talking to rpcbind), so there's
> helps reduce the maintenance burden there.
>
> It utilizes something that looks like a traditional unix socket, which
> may make it easier to alter other applications to use it.
>
> There's also a clear way to "firewall" this -- just don't mount hvsockfs
> (or whatever), or don't build it into the kernel. No filesystem, no
> sockets.
>
> I'm not sure I'd agree about this being more restrictive, necessarily.
> If we did this, you could envision eventually building something that
> looks like this to a running host, but where the remote end is something
> else entirely. Whether that's truly useful, IDK...

This approach where communications channels appear on the file system is
similar to the existing virtio-serial device. The guest driver creates
a character device for each serial communications channel configured on
the host. It's a character device node though and not a UNIX domain
socket.

One of the main reasons for adding virtio-vsock was to get native
Sockets API communications that most applications expect (including
NFS!). Serial char device semantics are awkward.

Sticking with AF_LOCAL for a moment, another approach is to use an
AF_VSOCK tunnel to carry the NFS traffic:

(host)# vsock-proxy-daemon --unix-domain-socket path/to/local.sock \
            --listen --port 2049
(host)# nfsd --local path/to/local.sock ...

(guest)# vsock-proxy-daemon --unix-domain-socket path/to/local.sock \
             --cid 2 --port 2049
(guest)# mount -t nfs -o proto=local path/to/local.sock /mnt
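
A generic proxy of that shape could probably be prototyped with plain
socat today (assuming a socat built with AF_VSOCK support, and the same
paths/ports as above):

(host)#  socat VSOCK-LISTEN:2049,fork UNIX-CONNECT:path/to/local.sock
(guest)# socat UNIX-LISTEN:path/to/local.sock,fork VSOCK-CONNECT:2:2049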

It has drawbacks over native AF_VSOCK support:

1. Certain NFS protocol features become impossible to implement since
there is no meaningful address information that can be exchanged
between client and server (e.g. separate backchannel connection,
pNFS, etc). Are you sure AF_LOCAL makes sense for NFS?

2. Performance is worse due to extra proxy daemon.

If I understand correctly, both Linux and nfs-utils lack NFS AF_LOCAL
support, although it is present in sunrpc. For example, today
fs/nfsd/nfsctl.c cannot add UNIX domain sockets. Similarly, the
nfs-utils nfsd program has no command-line syntax for UNIX domain
sockets.

Funnily enough, making AF_LOCAL work for NFS requires changes similar to
the patches I've posted for AF_VSOCK. I think AF_LOCAL tunnelling is a
technically inferior solution to native AF_VSOCK support (for the
reasons mentioned above), but I appreciate that it insulates NFS from
AF_VSOCK specifics and could be used in other use cases too.

Can someone with more knowledge than myself confirm that NFS over
AF_LOCAL would actually work? I thought the ability to exchange
addressing information across RPC was quite important for the NFS
protocol.

Stefan

2017-09-22 16:23:32

by Daniel P. Berrangé

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, Sep 22, 2017 at 04:28:55PM +0100, Stefan Hajnoczi wrote:
> On Fri, Sep 22, 2017 at 08:26:39AM -0400, Jeff Layton wrote:
> > I'm not sure there is a strong one. I most just thought it sounded like
> > a possible solution here.
> >
> > There's already a standard in place for doing RPC over AF_LOCAL, so
> > there's less work to be done there. We also already have AF_LOCAL
> > transport in the kernel (mostly for talking to rpcbind), so there's
> > helps reduce the maintenance burden there.
> >
> > It utilizes something that looks like a traditional unix socket, which
> > may make it easier to alter other applications to use it.
> >
> > There's also a clear way to "firewall" this -- just don't mount hvsockfs
> > (or whatever), or don't build it into the kernel. No filesystem, no
> > sockets.
> >
> > I'm not sure I'd agree about this being more restrictive, necessarily.
> > If we did this, you could envision eventually building something that
> > looks like this to a running host, but where the remote end is something
> > else entirely. Whether that's truly useful, IDK...
>
> This approach where communications channels appear on the file system is
> similar to the existing virtio-serial device. The guest driver creates
> a character device for each serial communications channel configured on
> the host. It's a character device node though and not a UNIX domain
> socket.
>
> One of the main reasons for adding virtio-vsock was to get native
> Sockets API communications that most applications expect (including
> NFS!). Serial char device semantics are awkward.
>
> Sticking with AF_LOCAL for a moment, another approach is for AF_VSOCK
> tunnel to the NFS traffic:
>
> (host)# vsock-proxy-daemon --unix-domain-socket path/to/local.sock
> --listen --port 2049
> (host)# nfsd --local path/to/local.sock ...
>
> (guest)# vsock-proxy-daemon --unix-domain-socket path/to/local.sock
> --cid 2 --port 2049
> (guest)# mount -t nfs -o proto=local path/to/local.sock /mnt
>
> It has drawbacks over native AF_VSOCK support:
>
> 1. Certain NFS protocol features become impossible to implement since
> there is no meaningful address information that can be exchanged
> between client and server (e.g. separate backchannel connection,
> pNFS, etc). Are you sure AF_LOCAL makes sense for NFS?
>
> 2. Performance is worse due to extra proxy daemon.
>
> If I understand correctly both Linux and nfs-utils lack NFS AF_LOCAL
> support although it is present in sunrpc. For example, today
> fs/nfsd/nfsctl.c cannot add UNIX domain sockets. Similarly, the
> nfs-utils nsfd program has no command-line syntax for UNIX domain
> sockets.
>
> Funnily enough making AF_LOCAL work for NFS requires similar changes to
> the patches I've posted for AF_VSOCK. I think AF_LOCAL tunnelling is a
> technically inferior solution than native AF_VSOCK support (for the
> reasons mentioned above), but I appreciate that it insulates NFS from
> AF_VSOCK specifics and could be used in other use cases too.

In the virt world using AF_LOCAL would be less portable than AF_VSOCK,
because AF_VSOCK is a technology implemented by both VMWare and KVM,
whereas an AF_LOCAL approach would likely be KVM only. In practice it
probably doesn't matter, since I doubt VMWare would end up using
NFS over AF_VSOCK, but conceptually I think AF_VSOCK makes more sense
for a virt scenario.

Using AF_LOCAL would not solve the hard problems for virt, like
migration, either - it would just be hiding them under the carpet
and pretending they don't exist. Again, it is preferable to actually use
AF_VSOCK and define what the expected semantics are for migration.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

2017-09-22 18:32:05

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 22, 2017, at 12:23 PM, Daniel P. Berrange <[email protected]> wrote:
>
> On Fri, Sep 22, 2017 at 04:28:55PM +0100, Stefan Hajnoczi wrote:
>> On Fri, Sep 22, 2017 at 08:26:39AM -0400, Jeff Layton wrote:
>>> I'm not sure there is a strong one. I most just thought it sounded like
>>> a possible solution here.
>>>
>>> There's already a standard in place for doing RPC over AF_LOCAL, so
>>> there's less work to be done there. We also already have AF_LOCAL
>>> transport in the kernel (mostly for talking to rpcbind), so there's
>>> helps reduce the maintenance burden there.
>>>
>>> It utilizes something that looks like a traditional unix socket, which
>>> may make it easier to alter other applications to use it.
>>>
>>> There's also a clear way to "firewall" this -- just don't mount hvsockfs
>>> (or whatever), or don't build it into the kernel. No filesystem, no
>>> sockets.
>>>
>>> I'm not sure I'd agree about this being more restrictive, necessarily.
>>> If we did this, you could envision eventually building something that
>>> looks like this to a running host, but where the remote end is something
>>> else entirely. Whether that's truly useful, IDK...
>>
>> This approach where communications channels appear on the file system is
>> similar to the existing virtio-serial device. The guest driver creates
>> a character device for each serial communications channel configured on
>> the host. It's a character device node though and not a UNIX domain
>> socket.
>>
>> One of the main reasons for adding virtio-vsock was to get native
>> Sockets API communications that most applications expect (including
>> NFS!). Serial char device semantics are awkward.
>>
>> Sticking with AF_LOCAL for a moment, another approach is for AF_VSOCK
>> tunnel to the NFS traffic:
>>
>> (host)# vsock-proxy-daemon --unix-domain-socket path/to/local.sock
>> --listen --port 2049
>> (host)# nfsd --local path/to/local.sock ...
>>
>> (guest)# vsock-proxy-daemon --unix-domain-socket path/to/local.sock
>> --cid 2 --port 2049
>> (guest)# mount -t nfs -o proto=local path/to/local.sock /mnt
>>
>> It has drawbacks over native AF_VSOCK support:
>>
>> 1. Certain NFS protocol features become impossible to implement since
>> there is no meaningful address information that can be exchanged
>> between client and server (e.g. separate backchannel connection,
>> pNFS, etc). Are you sure AF_LOCAL makes sense for NFS?
>>
>> 2. Performance is worse due to extra proxy daemon.
>>
>> If I understand correctly both Linux and nfs-utils lack NFS AF_LOCAL
>> support although it is present in sunrpc. For example, today
>> fs/nfsd/nfsctl.c cannot add UNIX domain sockets. Similarly, the
>> nfs-utils nsfd program has no command-line syntax for UNIX domain
>> sockets.
>>
>> Funnily enough making AF_LOCAL work for NFS requires similar changes to
>> the patches I've posted for AF_VSOCK. I think AF_LOCAL tunnelling is a
>> technically inferior solution than native AF_VSOCK support (for the
>> reasons mentioned above), but I appreciate that it insulates NFS from
>> AF_VSOCK specifics and could be used in other use cases too.
>
> In the virt world using AF_LOCAL would be less portable than AF_VSOCK,
> because AF_VSOCK is a technology implemented by both VMWare and KVM,
> whereas an AF_LOCAL approach would likely be KVM only.

Is there a standard that defines the AF_VSOCK symbolic name and
reserves a numeric value for it that can be used in a sockaddr?


> In practice it
> probably doesn't matter, since I doubt VMWare would end up using
> NFS over AF_VSOCK, but conceptually I think AF_VSOCK makes more sense
> for a virt scenario.
>
> Using AF_LOCAL would not be solving the hard problems for virt like
> migration either - it would just be hiding them under the carpet
> and pretending they don't exist. Again preferrable to actually use
> AF_VSOCK and define what the expected semantics are for migration.

There's no hiding or carpets. We're just reviewing the various
alternatives. AF_LOCAL has the same challenges as AF_VSOCK, as I've
said in the past, except that it already has well-defined semantics,
and it can be used in other environments besides host-guest.


--
Chuck Lever




2017-09-22 19:14:58

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, Sep 22, 2017 at 12:55:24PM +0100, Daniel P. Berrange wrote:
> On Fri, Sep 22, 2017 at 07:43:39AM -0400, Chuck Lever wrote:
> > If firewall configuration is a chronic problem, let's address that.
>
> This just isn't practical in the general case. Even on a single Linux OS
> distro there are multiple ways to manage firewalls (Fedora as a static
> init script, or firewalld, and many users invent their own personal way
> of doing it). There are countless other OS, many closed source with 3rd
> party firewall products in use. And then there are the firewall policies
> defined by organization's IT departments that mandate particular ways of
> doing things with layers of approval to go through to get changes made.
>
> IOW, while improving firewall configuraiton is a worthy goal, it isn't
> a substitute for host<->guest file system sharing over a non-network
> based transport.

I guess what's confusing to me is you're already depending on a ton of
assumptions about the guest:

- it has to be running a recent kernel with NFS/VSOCK support.
- it has to have all the nfs-utils userspace stuff, a
/usr/bin/mount that works the way you expect, and an
/etc/nfsmount.conf that doesn't have any odd options.
- it has to have a suitable mount point somewhere that the admin
knows about.
- probably lots of other stuff

It's odd that the firewall configuration is the one step too far.

As long as we've got all these requirements on guests, is there no
chance we could add a requirement like "if you want shared filesystems,
outbound tcp connections to port 2049 must be permitted on interface
vhost0"?

--b.

2017-09-25 08:15:02

by Daniel P. Berrangé

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, Sep 22, 2017 at 02:31:56PM -0400, Chuck Lever wrote:
>
> > On Sep 22, 2017, at 12:23 PM, Daniel P. Berrange <[email protected]> wrote:
> >
> > On Fri, Sep 22, 2017 at 04:28:55PM +0100, Stefan Hajnoczi wrote:
> >> On Fri, Sep 22, 2017 at 08:26:39AM -0400, Jeff Layton wrote:
> >>> I'm not sure there is a strong one. I most just thought it sounded like
> >>> a possible solution here.
> >>>
> >>> There's already a standard in place for doing RPC over AF_LOCAL, so
> >>> there's less work to be done there. We also already have AF_LOCAL
> >>> transport in the kernel (mostly for talking to rpcbind), so there's
> >>> helps reduce the maintenance burden there.
> >>>
> >>> It utilizes something that looks like a traditional unix socket, which
> >>> may make it easier to alter other applications to use it.
> >>>
> >>> There's also a clear way to "firewall" this -- just don't mount hvsockfs
> >>> (or whatever), or don't build it into the kernel. No filesystem, no
> >>> sockets.
> >>>
> >>> I'm not sure I'd agree about this being more restrictive, necessarily.
> >>> If we did this, you could envision eventually building something that
> >>> looks like this to a running host, but where the remote end is something
> >>> else entirely. Whether that's truly useful, IDK...
> >>
> >> This approach where communications channels appear on the file system is
> >> similar to the existing virtio-serial device. The guest driver creates
> >> a character device for each serial communications channel configured on
> >> the host. It's a character device node though and not a UNIX domain
> >> socket.
> >>
> >> One of the main reasons for adding virtio-vsock was to get native
> >> Sockets API communications that most applications expect (including
> >> NFS!). Serial char device semantics are awkward.
> >>
> >> Sticking with AF_LOCAL for a moment, another approach is for AF_VSOCK
> >> tunnel to the NFS traffic:
> >>
> >> (host)# vsock-proxy-daemon --unix-domain-socket path/to/local.sock
> >> --listen --port 2049
> >> (host)# nfsd --local path/to/local.sock ...
> >>
> >> (guest)# vsock-proxy-daemon --unix-domain-socket path/to/local.sock
> >> --cid 2 --port 2049
> >> (guest)# mount -t nfs -o proto=local path/to/local.sock /mnt
> >>
> >> It has drawbacks over native AF_VSOCK support:
> >>
> >> 1. Certain NFS protocol features become impossible to implement since
> >> there is no meaningful address information that can be exchanged
> >> between client and server (e.g. separate backchannel connection,
> >> pNFS, etc). Are you sure AF_LOCAL makes sense for NFS?
> >>
> >> 2. Performance is worse due to extra proxy daemon.
> >>
> >> If I understand correctly both Linux and nfs-utils lack NFS AF_LOCAL
> >> support although it is present in sunrpc. For example, today
> >> fs/nfsd/nfsctl.c cannot add UNIX domain sockets. Similarly, the
> >> nfs-utils nsfd program has no command-line syntax for UNIX domain
> >> sockets.
> >>
> >> Funnily enough making AF_LOCAL work for NFS requires similar changes to
> >> the patches I've posted for AF_VSOCK. I think AF_LOCAL tunnelling is a
> >> technically inferior solution than native AF_VSOCK support (for the
> >> reasons mentioned above), but I appreciate that it insulates NFS from
> >> AF_VSOCK specifics and could be used in other use cases too.
> >
> > In the virt world using AF_LOCAL would be less portable than AF_VSOCK,
> > because AF_VSOCK is a technology implemented by both VMWare and KVM,
> > whereas an AF_LOCAL approach would likely be KVM only.
>
> Is there a standard that defines the AF_VSOCK symbolic name and
> reserves a numeric value for it that can be used in a sockaddr?

The VMWare documentation for this feature is located here - they call
the overall feature either "vSockets" or "VMCI sockets":

https://code.vmware.com/web/sdk/65/vmci-socket

In those code examples you'll see they never refer to AF_VSOCK directly
because the code is written from the POV of a Windows developer, and IIUC
VMWare could not define a static AF_VSOCK constant for Windows. Instead
they show the use of a function, VMCISock_GetAFValue().

When VMWare implemented this for Linux they defined the AF_VSOCK constant
and introduced the 'sockaddr_vm' struct for address info:

commit d021c344051af91f42c5ba9fdedc176740cbd238
Author: Andy King <[email protected]>
Date: Wed Feb 6 14:23:56 2013 +0000

VSOCK: Introduce VM Sockets

VM Sockets allows communication between virtual machines and the hypervisor.
User level applications both in a virtual machine and on the host can use the
VM Sockets API, which facilitates fast and efficient communication between
guest virtual machines and their host. A socket address family, designed to be
compatible with UDP and TCP at the interface level, is provided.

Today, VM Sockets is used by various VMware Tools components inside the guest
for zero-config, network-less access to VMware host services. In addition to
this, VMware's users are using VM Sockets for various applications, where
network access of the virtual machine is restricted or non-existent. Examples
of this are VMs communicating with device proxies for proprietary hardware
running as host applications and automated testing of applications running
within virtual machines.

The VMware VM Sockets are similar to other socket types, like Berkeley UNIX
socket interface. The VM Sockets module supports both connection-oriented
stream sockets like TCP, and connectionless datagram sockets like UDP. The VM
Sockets protocol family is defined as "AF_VSOCK" and the socket operations
split for SOCK_DGRAM and SOCK_STREAM.

For additional information about the use of VM Sockets, please refer to the
VM Sockets Programming Guide available at:

https://www.vmware.com/support/developer/vmci-sdk/

Signed-off-by: George Zhang <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Andy king <[email protected]>
Signed-off-by: David S. Miller <[email protected]>


The KVM implementation was merged last year in Linux, simply using the
existing vSockets spec, but with a data transport using virtio. So any
standards that exist are actually those defined by VMWare - the only KVM
part is the use of virtio as a transport.
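
For reference, using it from a Linux client looks like any other socket
family; here is a minimal sketch (CID 2 is the host, the port number and
error handling are purely illustrative):

#include <stdio.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>   /* struct sockaddr_vm, VMADDR_CID_HOST */
#include <unistd.h>

int main(void)
{
    /* The address structure introduced by the commit quoted above. */
    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid    = VMADDR_CID_HOST,  /* CID 2: the hypervisor */
        .svm_port   = 2049,             /* illustrative port */
    };
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("vsock connect");
        return 1;
    }
    /* ... stream traffic flows over fd like any connected socket ... */
    close(fd);
    return 0;
}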

> > In practice it
> > probably doesn't matter, since I doubt VMWare would end up using
> > NFS over AF_VSOCK, but conceptually I think AF_VSOCK makes more sense
> > for a virt scenario.
> >
> > Using AF_LOCAL would not be solving the hard problems for virt like
> > migration either - it would just be hiding them under the carpet
> > and pretending they don't exist. Again preferrable to actually use
> > AF_VSOCK and define what the expected semantics are for migration.
>
> There's no hiding or carpets. We're just reviewing the various
> alternatives. AF_LOCAL has the same challenges as AF_VSOCK, as I've
> said in the past, except that it already has well-defined semantics,
> and it can be used in other environments besides host-guest.

The existing usage / other environments have no concept of migration, so
there is no defined behaviour for AF_LOCAL wrt guest migration. So my point
was that to use AF_LOCAL would be explicitly deciding to ignore the problem
of migration, unless we define new semantics for AF_LOCAL wrt migration,
in the same way we'd have to define those semantics for AF_VSOCK.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

2017-09-25 08:30:28

by Daniel P. Berrangé

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, Sep 22, 2017 at 03:14:57PM -0400, J. Bruce Fields wrote:
> On Fri, Sep 22, 2017 at 12:55:24PM +0100, Daniel P. Berrange wrote:
> > On Fri, Sep 22, 2017 at 07:43:39AM -0400, Chuck Lever wrote:
> > > If firewall configuration is a chronic problem, let's address that.
> >
> > This just isn't practical in the general case. Even on a single Linux OS
> > distro there are multiple ways to manage firewalls (Fedora as a static
> > init script, or firewalld, and many users invent their own personal way
> > of doing it). There are countless other OS, many closed source with 3rd
> > party firewall products in use. And then there are the firewall policies
> > defined by organization's IT departments that mandate particular ways of
> > doing things with layers of approval to go through to get changes made.
> >
> > IOW, while improving firewall configuraiton is a worthy goal, it isn't
> > a substitute for host<->guest file system sharing over a non-network
> > based transport.
>
> I guess what's confusing to me is you're already depending on a ton of
> assumptions about the guest:
>
> - it has to be running a recent kernel with NFS/VSOCK support.
> - it has to have all the nfs-utils userspace stuff, a
> /usr/bin/mount that works the way you expect, and an
> /etc/nfsmount.conf that doesn't have any odd options.
> - it has to have a suitable mount point somewhere that the admin
> knows about.
> - probably lots of other stuff
>
> It's odd that the firewall configuration is the one step too far.

The key factor is considering which pieces are liable to significant or
complex interactions with other usage of the OS, and are thus liable to
be accidentally misconfigured or at risk of breaking during usage. The
configuration of network interfaces and firewalls is a major risk
area compared to the other prerequisites.

Providing a kernel/userspace with the feature is taken care of by the
distro vendor, and OS admins can't break this unless they go out of their
way to prevent loading of the kernel modules, which is not a likely
scenario. Making a mount point is straightforward and not something
that other services in the system are liable to break. Potentially the
mount point creation can be either baked into the guest OS pre-built
disk image, or populated by metadata from another source. The nfsmount.conf
options are a possible source of concern, but IIUC anything set there
can be overridden via explicit arguments to mount.

The way in which network interfaces are configured, though, is a major
source of complexity & unknowns because it is not something the distro
vendor just defines once. There are countless different tools to
configure network interfaces on Linux alone, and many permutations of
how the actual interfaces / routing are set up. Firewall setup is a
similar area of complexity & unknowns, because not only are there many
different tools to manage it, but you get well into the realm of policy
defined by the organizations deploying it. Expecting things to "just work"
in this area is unrealistic. It is a big part of why virtualization
platforms all provide dedicated paravirtualized devices for communication
between host and guest that are independent of networking.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

2017-09-25 10:31:55

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support


> On Sep 25, 2017, at 4:14 AM, Daniel P. Berrange <[email protected]> wrote:
>
> On Fri, Sep 22, 2017 at 02:31:56PM -0400, Chuck Lever wrote:
>>
>>> On Sep 22, 2017, at 12:23 PM, Daniel P. Berrange <[email protected]> wrote:
>>>
>>> In practice it
>>> probably doesn't matter, since I doubt VMWare would end up using
>>> NFS over AF_VSOCK, but conceptually I think AF_VSOCK makes more sense
>>> for a virt scenario.
>>>
>>> Using AF_LOCAL would not be solving the hard problems for virt like
>>> migration either - it would just be hiding them under the carpet
>>> and pretending they don't exist. Again preferrable to actually use
>>> AF_VSOCK and define what the expected semantics are for migration.
>>
>> There's no hiding or carpets. We're just reviewing the various
>> alternatives. AF_LOCAL has the same challenges as AF_VSOCK, as I've
>> said in the past, except that it already has well-defined semantics,
>> and it can be used in other environments besides host-guest.
>
> The existing usage / other environments have no concept of migration, so
> there is no defined behaviour for AF_LOCAL wrt guest migration.

That is correct, and also true for all other current RPC
transports.


> So my point
> was that to use AF_LOCAL would be explicitly deciding to ignore the problem
> of migration.

That is nonsense: all other current RPC transports also
do not support live guest migration, because that set of issues
has to do with NFS, not with RPC.

Further, AFAIK, the use of AF_LOCAL does not force any a priori
decision about whether live guest migration with NFS can be
supported. The question of live guest migration support is
orthogonal to the choice of RPC transport.


> Unless we define new semantics for AF_LOCAL wrt to migration,
> in the same way we'd have to define those semantics for AF_VSOCK.

Perhaps you misunderstood what I meant above by "already has
well-defined semantics". I simply meant that, unlike RPC on
AF_VSOCK currently, RPC on AF_LOCAL is well-defined and already
in use for some RPC programs. I am restating what Jeff Layton
already said in an earlier e-mail. I was not making any claim about
whether live guest migration can be supported with NFS on an
AF_LOCAL transport.


--
Chuck Lever




2017-09-26 02:08:19

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Fri, Sep 22 2017, Daniel P. Berrange wrote:

> On Fri, Sep 22, 2017 at 07:43:39AM -0400, Chuck Lever wrote:
>>
>> > On Sep 22, 2017, at 5:55 AM, Steven Whitehouse <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> >
>> > On 21/09/17 18:00, Stefan Hajnoczi wrote:
>> >> On Tue, Sep 19, 2017 at 01:24:52PM -0400, J. Bruce Fields wrote:
>> >>> On Tue, Sep 19, 2017 at 05:44:27PM +0100, Daniel P. Berrange wrote:
>> >>>> On Tue, Sep 19, 2017 at 11:48:10AM -0400, Chuck Lever wrote:
>> >>>>>> On Sep 19, 2017, at 11:10 AM, Daniel P. Berrange <[email protected]> wrote:
>> >>>>>> VSOCK requires no guest configuration, it won't be broken accidentally
>> >>>>>> by NetworkManager (or equivalent), it won't be mistakenly blocked by
>> >>>>>> guest admin/OS adding "deny all" default firewall policy. Similar
>> >>>>>> applies on the host side, and since there's separation from IP networking,
>> >>>>>> there is no possibility of the guest ever getting a channel out to the
>> >>>>>> LAN, even if the host is mis-configurated.
>> >>>>> We don't seem to have configuration fragility problems with other
>> >>>>> deployments that scale horizontally.
>> >>>>>
>> >>>>> IMO you should focus on making IP reliable rather than trying to
>> >>>>> move familiar IP-based services to other network fabrics.
>> >>>> I don't see that ever happening, except in a scenario where a single
>> >>>> org is in tight control of the whole stack (host & guest), which is
>> >>>> not the case for cloud in general - only some on-site clouds.
>> >>> Can you elaborate?
>> >>>
>> >>> I think we're having trouble understanding why you can't just say "don't
>> >>> do that" to someone whose guest configuration is interfering with the
>> >>> network interface they need for NFS.
>> >> Dan can add more information on the OpenStack use case, but your
>> >> question is equally relevant to the other use case I mentioned - easy
>> >> file sharing between host and guest.
>> >>
>> >> Management tools like virt-manager (https://virt-manager.org/) should
>> >> support a "share directory with VM" feature. The user chooses a
>> >> directory on the host, a mount point inside the guest, and then clicks
>> >> OK. The directory should appear inside the guest.
>> >>
>> >> VMware, VirtualBox, etc have had file sharing for a long time. It's a
>> >> standard feature.
>> >>
>> >> Here is how to implement it using AF_VSOCK:
>> >> 1. Check presence of virtio-vsock device in VM or hotplug it.
>> >> 2. Export directory from host NFS server (nfs-ganesha, nfsd, etc).
>> >> 3. Send qemu-guest-agent command to (optionally) add /etc/fstab entry
>> >> and then mount.
>> >>
>> >> The user does not need to take any action inside the guest.
>> >> Non-technical users can share files without even knowing what NFS is.
>> >>
>> >> There are too many scenarios where guest administrator action is
>> >> required with NFS over TCP/IP. We can't tell them "don't do that"
>> >> because it makes this feature unreliable.
>> >>
>> >> Today we ask users to set up NFS or CIFS themselves. In many cases that
>> >> is inconvenient and an easy file sharing feature would be much better.
>> >>
>> >> Stefan
>> >>
>> >
>> > I don't think we should give up on making NFS easy to use with TCP/IP in such situations. With IPv6 we could have (for example) a device with a well known link-local address at the host end, and an automatically allocated link-local address at the guest end. In other words the same as VSOCK, but with IPv6 rather than VSOCK addresses. At that point the remainder of the NFS config steps would be identical to those you've outlined with VSOCK above.
>> >
>> > Creating a (virtual) network device which is restricted to host/guest communication and automatically configures itself should be a lot less work than adding a whole new protocol to NFS I think. It could also be used for many other use cases too, as well as giving the choice between NFS and CIFS. So it is much more flexible, and should be quicker to implement too,
>>
>> I agree. IMO mechanisms already exist to handle a self-configuring
>> NFS mount. Use IPv6 link-local and the automounter and an entry in
>> /etc/hosts. Done, and no-one even had to type "mount".
>>
>> If firewall configuration is a chronic problem, let's address that.
>
> This just isn't practical in the general case. Even on a single Linux OS
> distro there are multiple ways to manage firewalls (Fedora as a static
> init script, or firewalld, and many users invent their own personal way
> of doing it). There are countless other OS, many closed source with 3rd
> party firewall products in use. And then there are the firewall policies
> defined by organization's IT departments that mandate particular ways of
> doing things with layers of approval to go through to get changes made.
>
> IOW, while improving firewall configuraiton is a worthy goal, it isn't
> a substitute for host<->guest file system sharing over a non-network
> based transport.

I don't find this argument at all convincing.

The main selling point for VSOCK seems to be that it bypasses all
firewalls, i.e. traffic through a VSOCK is immune to any netfilter
settings.

If "traffic immune to any netfilter settings" is a useful thing (and I'm
quite willing to believe that it is), then let's focus on that goal. It
might be useful well beyond NFS and VMs.

How about a flag for a network interface which says "disable netfilter"??

I really don't know enough about netfilter to have any idea if this is
possible. If it is, then it seems like a good path to a general
solution.
If it isn't, then understanding why might help us move forward some
other way.

Rather than a flag, it might work to use network namespaces.
Very early in the init sequence the filesystem gets mounted using the
IPv6 link-local address on a client->host interface, and then a new
network namespace is created which does not include that interface, and
which everything else including firewall code runs in. Maybe.
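
As a very rough sketch of that flow (assumptions: the mount is driven by
this early-boot process, the actual mount step is left to mount.nfs, and
nothing here exists in any real init system):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* Keep a handle to the initial namespace so a small daemon could
     * setns(2) back into it later to refresh or add NFS mounts. */
    int initial_netns = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);

    /* 1. Still in the initial namespace: bring up the client->host
     *    interface and run mount.nfs against the host's IPv6
     *    link-local address (details omitted). */

    /* 2. Detach: this process and everything it starts from now on
     *    run in a fresh, empty network namespace, so any firewall
     *    rules configured there cannot touch the mount's interface. */
    if (unshare(CLONE_NEWNET) < 0)
        perror("unshare(CLONE_NEWNET)");

    /* 3. ... exec the rest of the init sequence here ... */
    (void)initial_netns;
    return 0;
}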

NeilBrown



2017-09-26 03:40:27

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Tue, Sep 26, 2017 at 12:08:07PM +1000, NeilBrown wrote:
> On Fri, Sep 22 2017, Daniel P. Berrange wrote:
> > This just isn't practical in the general case. Even on a single Linux OS
> > distro there are multiple ways to manage firewalls (Fedora as a static
> > init script, or firewalld, and many users invent their own personal way
> > of doing it). There are countless other OS, many closed source with 3rd
> > party firewall products in use. And then there are the firewall policies
> > defined by organization's IT departments that mandate particular ways of
> > doing things with layers of approval to go through to get changes made.
> >
> > IOW, while improving firewall configuraiton is a worthy goal, it isn't
> > a substitute for host<->guest file system sharing over a non-network
> > based transport.
>
> I don't find this argument at all convincing.
>
> The main selling point for VSOCK seems to be that it bypassed all
> firewalls. i.e. traffic through a VSOCK is immune to any netfilter
> settings.
>
> If "traffic immune to any netfilter settings" is a useful thing (and I'm
> quite willing to believe that it is), then let's focus on that goal. It
> might be useful well beyond NFS and VMs.
>
> How about a flag for a network interface which says "disable netfilter"??

Sounds like network configuration tools in general are a problem, so
it'd probably need to disable more than just netfilter.

But if iptables configuration and other operations start failing or
behaving strangely on one special interface, I'd worry that stuff will
break.

So I think you'd really want the interface completely hidden.

Except that you still want to configure the interface yourself using the
usual tools. So:

> Rather than a flag, it might work to use network namespaces.
> Very early in the init sequence the filesystem gets mounted using the
> IPv6 link-local address on a client->host interface, and then a new
> network namespace is created which does not include that interface, and
> which everything else including firewall code runs in. Maybe.

That seems closer, since it allows you to hide the interface from most
of the guest while letting some special software--qemu guest agent?--
still work with it. That agent would also need to be the one to do the
mount, and would need to be able to make that mount usable to the rest
of the guest.

Sounds doable to me?

There's still the problem of the paranoid security bureaucracy.

It should be pretty easy to demonstrate that the host only allows
point-to-point traffic on these interfaces. I'd hope that that, plus
the appeal of the feature, would be enough to win out in the end. This
is not a class of problem that I have experience dealing with, though!

--b.

2017-09-26 10:56:28

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Mon, Sep 25, 2017 at 11:40:26PM -0400, J. Bruce Fields wrote:
> On Tue, Sep 26, 2017 at 12:08:07PM +1000, NeilBrown wrote:
> > On Fri, Sep 22 2017, Daniel P. Berrange wrote:
> > Rather than a flag, it might work to use network namespaces.
> > Very early in the init sequence the filesystem gets mounted using the
> > IPv6 link-local address on a client->host interface, and then a new
> > network namespace is created which does not include that interface, and
> > which everything else including firewall code runs in. Maybe.
>
> That seems closer, since it allows you to hide the interface from most
> of the guest while letting some special software--qemu guest agent?--
> still work with it. That agent would also need to be the one to do the
> mount, and would need to be able to make that mount usable to the rest
> of the guest.
>
> Sounds doable to me?
>
> There's still the problem of the paranoid security bureaucracy.
>
> It should be pretty easy to demonstrate that the host only allows
> point-to-point traffic on these interfaces. I'd hope that that, plus
> the appeal of the feature, would be enough to win out in the end. This
> is not a class of problem that I have experience dealing with, though!

Programs wishing to use host<->guest networking might still need the
main network namespace for UNIX domain sockets and other communication.

For example, the QEMU guest agent has a command to report the IP
addresses of the guest. It must access the main network namespace to
collect this information while using a host<->guest socket to
communicate with the hypervisor.

I think this can be achieved as follows:
1. open /proc/self/ns/net (stash the file descriptor)
2. open /var/run/netns/hvnet & call setns(2) to switch namespaces
3. socket(AF_INET6, SOCK_STREAM, 0) to create host<->guest socket
4. call setns(2) to switch back to main namespace

In other words, the program stays mostly in the main network namespace
and only enters the host<->guest namespace to create sockets.
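
A minimal C sketch of those four steps ("hvnet" being the hypothetical
namespace name from step 2, error handling omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/socket.h>
#include <unistd.h>

static int make_hvnet_socket(void)
{
    int main_ns = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC); /* 1 */
    int hv_ns   = open("/var/run/netns/hvnet", O_RDONLY | O_CLOEXEC);

    setns(hv_ns, CLONE_NEWNET);                  /* 2: enter hvnet     */
    int fd = socket(AF_INET6, SOCK_STREAM, 0);   /* 3: host<->guest fd */
    setns(main_ns, CLONE_NEWNET);                /* 4: switch back     */

    close(hv_ns);
    close(main_ns);
    /* The socket stays bound to the hvnet namespace even though the
     * process has returned to the main one. */
    return fd;
}

int main(void)
{
    int fd = make_hvnet_socket();
    /* ... connect() to the host's address and talk to it here ... */
    close(fd);
    return 0;
}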

setns(2) with a network namespace requires CAP_SYS_ADMIN so it's not
very practical.

Is there an alternative that makes using the host<->guest network
namespace less clunky?

Stefan

2017-09-26 11:07:40

by Daniel P. Berrangé

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Tue, Sep 26, 2017 at 11:56:26AM +0100, Stefan Hajnoczi wrote:
> On Mon, Sep 25, 2017 at 11:40:26PM -0400, J. Bruce Fields wrote:
> > On Tue, Sep 26, 2017 at 12:08:07PM +1000, NeilBrown wrote:
> > > On Fri, Sep 22 2017, Daniel P. Berrange wrote:
> > > Rather than a flag, it might work to use network namespaces.
> > > Very early in the init sequence the filesystem gets mounted using the
> > > IPv6 link-local address on a client->host interface, and then a new
> > > network namespace is created which does not include that interface, and
> > > which everything else including firewall code runs in. Maybe.
> >
> > That seems closer, since it allows you to hide the interface from most
> > of the guest while letting some special software--qemu guest agent?--
> > still work with it. That agent would also need to be the one to do the
> > mount, and would need to be able to make that mount usable to the rest
> > of the guest.
> >
> > Sounds doable to me?
> >
> > There's still the problem of the paranoid security bureaucracy.
> >
> > It should be pretty easy to demonstrate that the host only allows
> > point-to-point traffic on these interfaces. I'd hope that that, plus
> > the appeal of the feature, would be enough to win out in the end. This
> > is not a class of problem that I have experience dealing with, though!
>
> Programs wishing to use host<->guest networking might still need the
> main network namespace for UNIX domain sockets and other communication.
>
> For example, the QEMU guest agent has a command to report the IP
> addresses of the guest. It must access the main network namespace to
> collect this information while using a host<->guest socket to
> communicate with the hypervisor.
>
> I think this can be achieved as follows:
> 1. open /proc/self/ns/net (stash the file descriptor)
> 2. open /var/run/netns/hvnet & call setns(2) to switch namespaces
> 3. socket(AF_INET6, SOCK_STREAM, 0) to create host<->guest socket
> 4. call setns(2) to switch back to main namespace
>
> In other words, the program stays mostly in the main network namespace
> and only enters the host<->guest namespace to create sockets.
>
> setns(2) with a network namespace requires CAP_SYS_ADMIN so it's not
> very practical.

This is also a Linux-only solution - it doesn't do anything to help us with
supporting the feature on *BSD or Windows, and such changes are harder to
backport to existing Linux guest OSes. Having to play all these games to make
applications using this "just work" does not compare favourably with
AF_VSOCK, which is on a par with UNIX domain sockets in terms of simplicity
of use.

There's also still the complexity on the host side, where we have to set up
firewalling both to ensure that these extra NICs can't be used to access
the LAN and to provide rules that ensure we don't get fooled by a
guest doing IP address spoofing. Again, VSOCK is preferable here since
by design the data channel source address is trustworthy without needing
any special protection.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

2017-09-26 13:39:50

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Mon, Sep 25, 2017 at 11:40:26PM -0400, J. Bruce Fields wrote:
> On Tue, Sep 26, 2017 at 12:08:07PM +1000, NeilBrown wrote:
> > Rather than a flag, it might work to use network namespaces.
> > Very early in the init sequence the filesystem gets mounted using the
> > IPv6 link-local address on a client->host interface, and then a new
> > network namespace is created which does not include that interface, and
> > which everything else including firewall code runs in. Maybe.
>
> That seems closer, since it allows you to hide the interface from most
> of the guest while letting some special software--qemu guest agent?--
> still work with it. That agent would also need to be the one to do the
> mount, and would need to be able to make that mount usable to the rest
> of the guest.

On the other hand, you're not *really* hiding it--system software in the
guest can certainly find the interface if it wants to. I don't know if
that's likely to cause any trouble in practice.

The same is true of VSOCK, I suppose. But VSOCK being designed
specifically for host<->guest communications, anyone monkeying with it
knows what they're doing and is responsible for the consequences, in a
way which someone dealing with ordinary network interfaces and
namespaces isn't.

--b.

2017-09-26 13:42:40

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

By the way, do we know anything about likely performance of NFS/VSOCK?

--b.

2017-09-26 18:32:05

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Tue, Sep 26, 2017 at 11:56:26AM +0100, Stefan Hajnoczi wrote:
> On Mon, Sep 25, 2017 at 11:40:26PM -0400, J. Bruce Fields wrote:
> > On Tue, Sep 26, 2017 at 12:08:07PM +1000, NeilBrown wrote:
> > > On Fri, Sep 22 2017, Daniel P. Berrange wrote:
> > > Rather than a flag, it might work to use network namespaces.
> > > Very early in the init sequence the filesystem gets mounted using the
> > > IPv6 link-local address on a client->host interface, and then a new
> > > network namespace is created which does not include that interface, and
> > > which everything else including firewall code runs in. Maybe.
> >
> > That seems closer, since it allows you to hide the interface from most
> > of the guest while letting some special software--qemu guest agent?--
> > still work with it. That agent would also need to be the one to do the
> > mount, and would need to be able to make that mount usable to the rest
> > of the guest.
> >
> > Sounds doable to me?
> >
> > There's still the problem of the paranoid security bureaucracy.
> >
> > It should be pretty easy to demonstrate that the host only allows
> > point-to-point traffic on these interfaces. I'd hope that that, plus
> > the appeal of the feature, would be enough to win out in the end. This
> > is not a class of problem that I have experience dealing with, though!
>
> Programs wishing to use host<->guest networking might still need the
> main network namespace for UNIX domain sockets and other communication.
>
> For example, the QEMU guest agent has a command to report the IP
> addresses of the guest. It must access the main network namespace to
> collect this information while using a host<->guest socket to
> communicate with the hypervisor.
>
> I think this can be achieved as follows:
> 1. open /proc/self/ns/net (stash the file descriptor)
> 2. open /var/run/netns/hvnet & call setns(2) to switch namespaces
> 3. socket(AF_INET6, SOCK_STREAM, 0) to create host<->guest socket
> 4. call setns(2) to switch back to main namespace
>
> In other words, the program stays mostly in the main network namespace
> and only enters the host<->guest namespace to create sockets.

Sounds like it would work. Or use two communicating processes, one in
each namespace?

> setns(2) with a network namespace requires CAP_SYS_ADMIN so it's not
> very practical.

The guest agent will need root to do NFS mounts.

> Is there an alternative that makes using the host<->guest network
> namespace less clunky?

You'll have to define "clunky" for us....

--b.

2017-09-27 00:45:32

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Tue, Sep 26 2017, Stefan Hajnoczi wrote:

> On Mon, Sep 25, 2017 at 11:40:26PM -0400, J. Bruce Fields wrote:
>> On Tue, Sep 26, 2017 at 12:08:07PM +1000, NeilBrown wrote:
>> > On Fri, Sep 22 2017, Daniel P. Berrange wrote:
>> > Rather than a flag, it might work to use network namespaces.
>> > Very early in the init sequence the filesystem gets mounted using the
>> > IPv6 link-local address on a client->host interface, and then a new
>> > network namespace is created which does not include that interface, and
>> > which everything else including firewall code runs in. Maybe.
>>
>> That seems closer, since it allows you to hide the interface from most
>> of the guest while letting some special software--qemu guest agent?--
>> still work with it. That agent would also need to be the one to do the
>> mount, and would need to be able to make that mount usable to the rest
>> of the guest.
>>
>> Sounds doable to me?
>>
>> There's still the problem of the paranoid security bureaucracy.
>>
>> It should be pretty easy to demonstrate that the host only allows
>> point-to-point traffic on these interfaces. I'd hope that that, plus
>> the appeal of the feature, would be enough to win out in the end. This
>> is not a class of problem that I have experience dealing with, though!
>
> Programs wishing to use host<->guest networking might still need the
> main network namespace for UNIX domain sockets and other
> communication.

Did I miss something.... the whole premise of this work seems to be that
programs (nfs in particular) cannot rely on host<->guest networking
because some rogue firewall might interfere with it, but now you say
that some programs might rely on it....

However I think you missed the important point - maybe I didn't explain
it clearly.

My idea is that the "root" network namespace is only available in early
boot. An NFS mount happens then (and possibly a daemon hangs around in
this network namespace to refresh the NFS mount). A new network
namespace is created and *everything else* runs in that subordinate
namespace.

If you want host<->guest networking in this subordinate namespace you
are quite welcome to configure that - maybe a vethX interface which
bridges out to the host interface.
But the important point is that any iptables rules configured in the
subordinate namespace will not affect the primary namespace and so will
not hurt the NFS mount. They will be entirely local.

There should be no need to move between namespaces once they have been
set up.

NeilBrown


>
> For example, the QEMU guest agent has a command to report the IP
> addresses of the guest. It must access the main network namespace to
> collect this information while using a host<->guest socket to
> communicate with the hypervisor.
>
> I think this can be achieved as follows:
> 1. open /proc/self/ns/net (stash the file descriptor)
> 2. open /var/run/netns/hvnet & call setns(2) to switch namespaces
> 3. socket(AF_INET6, SOCK_STREAM, 0) to create host<->guest socket
> 4. call setns(2) to switch back to main namespace
>
> In other words, the program stays mostly in the main network namespace
> and only enters the host<->guest namespace to create sockets.
>
> setns(2) with a network namespace requires CAP_SYS_ADMIN so it's not
> very practical.
>
> Is there an alternative that makes using the host<->guest network
> namespace less clunky?
>
> Stefan



2017-09-27 12:23:01

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Tue, Sep 26, 2017 at 09:42:39AM -0400, J. Bruce Fields wrote:
> By the way, do we know anything about likely performance of NFS/VSOCK?

virtio-vsock is designed for reliable host<->guest communication, not
performance. It is not a fast-path to avoid Ethernet/IP. I haven't run
benchmarks on NFS over AF_VSOCK but don't expect its performance to set
it apart from virtio-net.

Stefan

2017-09-27 13:05:28

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, Sep 27, 2017 at 10:45:17AM +1000, NeilBrown wrote:
> On Tue, Sep 26 2017, Stefan Hajnoczi wrote:
>
> > On Mon, Sep 25, 2017 at 11:40:26PM -0400, J. Bruce Fields wrote:
> >> On Tue, Sep 26, 2017 at 12:08:07PM +1000, NeilBrown wrote:
> >> > On Fri, Sep 22 2017, Daniel P. Berrange wrote:
> >> > Rather than a flag, it might work to use network namespaces.
> >> > Very early in the init sequence the filesystem gets mounted using the
> >> > IPv6 link-local address on a client->host interface, and then a new
> >> > network namespace is created which does not include that interface, and
> >> > which everything else including firewall code runs in. Maybe.
> >>
> >> That seems closer, since it allows you to hide the interface from most
> >> of the guest while letting some special software--qemu guest agent?--
> >> still work with it. That agent would also need to be the one to do the
> >> mount, and would need to be able to make that mount usable to the rest
> >> of the guest.
> >>
> >> Sounds doable to me?
> >>
> >> There's still the problem of the paranoid security bureaucracy.
> >>
> >> It should be pretty easy to demonstrate that the host only allows
> >> point-to-point traffic on these interfaces. I'd hope that that, plus
> >> the appeal of the feature, would be enough to win out in the end. This
> >> is not a class of problem that I have experience dealing with, though!
> >
> > Programs wishing to use host<->guest networking might still need the
> > main network namespace for UNIX domain sockets and other
> > communication.
>
> Did I miss something.... the whole premise of this work seems to be that
> programs (nfs in particular) cannot rely on host<->guest networking
> because some rogue firewall might interfere with it, but now you say
> that some programs might rely on it....

Programs rely on IPC (e.g. UNIX domain sockets) and that's affected by
network namespace isolation. This is what I was interested in.

But I've checked that UNIX domain socket connect(2) works across network
namespaces for pathname sockets. The path to the socket file just needs
to be accessible via the file system.
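
For example, a tiny client like the following can sit in a different
network namespace from the process that created the listening socket
(the socket path here is purely illustrative):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void)
{
    /* The lookup goes through the filesystem, not the network stack,
     * so the peer may live in another network namespace. */
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    strncpy(addr.sun_path, "/run/guest-agent.sock",
            sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        perror("connect");
    close(fd);
    return 0;
}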

> However I think you missed the important point - maybe I didn't explain
> it clearly.
>
> My idea is that the "root" network namespace is only available in early
> boot. An NFS mount happens then (and possibly a daemon hangs around in
> this network namespace to refresh the NFS mount). A new network
> namespace is created and *everthing*else* runs in that subordinate
> namespace.
>
> If you want host<->guest networking in this subordinate namespace you
> are quite welcome to configure that - maybe a vethX interface which
> bridges out to the host interface.
> But the important point is that any iptables rules configured in the
> subordinate namespace will not affect the primary namespace and so will
> not hurt the NFS mount. They will be entirely local.

Using the "root" (initial) network namespace is invasive. Hotplugged
NICs appear in the initial network namespace and interfaces move there if
a subordinate namespace is destroyed. Were you thinking of this
approach because it could share a single NIC (you mentioned bridging)?

Maybe it's best to leave the initial network namespace alone and instead
create a host<->guest namespace with a dedicated virtio-net NIC. That
way hotplug and network management continues to work as usual except
there is another namespace that contains a dedicated virtio-net NIC for
NFS and other host<->guest activity.

> There should be no need to move between namespaces once they have been
> set up.

If the namespace approach is better than AF_VSOCK, then it should work
for more use cases than just NFS. The QEMU Guest Agent was mentioned,
for example.

The guest agent needs to see the guest's network interfaces so it can
report the guest IP address. Therefore it needs access to both network
namespaces and I wondered what the cleanest way to do that was.

Stefan

2017-09-27 13:35:34

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, Sep 27, 2017 at 10:45:17AM +1000, NeilBrown wrote:
> My idea is that the "root" network namespace is only available in early
> boot. An NFS mount happens then (and possibly a daemon hangs around in
> this network namespace to refresh the NFS mount).

I think they also want to be able to do mounts after boot.

I assume you either keep the mount namespace shared, or use mount
propagation of some kind.

--b.

2017-09-27 13:46:34

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, Sep 27, 2017 at 01:22:58PM +0100, Stefan Hajnoczi wrote:
> On Tue, Sep 26, 2017 at 09:42:39AM -0400, J. Bruce Fields wrote:
> > By the way, do we know anything about likely performance of NFS/VSOCK?
>
> virtio-vsock is designed for reliable host<->guest communication, not
> performance. It is not a fast-path to avoid Ethernet/IP. I haven't run
> benchmarks on NFS over AF_VSOCK but don't expect its performance to set
> it apart from virtio-net.

OK.

But if we implement NFS/VSOCK and it turns out to be a success, I expect
people will start using it for things that weren't expected and
complaining about performance issues.

I guess I'm not too concerned about performance of the initial
implementation but it'd be nice to know that there's the possibility to
optimize later on.

But if our answer will be just to go figure out how to use a proper
NFS/TCP mount instead then I suppose that's OK.

--b.

2017-09-27 22:22:01

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, Sep 27 2017, Stefan Hajnoczi wrote:

> On Wed, Sep 27, 2017 at 10:45:17AM +1000, NeilBrown wrote:
>> On Tue, Sep 26 2017, Stefan Hajnoczi wrote:
>>
>> > On Mon, Sep 25, 2017 at 11:40:26PM -0400, J. Bruce Fields wrote:
>> >> On Tue, Sep 26, 2017 at 12:08:07PM +1000, NeilBrown wrote:
>> >> > On Fri, Sep 22 2017, Daniel P. Berrange wrote:
>> >> > Rather than a flag, it might work to use network namespaces.
>> >> > Very early in the init sequence the filesystem gets mounted using the
>> >> > IPv6 link-local address on a client->host interface, and then a new
>> >> > network namespace is created which does not include that interface, and
>> >> > which everything else including firewall code runs in. Maybe.
>> >>
>> >> That seems closer, since it allows you to hide the interface from most
>> >> of the guest while letting some special software--qemu guest agent?--
>> >> still work with it. That agent would also need to be the one to do the
>> >> mount, and would need to be able to make that mount usable to the rest
>> >> of the guest.
>> >>
>> >> Sounds doable to me?
>> >>
>> >> There's still the problem of the paranoid security bureaucracy.
>> >>
>> >> It should be pretty easy to demonstrate that the host only allows
>> >> point-to-point traffic on these interfaces. I'd hope that that, plus
>> >> the appeal of the feature, would be enough to win out in the end. This
>> >> is not a class of problem that I have experience dealing with, though!
>> >
>> > Programs wishing to use host<->guest networking might still need the
>> > main network namespace for UNIX domain sockets and other
>> > communication.
>>
>> Did I miss something.... the whole premise of this work seems to be that
>> programs (nfs in particular) cannot rely on host<->guest networking
>> because some rogue firewall might interfere with it, but now you say
>> that some programs might rely on it....
>
> Programs rely on IPC (e.g. UNIX domain sockets) and that's affected by
> network namespace isolation. This is what I was interested in.
>
> But I've checked that UNIX domain socket connect(2) works across network
> namespaces for pathname sockets. The path to the socket file just needs
> to be accessible via the file system.
>
>> However I think you missed the important point - maybe I didn't explain
>> it clearly.
>>
>> My idea is that the "root" network namespace is only available in early
>> boot. An NFS mount happens then (and possibly a daemon hangs around in
>> this network namespace to refresh the NFS mount). A new network
>> namespace is created and *everthing*else* runs in that subordinate
>> namespace.
>>
>> If you want host<->guest networking in this subordinate namespace you
>> are quite welcome to configure that - maybe a vethX interface which
>> bridges out to the host interface.
>> But the important point is that any iptables rules configured in the
>> subordinate namespace will not affect the primary namespace and so will
>> not hurt the NFS mount. They will be entirely local.
>
> Using the "root" (initial) network namespace is invasive. Hotplugged
> NICs appear in the initial network netspace and interfaces move there if
> a subordinate namespace is destroyed. Were you thinking of this
> approach because it could share a single NIC (you mentioned bridging)?

I was thinking of this approach because you appear to want isolation to
protect the NFS mount from random firewalls, and the general approach of
namespaces is to place the thing you want to contain (the firewall etc)
in a subordinate namespace.

However, if a different arrangement works better then a different
arrangement should be pursued. I knew nothing about network namespaces
until a couple of days ago, so I'm largely guessing.

The problem I assumed you would have with putting NFS in a subordinate
namespace is that the root namespace could still get in and mess it up,
whereas once you are in a subordinate namespace, I assume you cannot
get out (I assume that is part of the point). But maybe you can stop
processes from the root namespace getting in, or maybe you can choose
that that is not part of the threat scenario.

>
> Maybe it's best to leave the initial network namespace alone and instead
> create a host<->guest namespace with a dedicated virtio-net NIC. That
> way hotplug and network management continues to work as usual except
> there is another namespace that contains a dedicated virtio-net NIC for
> NFS and other host<->guest activity.

That probably makes sense.

>
>> There should be no need to move between namespaces once they have been
>> set up.
>
> If the namespace approach is better than AF_VSOCK, then it should work
> for more use cases than just NFS. The QEMU Guest Agent was mentioned,
> for example.

It appears that you have "trustworthy" services, such as NFS, which you
are confident will not break other services on the host, and
"untrustworthy" services, such as a firewall or network manager, which
might interfere negatively.

It makes sense to put all the trustworthy services in one network
namespace, and all the untrustworthy in the other.
Exactly how you arrange that depends on specific requirements. I
imagine you would start all the trustworthy services early, and then
close off their namespace from further access. Other arrangements are
certainly possible. Stepping back and forth between two namespaces
doesn't seem like the most elegant solution.

>
> The guest agent needs to see the guest's network interfaces so it can
> report the guest IP address. Therefore it needs access to both network
> namespaces and I wondered what the cleanest way to do that was.

There are several options. I cannot say which is the "cleanest", partly
because that is a subjective assessment.
Based on a fairly shallow understanding of what the guest agent must do, I
would probably explore putting the main guest agent in the untrusted
namespace, with some sort of forwarding service in the trusted
namespace. The agent would talk to the forwarding service using
unix-domain sockets - possibly created with socketpair() very early so
they don't depend on any shared filesystem namespace (just incase that
gets broken).
I assume the guest agent doesn't require low-latency/high-bandwidth,
and so will not be adversely affected by a forwarding agent.
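
A minimal sketch of that idea, not taken from the patch series and with most
error handling trimmed: the socketpair is created before the processes part
ways, so both ends keep working regardless of which network namespace each
process later moves into.

/* Sketch: a socketpair created before the namespaces diverge keeps
 * working across them, so a forwarder in the trusted namespace can
 * relay for an agent in the untrusted one.  Error handling trimmed.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        int sv[2];
        char buf[64];
        ssize_t n;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
                perror("socketpair");
                return 1;
        }

        if (fork() == 0) {
                /* Agent side: switches to a different network namespace
                 * (a fresh one here, just for the sketch) while keeping
                 * its end of the socketpair, which the move leaves intact. */
                const char *msg = "guest address report ...";

                close(sv[0]);
                if (unshare(CLONE_NEWNET) < 0)
                        perror("unshare");
                write(sv[1], msg, strlen(msg));
                return 0;
        }

        /* Forwarder side: stays in the original namespace and relays
         * whatever the agent sends (here it just prints it). */
        close(sv[1]);
        n = read(sv[0], buf, sizeof(buf) - 1);
        if (n > 0) {
                buf[n] = '\0';
                printf("forwarder received: %s\n", buf);
        }
        wait(NULL);
        return 0;
}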

>
> Stefan

Thanks,
NeilBrown



2017-09-27 22:25:18

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, Sep 27 2017, J. Bruce Fields wrote:

> On Wed, Sep 27, 2017 at 10:45:17AM +1000, NeilBrown wrote:
>> My idea is that the "root" network namespace is only available in early
>> boot. An NFS mount happens then (and possibly a daemon hangs around in
>> this network namespace to refresh the NFS mount).
>
> I think they also want to be able to do mounts after boot.

Hence "a daemon hangs around ... to refresh the NFS mount" by which I
meant to imply the possibility of creating new mounts as well.

That may be unnecessary. It might be safe to allow processes to move
between the network namespace. We still don't have a clear statement of
the threat model and the degree of isolation that is required, so it is
hard to create concrete recommendations.

Thanks,
NeilBrown

>
> I assume you either keep the mount namespace shared, or use mount
> propagation of some kind.
>
> --b.



2017-09-28 10:34:46

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Wed, Sep 27, 2017 at 09:46:34AM -0400, J. Bruce Fields wrote:
> On Wed, Sep 27, 2017 at 01:22:58PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Sep 26, 2017 at 09:42:39AM -0400, J. Bruce Fields wrote:
> > > By the way, do we know anything about likely performance of NFS/VSOCK?
> >
> > virtio-vsock is designed for reliable host<->guest communication, not
> > performance. It is not a fast-path to avoid Ethernet/IP. I haven't run
> > benchmarks on NFS over AF_VSOCK but don't expect its performance to set
> > it apart from virtio-net.
>
> OK.
>
> But if we implement NFS/VSOCK and it turns out to be a success, I expect
> people will start using it for things that weren't expected and
> complaining about performance issues.
>
> I guess I'm not too concerned about performance of the initial
> implementation but it'd be nice to know that there's the possibility to
> optimize later on.
>
> But if our answer is just to go figure out how to use a proper
> NFS/TCP mount instead, then I suppose that's OK.

Yes, virtio-vsock can be extended in the future for performance
optimizations.

Stefan

2017-09-28 10:44:46

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

On Thu, Sep 28, 2017 at 08:21:48AM +1000, NeilBrown wrote:
> On Wed, Sep 27 2017, Stefan Hajnoczi wrote:
>
> > On Wed, Sep 27, 2017 at 10:45:17AM +1000, NeilBrown wrote:
> >> On Tue, Sep 26 2017, Stefan Hajnoczi wrote:
> >>
> >> > On Mon, Sep 25, 2017 at 11:40:26PM -0400, J. Bruce Fields wrote:
> >> >> On Tue, Sep 26, 2017 at 12:08:07PM +1000, NeilBrown wrote:
> >> >> > On Fri, Sep 22 2017, Daniel P. Berrange wrote:
> >> >> > Rather than a flag, it might work to use network namespaces.
> >> >> > Very early in the init sequence the filesystem gets mounted using the
> >> >> > IPv6 link-local address on a client->host interface, and then a new
> >> >> > network namespace is created which does not include that interface, and
> >> >> > which everything else including firewall code runs in. Maybe.
> >> >>
> >> >> That seems closer, since it allows you to hide the interface from most
> >> >> of the guest while letting some special software--qemu guest agent?--
> >> >> still work with it. That agent would also need to be the one to do the
> >> >> mount, and would need to be able to make that mount usable to the rest
> >> >> of the guest.
> >> >>
> >> >> Sounds doable to me?
> >> >>
> >> >> There's still the problem of the paranoid security bureaucracy.
> >> >>
> >> >> It should be pretty easy to demonstrate that the host only allows
> >> >> point-to-point traffic on these interfaces. I'd hope that that, plus
> >> >> the appeal of the feature, would be enough to win out in the end. This
> >> >> is not a class of problem that I have experience dealing with, though!
> >> >
> >> > Programs wishing to use host<->guest networking might still need the
> >> > main network namespace for UNIX domain sockets and other
> >> > communication.
> >>
> >> Did I miss something.... the whole premise of this work seems to be that
> >> programs (nfs in particular) cannot rely on host<->guest networking
> >> because some rogue firewall might interfere with it, but now you say
> >> that some programs might rely on it....
> >
> > Programs rely on IPC (e.g. UNIX domain sockets) and that's affected by
> > network namespace isolation. This is what I was interested in.
> >
> > But I've checked that UNIX domain socket connect(2) works across network
> > namespaces for pathname sockets. The path to the socket file just needs
> > to be accessible via the file system.
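
To make that concrete, here is a small illustration (not code from this
series; the socket path is made up): connect(2) on a pathname AF_UNIX socket
goes through ordinary path resolution rather than the caller's network
namespace, so it succeeds across namespaces whenever the socket file is
visible to both sides; only abstract-namespace sockets are per-namespace.

/* Sketch: connect(2) to a pathname AF_UNIX socket.  The path (made up
 * here) is resolved through the filesystem, not the network namespace,
 * so the connect succeeds even if the listener runs in a different
 * network namespace, provided /run/example.sock is visible to both.
 */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void)
{
        struct sockaddr_un addr;
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        if (fd < 0) {
                perror("socket");
                return 1;
        }

        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, "/run/example.sock",
                sizeof(addr.sun_path) - 1);

        /* Abstract sockets (sun_path starting with '\0') would be scoped
         * to the network namespace; pathname sockets are not. */
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                perror("connect");
                close(fd);
                return 1;
        }

        printf("connected across namespaces\n");
        close(fd);
        return 0;
}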
> >
> >> However I think you missed the important point - maybe I didn't explain
> >> it clearly.
> >>
> >> My idea is that the "root" network namespace is only available in early
> >> boot. An NFS mount happens then (and possibly a daemon hangs around in
> >> this network namespace to refresh the NFS mount). A new network
> >> namespace is created and *everything*else* runs in that subordinate
> >> namespace.
> >>
> >> If you want host<->guest networking in this subordinate namespace you
> >> are quite welcome to configure that - maybe a vethX interface which
> >> bridges out to the host interface.
> >> But the important point is that any iptables rules configured in the
> >> subordinate namespace will not affect the primary namespace and so will
> >> not hurt the NFS mount. They will be entirely local.
> >
> > Using the "root" (initial) network namespace is invasive. Hotplugged
> > NICs appear in the initial network namespace and interfaces move there if
> > a subordinate namespace is destroyed. Were you thinking of this
> > approach because it could share a single NIC (you mentioned bridging)?
>
> I was thinking of this approach because you appear to want isolation to
> protect the NFS mount from random firewalls, and the general approach of
> namespaces is to place the thing you want to contain (the firewall etc)
> in a subordinate namespace.
>
> However, if a different arrangement works better then a different
> arrangement should be pursued. I knew nothing about network namespaces
> until a couple of days ago, so I'm largely guessing.

Me neither.

> The problem I assumed you would have with putting NFS in a subordinate
> namespace is that the root namespace could still get in and mess it up,
> whereas once you are in a subordinate namespace, I assume you cannot
> get out (I assume that is part of the point). But maybe you can stop
> processes from the root namespace getting in, or maybe you can choose
> that that is not part of the threat scenario.

Good point, I didn't think about enforcing isolation. I was assuming
that anything running in the initial namespace would not mess with the
host<->guest namespace accidentally. That's probably a mistake :).

2017-09-13 18:18:09

by David Noveck

[permalink] [raw]
Subject: Re: [nfsv4] [PATCH nfs-utils v3 00/14] add NFS over AF_VSOCK support

> and how to ask IESG to assign it?

The way to get the IESG to assign it would be to write an RFC and get it
approved as a Proposed Standard, but I don't think you need to do that.
There is a portion of the netid registry that is assigned on a
first-come, first-served basis (see RFCs 5665 and 5226), and if you are OK
with that, the IESG doesn't have to be involved. You simply have to ask
IANA to assign it, providing the (fairly limited) information mentioned in
those RFCs.

On Wed, Sep 13, 2017 at 12:21 PM, Christoph Hellwig <[email protected]>
wrote:

> Please get your VSOCK NFS transport into the IETF NFSv4 working group
> first before moving forward with Linux support - we should not implement
> non-standardized extensions.
>
> On Wed, Sep 13, 2017 at 11:26:36AM +0100, Stefan Hajnoczi wrote:
> > * The last revision was somewhat controversial because it's already
> possible
> > to share files between a hypervisor and virtual machine using TCP/IP,
> so why
> > add AF_VSOCK support to the stack? TCP/IP based solutions require the
> > virtual machine administrator to be involved in the configuration and
> are
> > therefore not suitable for automatic management by OpenStack, oVirt,
> etc.
> > Maintainers, is this feature acceptable?
> >
> > * Need advice on netid: is there agreement to use "tcpv" instead of
> "vsock" as
> > Chuck Lever suggested and how to ask IESG to assign it?
> >
> > The AF_VSOCK address family allows virtual machines to communicate with
> the
> > hypervisor using a zero-configuration transport. KVM, VMware, and
> Hyper-V
> > hypervisors support AF_VSOCK and it was first introduced in Linux 3.9.
> >
> > This patch series adds AF_VSOCK support to mount.nfs(8) and
> rpc.nfsd(8). To
> > mount an export from the hypervisor (CID 2):
> >
> > # mount.nfs 2:/srv/vm01 /mnt -o proto=vsock
> >
> > To serve exports over vsock port 2049:
> >
> > # nfsd ... --vsock 2049
> >
> > This series extends exports(5) syntax to handle vsock:<CID> or vsock:*.
> For
> > example, the guest with CID 3 can be given access using vsock:3.
> >
> > nfsd can export over IPv4/IPv6 and vsock at the same time. See the
> changes to
> > exports.man, nfs.man, and nfsd.man in the patches for syntax details.
> >
> > NFSv4 and later are supported.
> >
> > The code is also available here:
> > https://github.com/stefanha/nfs-utils/tree/vsock-nfsd
> >
> > The latest kernel patches are available here:
> > https://github.com/stefanha/linux/tree/vsock-nfsd
> >
> > Stefan Hajnoczi (14):
> > mount: don't use IPPROTO_UDP for address resolution
> > nfs-utils: add vsock.h
> > nfs-utils: add AF_VSOCK support to sockaddr.h
> > mount: present AF_VSOCK addresses
> > mount: accept AF_VSOCK in nfs_verify_family()
> > mount: generate AF_VSOCK clientaddr
> > getport: recognize "vsock" netid
> > mount: AF_VSOCK address parsing
> > exportfs: introduce host_freeaddrinfo()
> > exportfs: add AF_VSOCK address parsing and printing
> > exportfs: add AF_VSOCK support to set_addrlist()
> > exportfs: add support for "vsock:" exports(5) syntax
> > nfsd: add --vsock (-v) option to nfsd
> > tests: add "vsock:" exports(5) test case
> >
> > tests/Makefile.am | 3 +-
> > support/include/exportfs.h | 4 ++
> > support/include/sockaddr.h | 18 +++++
> > support/include/vsock.h | 59 +++++++++++++++++
> > utils/nfsd/nfssvc.h | 1 +
> > support/export/client.c | 8 +--
> > support/export/hostname.c | 161 +++++++++++++++++++++++++++++++++++++++++++--
> > support/nfs/getport.c | 16 +++--
> > utils/exportfs/exportfs.c | 42 ++++++++++--
> > utils/mount/network.c | 37 ++++++++++-
> > utils/mount/stropts.c | 61 ++++++++++++++---
> > utils/mountd/auth.c | 2 +-
> > utils/mountd/cache.c | 10 +--
> > utils/mountd/mountd.c | 4 +-
> > utils/mountd/rmtab.c | 2 +-
> > utils/nfsd/nfsd.c | 18 ++++-
> > utils/nfsd/nfssvc.c | 62 +++++++++++++++++
> > configure.ac | 3 +
> > tests/t0002-vsock-basic.sh | 53 +++++++++++++++
> > utils/exportfs/exports.man | 12 +++-
> > utils/mount/nfs.man | 20 ++++--
> > utils/nfsd/nfsd.man | 4 ++
> > 22 files changed, 552 insertions(+), 48 deletions(-)
> > create mode 100644 support/include/vsock.h
> > create mode 100755 tests/t0002-vsock-basic.sh
> >
> > --
> > 2.13.5
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> ---end quoted text---
>
> _______________________________________________
> nfsv4 mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/nfsv4
>

