2017-06-30 13:21:25

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 00/12] add NFS over AF_VSOCK support

The AF_VSOCK address family allows virtual machines to communicate with the
hypervisor using a zero-configuration transport. Both KVM and VMware
hypervisors support AF_VSOCK and it was introduced in Linux 3.9.

This patch series adds AF_VSOCK support to mount.nfs(8) and rpc.nfsd(8).

It requires the kernel NFS client patches that I am also posting to
[email protected]. Please see the kernel patch series for an overview
of NFS over AF_VSOCK and quickstart instructions for testing.

This series extends exports(5) syntax to handle vsock:<CID> or vsock:*. For
example, the guest with CID 3 can be given access using vsock:3.

The code is also available here:
https://github.com/stefanha/nfs-utils/tree/vsock-nfsd

Stefan Hajnoczi (12):
mount: don't use IPPROTO_UDP for address resolution
nfs-utils: add AF_VSOCK support to sockaddr.h
mount: present AF_VSOCK addresses
mount: accept AF_VSOCK in nfs_verify_family()
getport: recognize "vsock" netid
mount: AF_VSOCK address parsing
exportfs: introduce host_freeaddrinfo()
exportfs: add AF_VSOCK address parsing and printing
exportfs: add AF_VSOCK support to set_addrlist()
exportfs: add support for "vsock:" exports(5) syntax
nfsd: add --vsock (-v) option to nfsd
tests: add "vsock:" exports(5) test case

tests/Makefile.am | 3 +-
support/include/exportfs.h | 6 ++
support/include/sockaddr.h | 29 +++++++
utils/nfsd/nfssvc.h | 1 +
support/export/client.c | 8 +-
support/export/hostname.c | 183 +++++++++++++++++++++++++++++++++++++++++++--
support/nfs/getport.c | 16 +++-
utils/exportfs/exportfs.c | 42 +++++++++--
utils/mount/network.c | 10 ++-
utils/mount/stropts.c | 67 +++++++++++++++--
utils/mountd/auth.c | 2 +-
utils/mountd/cache.c | 10 +--
utils/mountd/mountd.c | 4 +-
utils/mountd/rmtab.c | 2 +-
utils/nfsd/nfsd.c | 18 ++++-
utils/nfsd/nfssvc.c | 62 +++++++++++++++
tests/t0002-vsock-basic.sh | 53 +++++++++++++
17 files changed, 474 insertions(+), 42 deletions(-)
create mode 100755 tests/t0002-vsock-basic.sh

--
2.9.4



2017-06-30 13:21:27

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 01/12] mount: don't use IPPROTO_UDP for address resolution

Although getaddrinfo(3) with IPPROTO_UDP works fine for AF_INET and
AF_INET6, the AF_VSOCK address family does not support IPPROTO_UDP and
produces an error.

Drop IPPROTO_UDP and use the 0 default (TCP) which works for all address
families. Modern NFS uses TCP anyway so it's strange to specify UDP.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
---
utils/mount/stropts.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
index c2a739b..99656dd 100644
--- a/utils/mount/stropts.c
+++ b/utils/mount/stropts.c
@@ -909,9 +909,7 @@ static int nfs_try_mount(struct nfsmount_info *mi)
int result = 0;

if (mi->address == NULL) {
- struct addrinfo hint = {
- .ai_protocol = (int)IPPROTO_UDP,
- };
+ struct addrinfo hint = {};
int error;
struct addrinfo *address;

--
2.9.4


2017-06-30 13:21:30

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 02/12] nfs-utils: add AF_VSOCK support to sockaddr.h

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
support/include/sockaddr.h | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/support/include/sockaddr.h b/support/include/sockaddr.h
index 446b537..0577859 100644
--- a/support/include/sockaddr.h
+++ b/support/include/sockaddr.h
@@ -31,6 +31,9 @@
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>
+#ifdef AF_VSOCK
+#include <linux/vm_sockets.h>
+#endif

/*
* This type is for defining buffers that contain network socket
@@ -51,6 +54,9 @@ union nfs_sockaddr {
struct sockaddr sa;
struct sockaddr_in s4;
struct sockaddr_in6 s6;
+#ifdef AF_VSOCK
+ struct sockaddr_vm svm;
+#endif
};

#if SIZEOF_SOCKLEN_T - 0 == 0
@@ -66,6 +72,10 @@ union nfs_sockaddr {
#define SIZEOF_SOCKADDR_IN6 SIZEOF_SOCKADDR_UNKNOWN
#endif /* !IPV6_SUPPORTED */

+#ifdef AF_VSOCK
+#define SIZEOF_SOCKADDR_VM (socklen_t)sizeof(struct sockaddr_vm)
+#endif
+
/**
* nfs_sockaddr_length - return the size in bytes of a socket address
* @sap: pointer to socket address
@@ -81,6 +91,10 @@ nfs_sockaddr_length(const struct sockaddr *sap)
return SIZEOF_SOCKADDR_IN;
case AF_INET6:
return SIZEOF_SOCKADDR_IN6;
+#ifdef AF_VSOCK
+ case AF_VSOCK:
+ return SIZEOF_SOCKADDR_VM;
+#endif /* !AF_VSOCK */
}
return SIZEOF_SOCKADDR_UNKNOWN;
}
@@ -218,6 +232,17 @@ compare_sockaddr6(__attribute__ ((unused)) const struct sockaddr *sa1,
}
#endif /* !IPV6_SUPPORTED */

+#ifdef AF_VSOCK
+static inline _Bool
+compare_sockaddr_vsock(const struct sockaddr *sa1, const struct sockaddr *sa2)
+{
+ const struct sockaddr_vm *svm1 = (const struct sockaddr_vm *)sa1;
+ const struct sockaddr_vm *svm2 = (const struct sockaddr_vm *)sa2;
+
+ return svm1->svm_cid == svm2->svm_cid;
+}
+#endif
+
/**
* nfs_compare_sockaddr - compare two socket addresses for equality
* @sa1: pointer to a socket address
@@ -238,6 +263,10 @@ nfs_compare_sockaddr(const struct sockaddr *sa1, const struct sockaddr *sa2)
return compare_sockaddr4(sa1, sa2);
case AF_INET6:
return compare_sockaddr6(sa1, sa2);
+#ifdef AF_VSOCK
+ case AF_VSOCK:
+ return compare_sockaddr_vsock(sa1, sa2);
+#endif
}

return false;
--
2.9.4


2017-06-30 13:21:45

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 06/12] mount: AF_VSOCK address parsing

getaddrinfo(3) does not have AF_VSOCK support. Parse the CID in the
hostname option and build a struct sockaddr_vm.

There is one tricky thing here: struct addrinfo is normally allocated by
getaddrinfo(3) and freed by freeaddrinfo(3). The memory allocation of
the struct addrinfo and struct sockaddr are an implementation detail of
libc. Therefore we must avoid freeaddrinfo(3) calls when the addrinfo
details were filled out by us for AF_VSOCK instead of by getaddrinfo(3).

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
utils/mount/stropts.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
index 99656dd..3161609 100644
--- a/utils/mount/stropts.c
+++ b/utils/mount/stropts.c
@@ -898,6 +898,42 @@ fall_back:
return nfs_try_mount_v3v2(mi, FALSE);
}

+#ifdef AF_VSOCK
+/* There are no guarantees on how getaddrinfo(3) allocates struct addrinfo so
+ * be sure to call free(3) on *address instead of freeaddrinfo(3).
+ */
+static int vsock_getaddrinfo(struct nfsmount_info *mi,
+ struct addrinfo **address)
+{
+ struct {
+ struct addrinfo ai;
+ struct sockaddr_vm svm;
+ } *vai;
+ long cid;
+ char *endptr;
+
+ errno = 0;
+ cid = strtol(mi->hostname, &endptr, 10);
+ if (errno != 0 || *endptr != '\0' ||
+ cid < 0 || cid > UINT_MAX)
+ return EAI_NONAME;
+
+ vai = calloc(1, sizeof(*vai));
+ if (!vai)
+ return EAI_MEMORY;
+
+ vai->ai.ai_family = AF_VSOCK;
+ vai->ai.ai_socktype = SOCK_STREAM;
+ vai->ai.ai_addrlen = sizeof(vai->svm);
+ vai->ai.ai_addr = (struct sockaddr *)&vai->svm;
+ vai->svm.svm_family = AF_VSOCK;
+ vai->svm.svm_cid = cid;
+
+ *address = &vai->ai;
+ return 0;
+}
+#endif /* AF_VSOCK */
+
/*
* This is a single pass through the fg/bg loop.
*
@@ -909,12 +945,21 @@ static int nfs_try_mount(struct nfsmount_info *mi)
int result = 0;

if (mi->address == NULL) {
- struct addrinfo hint = {};
- int error;
- struct addrinfo *address;
+ int error = 0;
+ struct addrinfo *address = NULL;
+
+#ifdef AF_VSOCK
+ if (mi->family == AF_VSOCK)
+ error = vsock_getaddrinfo(mi, &address);
+#endif
+
+ if (error == 0 && !address) {
+ struct addrinfo hint = {};
+
+ hint.ai_family = (int)mi->family;
+ error = getaddrinfo(mi->hostname, NULL, &hint, &address);
+ }

- hint.ai_family = (int)mi->family;
- error = getaddrinfo(mi->hostname, NULL, &hint, &address);
if (error != 0) {
if (error == EAI_AGAIN)
errno = EAGAIN;
@@ -1209,6 +1254,14 @@ int nfsmount_string(const char *spec, const char *node, const char *type,
} else
nfs_error(_("%s: internal option parsing error"), progname);

+#ifdef AF_VSOCK
+ /* See vsock_getaddrinfo() for why we cannot use freeaddrinfo(3) */
+ if (mi.address && mi.address->ai_family == AF_VSOCK) {
+ free(mi.address);
+ mi.address = NULL;
+ }
+#endif
+
freeaddrinfo(mi.address);
free(mi.hostname);
return retval;
--
2.9.4


2017-06-30 13:21:36

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 04/12] mount: accept AF_VSOCK in nfs_verify_family()

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
utils/mount/network.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/utils/mount/network.c b/utils/mount/network.c
index b5dcaa5..8f362de 100644
--- a/utils/mount/network.c
+++ b/utils/mount/network.c
@@ -1419,7 +1419,7 @@ sa_family_t config_default_family = AF_INET;
static int
nfs_verify_family(sa_family_t family)
{
- if (family != AF_INET)
+ if (family != AF_INET && family != AF_VSOCK)
return 0;

return 1;
--
2.9.4


2017-06-30 13:21:40

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

Neither libtirpc nor getprotobyname(3) know about AF_VSOCK. For similar
reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c.

It is now possible to mount a file system from the host (hypervisor)
over AF_VSOCK like this:

(guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock

The VM's cid address is 3 and the hypervisor is 2.

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
support/nfs/getport.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/support/nfs/getport.c b/support/nfs/getport.c
index 081594c..0b857af 100644
--- a/support/nfs/getport.c
+++ b/support/nfs/getport.c
@@ -217,8 +217,7 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
struct protoent *proto;

/*
- * IANA does not define a protocol number for rdma netids,
- * since "rdma" is not an IP protocol.
+ * IANA does not define protocol numbers for non-IP netids.
*/
if (strcmp(netid, "rdma") == 0) {
*family = AF_INET;
@@ -230,6 +229,11 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
*protocol = NFSPROTO_RDMA;
return 1;
}
+ if (strcmp(netid, "vsock") == 0) {
+ *family = AF_VSOCK;
+ *protocol = 0;
+ return 1;
+ }

nconf = getnetconfigent(netid);
if (nconf == NULL)
@@ -258,14 +262,18 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
struct protoent *proto;

/*
- * IANA does not define a protocol number for rdma netids,
- * since "rdma" is not an IP protocol.
+ * IANA does not define protocol numbers for non-IP netids.
*/
if (strcmp(netid, "rdma") == 0) {
*family = AF_INET;
*protocol = NFSPROTO_RDMA;
return 1;
}
+ if (strcmp(netid, "vsock") == 0) {
+ *family = AF_VSOCK;
+ *protocol = 0;
+ return 1;
+ }

proto = getprotobyname(netid);
if (proto == NULL)
--
2.9.4


2017-06-30 13:21:34

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 03/12] mount: present AF_VSOCK addresses

Format vsock hosts as "vsock:<cid>" so the addresses can be easily
distinguished from IPv4 and IPv6 addresses.

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
utils/mount/network.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/utils/mount/network.c b/utils/mount/network.c
index 281e935..b5dcaa5 100644
--- a/utils/mount/network.c
+++ b/utils/mount/network.c
@@ -45,6 +45,8 @@
#include <rpc/pmap_prot.h>
#include <rpc/pmap_clnt.h>

+#include <linux/vm_sockets.h>
+
#include "sockaddr.h"
#include "xcommon.h"
#include "mount.h"
@@ -325,6 +327,12 @@ int nfs_string_to_sockaddr(const char *address, struct sockaddr *sap,
int nfs_present_sockaddr(const struct sockaddr *sap, const socklen_t salen,
char *buf, const size_t buflen)
{
+ if (sap->sa_family == AF_VSOCK) {
+ snprintf(buf, buflen, "vsock:%u",
+ ((struct sockaddr_vm *)sap)->svm_cid);
+ return 1;
+ }
+
#ifdef HAVE_GETNAMEINFO
int result;

--
2.9.4


2017-06-30 13:21:51

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 07/12] exportfs: introduce host_freeaddrinfo()

The AF_VSOCK address family is not supported by the getaddrinfo(3)
family of functions so we will have to arrange our own struct addrinfo
for vsock addresses.

Different libc implementations can allocate struct addrinfo and struct
sockaddr in different ways. Since the memory allocation details of
getaddrinfo(3) results are private to libc we cannot call freeaddrinfo()
on a struct addrinfo we allocated ourselves.

Introduce a freeaddrinfo(3) wrapper function that will be used to safely
free AF_VSOCK struct addrinfo in a later patch. Just introduce the
wrapper function now so this patch is easy to review without
AF_VSOCK-specific changes.

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
support/include/exportfs.h | 1 +
support/export/client.c | 8 ++++----
support/export/hostname.c | 28 +++++++++++++++++++++-------
utils/exportfs/exportfs.c | 10 +++++-----
utils/mountd/auth.c | 2 +-
utils/mountd/cache.c | 10 +++++-----
utils/mountd/mountd.c | 4 ++--
utils/mountd/rmtab.c | 2 +-
8 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/support/include/exportfs.h b/support/include/exportfs.h
index 8af47a8..98d45c5 100644
--- a/support/include/exportfs.h
+++ b/support/include/exportfs.h
@@ -161,6 +161,7 @@ __attribute__((__malloc__))
struct addrinfo * host_reliable_addrinfo(const struct sockaddr *sap);
__attribute__((__malloc__))
struct addrinfo * host_numeric_addrinfo(const struct sockaddr *sap);
+void host_freeaddrinfo(struct addrinfo * ai);

struct nfskey * key_lookup(char *hname);

diff --git a/support/export/client.c b/support/export/client.c
index 2346f99..881c776 100644
--- a/support/export/client.c
+++ b/support/export/client.c
@@ -210,7 +210,7 @@ init_subnetwork(nfs_client *clp)
set_addrlist(clp, 0, ai->ai_addr);
family = ai->ai_addr->sa_family;

- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);

switch (family) {
case AF_INET:
@@ -309,7 +309,7 @@ client_lookup(char *hname, int canonical)
init_addrlist(clp, ai);

out:
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
return clp;
}

@@ -378,7 +378,7 @@ client_freeall(void)
* @sap: pointer to socket address to resolve
*
* Returns an addrinfo structure, or NULL if some problem occurred.
- * Caller must free the result with freeaddrinfo(3).
+ * Caller must free the result with host_freeaddrinfo().
*/
struct addrinfo *
client_resolve(const struct sockaddr *sap)
@@ -673,7 +673,7 @@ check_netgroup(const nfs_client *clp, const struct addrinfo *ai)
tmp = host_pton(hname);
if (tmp != NULL) {
char *cname = host_canonname(tmp->ai_addr);
- freeaddrinfo(tmp);
+ host_freeaddrinfo(tmp);

/* The resulting FQDN may be in our netgroup. */
if (cname != NULL) {
diff --git a/support/export/hostname.c b/support/export/hostname.c
index 5c4c824..7f8a6f8 100644
--- a/support/export/hostname.c
+++ b/support/export/hostname.c
@@ -89,7 +89,7 @@ host_ntop(const struct sockaddr *sap, char *buf, const size_t buflen)
* IP presentation address
*
* Returns address info structure, or NULL if an error occurs. Caller
- * must free the returned structure with freeaddrinfo(3).
+ * must free the returned structure with host_freeaddrinfo().
*/
__attribute__((__malloc__))
struct addrinfo *
@@ -155,7 +155,7 @@ host_pton(const char *paddr)
*
* Returns address info structure with ai_canonname filled in, or NULL
* if no information is available for @hostname. Caller must free the
- * returned structure with freeaddrinfo(3).
+ * returned structure with host_freeaddrinfo().
*/
__attribute__((__malloc__))
struct addrinfo *
@@ -192,6 +192,20 @@ host_addrinfo(const char *hostname)
}

/**
+ * host_freeaddrinfo - free addrinfo obtained from host_*() functions
+ * @ai: pointer to addrinfo to free
+ *
+ * The addrinfos returned by host_*() functions may not have been allocated by
+ * a call to getaddrinfo(3). It is not safe to free them directly with
+ * freeaddrinfo(3). Use this function instead.
+ */
+void
+host_freeaddrinfo(struct addrinfo *ai)
+{
+ freeaddrinfo(ai);
+}
+
+/**
* host_canonname - return canonical hostname bound to an address
* @sap: pointer to socket address to look up
*
@@ -268,7 +282,7 @@ host_canonname(const struct sockaddr *sap)
* ai_canonname filled in. If there is a problem with resolution or
* the resolved records don't match up properly then it returns NULL
*
- * Caller must free the returned structure with freeaddrinfo(3).
+ * Caller must free the returned structure with host_freeaddrinfo().
*/
__attribute__((__malloc__))
struct addrinfo *
@@ -290,7 +304,7 @@ host_reliable_addrinfo(const struct sockaddr *sap)
if (nfs_compare_sockaddr(a->ai_addr, sap))
break;

- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
if (!a)
goto out_free_hostname;

@@ -314,7 +328,7 @@ out_free_hostname:
* @sap: pointer to socket address
*
* Returns address info structure, or NULL if an error occurred.
- * Caller must free the returned structure with freeaddrinfo(3).
+ * Caller must free the returned structure with host_freeaddrinfo().
*/
#ifdef HAVE_GETNAMEINFO
__attribute__((__malloc__))
@@ -357,7 +371,7 @@ host_numeric_addrinfo(const struct sockaddr *sap)
free(ai->ai_canonname); /* just in case */
ai->ai_canonname = strdup(buf);
if (ai->ai_canonname == NULL) {
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
ai = NULL;
}
}
@@ -390,7 +404,7 @@ host_numeric_addrinfo(const struct sockaddr *sap)
if (ai != NULL) {
ai->ai_canonname = strdup(buf);
if (ai->ai_canonname == NULL) {
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
ai = NULL;
}
}
diff --git a/utils/exportfs/exportfs.c b/utils/exportfs/exportfs.c
index beed1b3..3ded733 100644
--- a/utils/exportfs/exportfs.c
+++ b/utils/exportfs/exportfs.c
@@ -282,7 +282,7 @@ exportfs_parsed(char *hname, char *path, char *options, int verbose)
validate_export(exp);

out:
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
}

static int exportfs_generic(char *arg, char *options, int verbose)
@@ -395,7 +395,7 @@ unexportfs_parsed(char *hname, char *path, int verbose)
if (!success)
xlog(L_ERROR, "Could not find '%s:%s' to unexport.", hname, path);

- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
}

static int unexportfs_generic(char *arg, int verbose)
@@ -588,7 +588,7 @@ address_list(const char *hostname)
if (ai != NULL) {
/* @hostname was a presentation address */
cname = host_canonname(ai->ai_addr);
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
if (cname != NULL)
goto out;
}
@@ -639,8 +639,8 @@ matchhostname(const char *hostname1, const char *hostname2)
}

out:
- freeaddrinfo(results1);
- freeaddrinfo(results2);
+ host_freeaddrinfo(results1);
+ host_freeaddrinfo(results2);
return result;
}

diff --git a/utils/mountd/auth.c b/utils/mountd/auth.c
index 8299256..dee0f3d 100644
--- a/utils/mountd/auth.c
+++ b/utils/mountd/auth.c
@@ -297,7 +297,7 @@ auth_authenticate(const char *what, const struct sockaddr *caller,
path, epath, error);
}

- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
return exp;
}

diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
index ca6c84f..b57fef6 100644
--- a/utils/mountd/cache.c
+++ b/utils/mountd/cache.c
@@ -112,7 +112,7 @@ static void auth_unix_ip(int f)
ai = client_resolve(tmp->ai_addr);
if (ai) {
client = client_compose(ai);
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
}
}
bp = buf; blen = sizeof(buf);
@@ -132,7 +132,7 @@ static void auth_unix_ip(int f)
xlog(D_CALL, "auth_unix_ip: client %p '%s'", client, client?client: "DEFAULT");

free(client);
- freeaddrinfo(tmp);
+ host_freeaddrinfo(tmp);

}

@@ -666,7 +666,7 @@ static struct addrinfo *lookup_client_addr(char *dom)
if (tmp == NULL)
return NULL;
ret = client_resolve(tmp->ai_addr);
- freeaddrinfo(tmp);
+ host_freeaddrinfo(tmp);
return ret;
}

@@ -833,7 +833,7 @@ static void nfsd_fh(int f)
out:
if (found_path)
free(found_path);
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
free(dom);
xlog(D_CALL, "nfsd_fh: found %p path %s", found, found ? found->e_path : NULL);
}
@@ -1363,7 +1363,7 @@ static void nfsd_export(int f)
xlog(D_CALL, "nfsd_export: found %p path %s", found, path ? path : NULL);
if (dom) free(dom);
if (path) free(path);
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
}


diff --git a/utils/mountd/mountd.c b/utils/mountd/mountd.c
index 829f803..3193ded 100644
--- a/utils/mountd/mountd.c
+++ b/utils/mountd/mountd.c
@@ -578,10 +578,10 @@ static void prune_clients(nfs_export *exp, struct exportnode *e)
*cp = c->gr_next;
xfree(c->gr_name);
xfree(c);
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
continue;
}
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
}
cp = &(c->gr_next);
}
diff --git a/utils/mountd/rmtab.c b/utils/mountd/rmtab.c
index 3ae0dbb..99f7474 100644
--- a/utils/mountd/rmtab.c
+++ b/utils/mountd/rmtab.c
@@ -226,7 +226,7 @@ mountlist_list(void)
ai = host_pton(rep->r_client);
if (ai != NULL) {
m->ml_hostname = host_canonname(ai->ai_addr);
- freeaddrinfo(ai);
+ host_freeaddrinfo(ai);
}
}
if (m->ml_hostname == NULL)
--
2.9.4


2017-06-30 13:21:58

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 09/12] exportfs: add AF_VSOCK support to set_addrlist()

Let set_addrlist() set AF_VSOCK client addresses.

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
support/include/exportfs.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/support/include/exportfs.h b/support/include/exportfs.h
index 98d45c5..d3d5abc 100644
--- a/support/include/exportfs.h
+++ b/support/include/exportfs.h
@@ -89,6 +89,11 @@ set_addrlist(nfs_client *clp, const int i, const struct sockaddr *sap)
memcpy(&clp->m_addrlist[i].s6, sap, sizeof(struct sockaddr_in6));
break;
#endif
+#ifdef AF_VSOCK
+ case AF_VSOCK:
+ memcpy(&clp->m_addrlist[i].svm, sap, sizeof(struct sockaddr_vm));
+ break;
+#endif
}
}

--
2.9.4


2017-06-30 13:22:05

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 11/12] nfsd: add --vsock (-v) option to nfsd

The following command-line serves NFSv4.1 over AF_VSOCK:

nfsd -TU -N3 -V4.1 -v 2049

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
utils/nfsd/nfssvc.h | 1 +
utils/nfsd/nfsd.c | 18 +++++++++++++---
utils/nfsd/nfssvc.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/utils/nfsd/nfssvc.h b/utils/nfsd/nfssvc.h
index 39ebf37..1d251ca 100644
--- a/utils/nfsd/nfssvc.h
+++ b/utils/nfsd/nfssvc.h
@@ -26,6 +26,7 @@ int nfssvc_set_sockets(const unsigned int protobits,
const char *host, const char *port);
void nfssvc_set_time(const char *type, const int seconds);
int nfssvc_set_rdmaport(const char *port);
+int nfssvc_set_vsock(const char *port);
void nfssvc_setvers(unsigned int ctlbits, unsigned int minorvers4, unsigned int minorvers4set);
int nfssvc_threads(int nrservs);
void nfssvc_get_minormask(unsigned int *mask);
diff --git a/utils/nfsd/nfsd.c b/utils/nfsd/nfsd.c
index 2b38249..f969da0 100644
--- a/utils/nfsd/nfsd.c
+++ b/utils/nfsd/nfsd.c
@@ -53,6 +53,7 @@ static struct option longopts[] =
{ "rdma", 2, 0, 'R' },
{ "grace-time", 1, 0, 'G'},
{ "lease-time", 1, 0, 'L'},
+ { "vsock", 1, 0, 'v' },
{ NULL, 0, 0, 0 }
};

@@ -61,6 +62,7 @@ main(int argc, char **argv)
{
int count = NFSD_NPROC, c, i, error = 0, portnum, fd, found_one;
char *p, *progname, *port, *rdma_port = NULL;
+ char *vsock_port = NULL;
char **haddr = NULL;
int hcounter = 0;
struct conf_list *hosts;
@@ -145,7 +147,7 @@ main(int argc, char **argv)
}
}

- while ((c = getopt_long(argc, argv, "dH:hN:V:p:P:stTituUrG:L:", longopts, NULL)) != EOF) {
+ while ((c = getopt_long(argc, argv, "dH:hN:V:p:P:stTituUrG:L:v:", longopts, NULL)) != EOF) {
switch(c) {
case 'd':
xlog_config(D_ALL, 1);
@@ -180,7 +182,9 @@ main(int argc, char **argv)
else
rdma_port = "nfsrdma";
break;
-
+ case 'v': /* --vsock */
+ vsock_port = optarg;
+ break;
case 'N':
switch((c = strtol(optarg, &p, 0))) {
case 4:
@@ -306,7 +310,8 @@ main(int argc, char **argv)
}

if (NFSCTL_VERISSET(versbits, 4) &&
- !NFSCTL_TCPISSET(protobits)) {
+ !NFSCTL_TCPISSET(protobits) &&
+ !vsock_port) {
xlog(L_ERROR, "version 4 requires the TCP protocol");
exit(1);
}
@@ -350,6 +355,13 @@ main(int argc, char **argv)
if (!error)
socket_up = 1;
}
+
+ if (vsock_port) {
+ error = nfssvc_set_vsock(vsock_port);
+ if (!error)
+ socket_up = 1;
+ }
+
set_threads:
/* don't start any threads if unable to hand off any sockets */
if (!socket_up) {
diff --git a/utils/nfsd/nfssvc.c b/utils/nfsd/nfssvc.c
index e8609c1..e867454 100644
--- a/utils/nfsd/nfssvc.c
+++ b/utils/nfsd/nfssvc.c
@@ -15,6 +15,7 @@
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
+#include <linux/vm_sockets.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
@@ -304,6 +305,67 @@ nfssvc_set_rdmaport(const char *port)
return ret;
}

+int
+nfssvc_set_vsock(const char *port)
+{
+ struct sockaddr_vm svm;
+ int nport;
+ char buf[20];
+ int rc = 1;
+ int sockfd = -1;
+ int fd = -1;
+ char *ep;
+
+ nport = strtol(port, &ep, 10);
+ if (!*port || *ep) {
+ xlog(L_ERROR, "unable to interpret port name %s",
+ port);
+ goto out;
+ }
+
+ sockfd = socket(AF_VSOCK, SOCK_STREAM, 0);
+ if (sockfd < 0) {
+ xlog(L_ERROR, "unable to create AF_VSOCK socket: "
+ "errno %d (%m)", errno);
+ goto out;
+ }
+
+ svm.svm_family = AF_VSOCK;
+ svm.svm_port = nport;
+ svm.svm_cid = VMADDR_CID_ANY;
+
+ if (bind(sockfd, (struct sockaddr*)&svm, sizeof(svm))) {
+ xlog(L_ERROR, "unable to bind AF_VSOCK socket: "
+ "errno %d (%m)", errno);
+ goto out;
+ }
+
+ if (listen(sockfd, 64)) {
+ xlog(L_ERROR, "unable to create listening socket: "
+ "errno %d (%m)", errno);
+ goto out;
+ }
+
+ fd = open(NFSD_PORTS_FILE, O_WRONLY);
+ if (fd < 0) {
+ xlog(L_ERROR, "couldn't open ports file: errno "
+ "%d (%m)", errno);
+ goto out;
+ }
+ snprintf(buf, sizeof(buf), "%d\n", sockfd);
+ if (write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf)) {
+ xlog(L_ERROR, "unable to request vsock services: %m");
+ goto out;
+ }
+ rc = 0;
+out:
+ if (fd != -1)
+ close(fd);
+ if (sockfd != -1)
+ close(sockfd);
+ return rc;
+}
+
void
nfssvc_set_time(const char *type, const int seconds)
{
--
2.9.4


2017-06-30 13:21:56

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 08/12] exportfs: add AF_VSOCK address parsing and printing

Add code to parse and print AF_VSOCK addresses since the getaddrinfo(3)
family of functions don't handle this address family.

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
support/export/hostname.c | 155 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 155 insertions(+)

diff --git a/support/export/hostname.c b/support/export/hostname.c
index 7f8a6f8..b7f8b39 100644
--- a/support/export/hostname.c
+++ b/support/export/hostname.c
@@ -30,6 +30,107 @@
#include "sockaddr.h"
#include "exportfs.h"

+#ifdef AF_VSOCK
+static char *
+host_ntop_vsock(const struct sockaddr *sap, char *buf, const size_t buflen)
+{
+ struct sockaddr_vm *svm = (struct sockaddr_vm *)sap;
+ snprintf(buf, buflen, "vsock:%u", svm->svm_cid);
+ return buf;
+}
+
+/* Allocate an addrinfo for AF_VSOCK. Free with host_freeaddrinfo(). */
+static struct addrinfo *
+vsock_alloc_addrinfo(struct sockaddr_vm **svm)
+{
+ struct {
+ struct addrinfo ai;
+ struct sockaddr_vm svm;
+ } *vai;
+
+ vai = calloc(1, sizeof(*vai));
+ if (!vai)
+ return NULL;
+
+ vai->ai.ai_family = AF_VSOCK;
+ vai->ai.ai_socktype = SOCK_STREAM;
+ vai->ai.ai_addrlen = sizeof(vai->svm);
+ vai->ai.ai_addr = (struct sockaddr *)&vai->svm;
+ vai->svm.svm_family = AF_VSOCK;
+
+ if (svm)
+ *svm = &vai->svm;
+
+ return &vai->ai;
+}
+
+/* hostname -> addrinfo */
+static struct addrinfo *
+vsock_hostname_addrinfo(const char *hostname)
+{
+ const char *cid_str;
+ char *end_ptr;
+ struct addrinfo *ai;
+ struct sockaddr_vm *svm;
+ long cid;
+
+ cid_str = hostname + strlen("vsock:");
+ cid = strtol(cid_str, &end_ptr, 10);
+ if (end_ptr == cid_str || *end_ptr != '\0')
+ return NULL;
+ if (cid < 0 || cid > UINT32_MAX)
+ return NULL;
+
+ ai = vsock_alloc_addrinfo(&svm);
+ if (!ai)
+ return NULL;
+
+ ai->ai_canonname = strdup(hostname);
+ if (!ai->ai_canonname) {
+ host_freeaddrinfo(ai);
+ return NULL;
+ }
+
+ svm->svm_cid = cid;
+ return ai;
+}
+
+/* sockaddr -> hostname */
+static char *
+vsock_canonname(const struct sockaddr *sap)
+{
+ const struct sockaddr_vm *svm = (const struct sockaddr_vm *)sap;
+ char *canonname;
+
+ if (asprintf(&canonname, "vsock:%u", svm->svm_cid) < 0)
+ return NULL;
+ return canonname;
+}
+
+/* sockaddr -> addrinfo */
+static struct addrinfo *
+vsock_sockaddr_addrinfo(const struct sockaddr *sap)
+{
+ const struct sockaddr_vm *svm = (const struct sockaddr_vm *)sap;
+ struct sockaddr_vm *ai_svm;
+ struct addrinfo *ai;
+
+ ai = vsock_alloc_addrinfo(&ai_svm);
+ if (!ai)
+ return NULL;
+
+ *ai_svm = *svm;
+
+ ai->ai_canonname = vsock_canonname(sap);
+ if (!ai->ai_canonname) {
+ host_freeaddrinfo(ai);
+ return NULL;
+ }
+
+ return ai;
+}
+#endif /* AF_VSOCK */
+
/**
* host_ntop - generate presentation address given a sockaddr
* @sap: pointer to socket address
@@ -52,6 +153,11 @@ host_ntop(const struct sockaddr *sap, char *buf, const size_t buflen)
return buf;
}

+#ifdef AF_VSOCK
+ if (sap->sa_family == AF_VSOCK)
+ return host_ntop_vsock(sap, buf, buflen);
+#endif
+
error = getnameinfo(sap, salen, buf, (socklen_t)buflen,
NULL, 0, NI_NUMERICHOST);
if (error != 0) {
@@ -69,6 +175,11 @@ host_ntop(const struct sockaddr *sap, char *buf, const size_t buflen)

memset(buf, 0, buflen);

+#ifdef AF_VSOCK
+ if (sap->sa_family == AF_VSOCK)
+ return host_ntop_vsock(sap, buf, buflen);
+#endif
+
if (sin->sin_family != AF_INET) {
(void)strncpy(buf, "bad family", buflen - 1);
return buf;
@@ -120,6 +231,12 @@ host_pton(const char *paddr)
__func__);
return NULL;
}
+
+#ifdef AF_VSOCK
+ if (strncmp(paddr, "vsock:", strlen("vsock:")) == 0)
+ return vsock_hostname_addrinfo(paddr);
+#endif
+
inet4 = 1;
if (inet_pton(AF_INET, paddr, &sin.sin_addr) == 0)
inet4 = 0;
@@ -174,6 +291,11 @@ host_addrinfo(const char *hostname)
};
int error;

+#ifdef AF_VSOCK
+ if (strncmp(hostname, "vsock:", strlen("vsock:")) == 0)
+ return vsock_hostname_addrinfo(hostname);
+#endif
+
error = getaddrinfo(hostname, NULL, &hint, &ai);
switch (error) {
case 0:
@@ -202,6 +324,14 @@ host_addrinfo(const char *hostname)
void
host_freeaddrinfo(struct addrinfo *ai)
{
+#ifdef AF_VSOCK
+ if (ai && ai->ai_family == AF_VSOCK) {
+ free(ai->ai_canonname);
+ free(ai);
+ return;
+ }
+#endif /* AF_VSOCK */
+
freeaddrinfo(ai);
}

@@ -225,6 +355,11 @@ host_canonname(const struct sockaddr *sap)
char buf[NI_MAXHOST];
int error;

+#ifdef AF_VSOCK
+ if (sap->sa_family == AF_VSOCK)
+ return vsock_canonname(sap);
+#endif
+
if (salen == 0) {
xlog(D_GENERAL, "%s: unsupported address family %d",
__func__, sap->sa_family);
@@ -260,6 +395,11 @@ host_canonname(const struct sockaddr *sap)
const struct in_addr *addr = &sin->sin_addr;
struct hostent *hp;

+#ifdef AF_VSOCK
+ if (sap->sa_family == AF_VSOCK)
+ return vsock_canonname(sap);
+#endif
+
if (sap->sa_family != AF_INET)
return NULL;

@@ -291,6 +431,11 @@ host_reliable_addrinfo(const struct sockaddr *sap)
struct addrinfo *ai, *a;
char *hostname;

+#ifdef AF_VSOCK
+ if (sap->sa_family == AF_VSOCK)
+ return vsock_sockaddr_addrinfo(sap);
+#endif
+
hostname = host_canonname(sap);
if (hostname == NULL)
return NULL;
@@ -340,6 +485,11 @@ host_numeric_addrinfo(const struct sockaddr *sap)
struct addrinfo *ai;
int error;

+#ifdef AF_VSOCK
+ if (sap->sa_family == AF_VSOCK)
+ return vsock_sockaddr_addrinfo(sap);
+#endif
+
if (salen == 0) {
xlog(D_GENERAL, "%s: unsupported address family %d",
__func__, sap->sa_family);
@@ -388,6 +538,11 @@ host_numeric_addrinfo(const struct sockaddr *sap)
char buf[INET_ADDRSTRLEN];
struct addrinfo *ai;

+#ifdef AF_VSOCK
+ if (sap->sa_family == AF_VSOCK)
+ return vsock_sockaddr_addrinfo(sap);
+#endif
+
if (sap->sa_family != AF_INET)
return NULL;

--
2.9.4


2017-06-30 13:22:01

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 10/12] exportfs: add support for "vsock:" exports(5) syntax

Allow exports to be restricted to AF_VSOCK clients:

# exportfs vsock:3:/export

and:

# cat /etc/exports
/export vsock:*(rw,no_root_squash,insecure,subtree_check)

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
utils/exportfs/exportfs.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)

diff --git a/utils/exportfs/exportfs.c b/utils/exportfs/exportfs.c
index 3ded733..6bf67f1 100644
--- a/utils/exportfs/exportfs.c
+++ b/utils/exportfs/exportfs.c
@@ -299,6 +299,20 @@ static int exportfs_generic(char *arg, char *options, int verbose)
return 0;
}

+static int exportfs_vsock(char *arg, char *options, int verbose)
+{
+ char *path;
+
+ if ((path = strchr(arg + strlen("vsock:"), ':')) != NULL)
+ *path++ = '\0';
+
+ if (!path || *path != '/')
+ return 1;
+
+ exportfs_parsed(arg, path, options, verbose);
+ return 0;
+}
+
static int exportfs_ipv6(char *arg, char *options, int verbose)
{
char *path, *c;
@@ -332,6 +346,8 @@ exportfs(char *arg, char *options, int verbose)

if (*arg == '[')
failed = exportfs_ipv6(arg, options, verbose);
+ else if (strncmp(arg, "vsock:", strlen("vsock:")) == 0)
+ failed = exportfs_vsock(arg, options, verbose);
else
failed = exportfs_generic(arg, options, verbose);
if (failed)
@@ -412,6 +428,20 @@ static int unexportfs_generic(char *arg, int verbose)
return 0;
}

+static int unexportfs_vsock(char *arg, int verbose)
+{
+ char *path;
+
+ if ((path = strchr(arg + strlen("vsock:"), ':')) != NULL)
+ *path++ = '\0';
+
+ if (!path || *path != '/')
+ return 1;
+
+ unexportfs_parsed(arg, path, verbose);
+ return 0;
+}
+
static int unexportfs_ipv6(char *arg, int verbose)
{
char *path, *c;
@@ -445,6 +475,8 @@ unexportfs(char *arg, int verbose)

if (*arg == '[')
failed = unexportfs_ipv6(arg, verbose);
+ else if (strncmp(arg, "vsock:", strlen("vsock:")) == 0)
+ failed = unexportfs_vsock(arg, verbose);
else
failed = unexportfs_generic(arg, verbose);
if (failed)
--
2.9.4


2017-06-30 13:22:10

by Stefan Hajnoczi

[permalink] [raw]
Subject: [PATCH nfs-utils v2 12/12] tests: add "vsock:" exports(5) test case

This simple test case checks that the new syntax works for AF_VSOCK.

Signed-off-by: Stefan Hajnoczi <[email protected]>
---
tests/Makefile.am | 3 ++-
tests/t0002-vsock-basic.sh | 53 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 1 deletion(-)
create mode 100755 tests/t0002-vsock-basic.sh

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 1f96264..c4e2792 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -10,5 +10,6 @@ SUBDIRS = nsm_client

MAINTAINERCLEANFILES = Makefile.in

-TESTS = t0001-statd-basic-mon-unmon.sh
+TESTS = t0001-statd-basic-mon-unmon.sh \
+ t0002-vsock-basic.sh
EXTRA_DIST = test-lib.sh $(TESTS)
diff --git a/tests/t0002-vsock-basic.sh b/tests/t0002-vsock-basic.sh
new file mode 100755
index 0000000..21a3884
--- /dev/null
+++ b/tests/t0002-vsock-basic.sh
@@ -0,0 +1,53 @@
+#!/bin/bash
+#
+# t0002-vsock-basic.sh -- test basic NFSv4 over AF_VSOCK functionality
+#
+# Copyright (C) 2017 Red Hat, Stefan Hajnoczi <[email protected]>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 0211-1301 USA
+#
+
+. ./test-lib.sh
+
+check_root
+
+test_exportfs() {
+ client_addr="$1"
+ export_spec="$client_addr:$(realpath .)"
+
+ echo "TEST: $client_addr"
+
+ "$srcdir/../utils/exportfs/exportfs" "$export_spec"
+ if [ $? -ne 0 ]; then
+ echo "FAIL: exportfs failed"
+ exit 1
+ fi
+
+ expected_etab="$(realpath .) $client_addr("
+ grep --fixed-strings -q "$expected_etab" /var/lib/nfs/etab
+ if [ $? -ne 0 ]; then
+ echo "FAIL: etab doesn't contain entry"
+ exit 1
+ fi
+
+ "$srcdir/../utils/exportfs/exportfs" -u "$export_spec"
+ if [ $? -ne 0 ]; then
+ echo "FAIL: exportfs -u failed"
+ exit 1
+ fi
+}
+
+test_exportfs "vsock:3"
+test_exportfs "vsock:*"
--
2.9.4


2017-06-30 14:34:57

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 01/12] mount: don't use IPPROTO_UDP for address resolution



On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> Although getaddrinfo(3) with IPPROTO_UDP works fine for AF_INET and
> AF_INET6, the AF_VSOCK address family does not support IPPROTO_UDP and
> produces an error.
>
> Drop IPPROTO_UDP and use the 0 default (TCP) which works for all address
> families. Modern NFS uses TCP anyway so it's strange to specify UDP.
>
> Signed-off-by: Stefan Hajnoczi <[email protected]>
> Reviewed-by: Jeff Layton <[email protected]>
> ---
> utils/mount/stropts.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
> index c2a739b..99656dd 100644
> --- a/utils/mount/stropts.c
> +++ b/utils/mount/stropts.c
> @@ -909,9 +909,7 @@ static int nfs_try_mount(struct nfsmount_info *mi)
> int result = 0;
>
> if (mi->address == NULL) {
> - struct addrinfo hint = {
> - .ai_protocol = (int)IPPROTO_UDP,
> - };
> + struct addrinfo hint = {};
Just curious as to why not simply pass a NULL hints parameter
verses an empty hints structure?

steved.

> int error;
> struct addrinfo *address;
>
>

2017-06-30 14:40:52

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 03/12] mount: present AF_VSOCK addresses



On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> Format vsock hosts as "vsock:<cid>" so the addresses can be easily
> distinguished from IPv4 and IPv6 addresses.
>
> Signed-off-by: Stefan Hajnoczi <[email protected]>
> ---
> utils/mount/network.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/utils/mount/network.c b/utils/mount/network.c
> index 281e935..b5dcaa5 100644
> --- a/utils/mount/network.c
> +++ b/utils/mount/network.c
> @@ -45,6 +45,8 @@
> #include <rpc/pmap_prot.h>
> #include <rpc/pmap_clnt.h>
>
> +#include <linux/vm_sockets.h>
In the previous patch you had this surrounded by #ifdef AF_VSOCK
I'm not keen on sprinkling a bunch ifdefs around since
I think it makes the code harder to read. So my question
is why is the ifdef need in the previous patch and
not needed in this patch and are they needed in the
previous patch?

steved.

> +
> #include "sockaddr.h"
> #include "xcommon.h"
> #include "mount.h"
> @@ -325,6 +327,12 @@ int nfs_string_to_sockaddr(const char *address, struct sockaddr *sap,
> int nfs_present_sockaddr(const struct sockaddr *sap, const socklen_t salen,
> char *buf, const size_t buflen)
> {
> + if (sap->sa_family == AF_VSOCK) {
> + snprintf(buf, buflen, "vsock:%u",
> + ((struct sockaddr_vm *)sap)->svm_cid);
> + return 1;
> + }
> +
> #ifdef HAVE_GETNAMEINFO
> int result;
>
>

2017-06-30 15:01:17

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid



On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> Neither libtirpc nor getprotobyname(3) know about AF_VSOCK. For similar
> reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c.
>
> It is now possible to mount a file system from the host (hypervisor)
> over AF_VSOCK like this:
>
> (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock
>
> The VM's cid address is 3 and the hypervisor is 2.
So this is how vsocks are going to look...
There is not going to be a way to lookup an vsock address?
Since the format of the clientaddr parameter shouldn't
that be documented in the man page?

I guess a general question, is this new mount type
documented anywhere?

steved.
>
> Signed-off-by: Stefan Hajnoczi <[email protected]>
> ---
> support/nfs/getport.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/support/nfs/getport.c b/support/nfs/getport.c
> index 081594c..0b857af 100644
> --- a/support/nfs/getport.c
> +++ b/support/nfs/getport.c
> @@ -217,8 +217,7 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
> struct protoent *proto;
>
> /*
> - * IANA does not define a protocol number for rdma netids,
> - * since "rdma" is not an IP protocol.
> + * IANA does not define protocol numbers for non-IP netids.
> */
> if (strcmp(netid, "rdma") == 0) {
> *family = AF_INET;
> @@ -230,6 +229,11 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
> *protocol = NFSPROTO_RDMA;
> return 1;
> }
> + if (strcmp(netid, "vsock") == 0) {
> + *family = AF_VSOCK;
> + *protocol = 0;
> + return 1;
> + }
>
> nconf = getnetconfigent(netid);
> if (nconf == NULL)
> @@ -258,14 +262,18 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
> struct protoent *proto;
>
> /*
> - * IANA does not define a protocol number for rdma netids,
> - * since "rdma" is not an IP protocol.
> + * IANA does not define protocol numbers for non-IP netids.
> */
> if (strcmp(netid, "rdma") == 0) {
> *family = AF_INET;
> *protocol = NFSPROTO_RDMA;
> return 1;
> }
> + if (strcmp(netid, "vsock") == 0) {
> + *family = AF_VSOCK;
> + *protocol = 0;
> + return 1;
> + }
>
> proto = getprotobyname(netid);
> if (proto == NULL)
>

2017-06-30 15:07:31

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 10/12] exportfs: add support for "vsock:" exports(5) syntax



On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> Allow exports to be restricted to AF_VSOCK clients:
>
> # exportfs vsock:3:/export
>
> and:
>
> # cat /etc/exports
> /export vsock:*(rw,no_root_squash,insecure,subtree_check)
Again it would be nice to use hostnames or address types
instead of a qualifier like 'vsock'... But if we do end
up going down this road, this definitely has to be
documented in the man page.

steved.
>
> Signed-off-by: Stefan Hajnoczi <[email protected]>
> ---
> utils/exportfs/exportfs.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/utils/exportfs/exportfs.c b/utils/exportfs/exportfs.c
> index 3ded733..6bf67f1 100644
> --- a/utils/exportfs/exportfs.c
> +++ b/utils/exportfs/exportfs.c
> @@ -299,6 +299,20 @@ static int exportfs_generic(char *arg, char *options, int verbose)
> return 0;
> }
>
> +static int exportfs_vsock(char *arg, char *options, int verbose)
> +{
> + char *path;
> +
> + if ((path = strchr(arg + strlen("vsock:"), ':')) != NULL)
> + *path++ = '\0';
> +
> + if (!path || *path != '/')
> + return 1;
> +
> + exportfs_parsed(arg, path, options, verbose);
> + return 0;
> +}
> +
> static int exportfs_ipv6(char *arg, char *options, int verbose)
> {
> char *path, *c;
> @@ -332,6 +346,8 @@ exportfs(char *arg, char *options, int verbose)
>
> if (*arg == '[')
> failed = exportfs_ipv6(arg, options, verbose);
> + else if (strncmp(arg, "vsock:", strlen("vsock:")) == 0)
> + failed = exportfs_vsock(arg, options, verbose);
> else
> failed = exportfs_generic(arg, options, verbose);
> if (failed)
> @@ -412,6 +428,20 @@ static int unexportfs_generic(char *arg, int verbose)
> return 0;
> }
>
> +static int unexportfs_vsock(char *arg, int verbose)
> +{
> + char *path;
> +
> + if ((path = strchr(arg + strlen("vsock:"), ':')) != NULL)
> + *path++ = '\0';
> +
> + if (!path || *path != '/')
> + return 1;
> +
> + unexportfs_parsed(arg, path, verbose);
> + return 0;
> +}
> +
> static int unexportfs_ipv6(char *arg, int verbose)
> {
> char *path, *c;
> @@ -445,6 +475,8 @@ unexportfs(char *arg, int verbose)
>
> if (*arg == '[')
> failed = unexportfs_ipv6(arg, verbose);
> + else if (strncmp(arg, "vsock:", strlen("vsock:")) == 0)
> + failed = unexportfs_vsock(arg, verbose);
> else
> failed = unexportfs_generic(arg, verbose);
> if (failed)
>

2017-06-30 15:25:34

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 11/12] nfsd: add --vsock (-v) option to nfsd



On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> The following command-line serves NFSv4.1 over AF_VSOCK:
>
> nfsd -TU -N3 -V4.1 -v 2049
So this specifying no-tcp, no-udp (which is no longer needed since
udp is off by default), no-v3, only-v4.1, vsock listiner,
and you are missing a -p because I don't think you want to
start up 2049 process.

How does this work with the standard NFS server? Are the
co-compatible?

Also the --vsock needs to be documented. Also can the
--vsock flag just mean all of these specifics... Meaning
so they don't have to be specified on the command line?

steved.

>
> Signed-off-by: Stefan Hajnoczi <[email protected]>
> ---
> utils/nfsd/nfssvc.h | 1 +
> utils/nfsd/nfsd.c | 18 +++++++++++++---
> utils/nfsd/nfssvc.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 78 insertions(+), 3 deletions(-)
>
> diff --git a/utils/nfsd/nfssvc.h b/utils/nfsd/nfssvc.h
> index 39ebf37..1d251ca 100644
> --- a/utils/nfsd/nfssvc.h
> +++ b/utils/nfsd/nfssvc.h
> @@ -26,6 +26,7 @@ int nfssvc_set_sockets(const unsigned int protobits,
> const char *host, const char *port);
> void nfssvc_set_time(const char *type, const int seconds);
> int nfssvc_set_rdmaport(const char *port);
> +int nfssvc_set_vsock(const char *port);
> void nfssvc_setvers(unsigned int ctlbits, unsigned int minorvers4, unsigned int minorvers4set);
> int nfssvc_threads(int nrservs);
> void nfssvc_get_minormask(unsigned int *mask);
> diff --git a/utils/nfsd/nfsd.c b/utils/nfsd/nfsd.c
> index 2b38249..f969da0 100644
> --- a/utils/nfsd/nfsd.c
> +++ b/utils/nfsd/nfsd.c
> @@ -53,6 +53,7 @@ static struct option longopts[] =
> { "rdma", 2, 0, 'R' },
> { "grace-time", 1, 0, 'G'},
> { "lease-time", 1, 0, 'L'},
> + { "vsock", 1, 0, 'v' },
> { NULL, 0, 0, 0 }
> };
>
> @@ -61,6 +62,7 @@ main(int argc, char **argv)
> {
> int count = NFSD_NPROC, c, i, error = 0, portnum, fd, found_one;
> char *p, *progname, *port, *rdma_port = NULL;
> + char *vsock_port = NULL;
> char **haddr = NULL;
> int hcounter = 0;
> struct conf_list *hosts;
> @@ -145,7 +147,7 @@ main(int argc, char **argv)
> }
> }
>
> - while ((c = getopt_long(argc, argv, "dH:hN:V:p:P:stTituUrG:L:", longopts, NULL)) != EOF) {
> + while ((c = getopt_long(argc, argv, "dH:hN:V:p:P:stTituUrG:L:v:", longopts, NULL)) != EOF) {
> switch(c) {
> case 'd':
> xlog_config(D_ALL, 1);
> @@ -180,7 +182,9 @@ main(int argc, char **argv)
> else
> rdma_port = "nfsrdma";
> break;
> -
> + case 'v': /* --vsock */
> + vsock_port = optarg;
> + break;
> case 'N':
> switch((c = strtol(optarg, &p, 0))) {
> case 4:
> @@ -306,7 +310,8 @@ main(int argc, char **argv)
> }
>
> if (NFSCTL_VERISSET(versbits, 4) &&
> - !NFSCTL_TCPISSET(protobits)) {
> + !NFSCTL_TCPISSET(protobits) &&
> + !vsock_port) {
> xlog(L_ERROR, "version 4 requires the TCP protocol");
> exit(1);
> }
> @@ -350,6 +355,13 @@ main(int argc, char **argv)
> if (!error)
> socket_up = 1;
> }
> +
> + if (vsock_port) {
> + error = nfssvc_set_vsock(vsock_port);
> + if (!error)
> + socket_up = 1;
> + }
> +
> set_threads:
> /* don't start any threads if unable to hand off any sockets */
> if (!socket_up) {
> diff --git a/utils/nfsd/nfssvc.c b/utils/nfsd/nfssvc.c
> index e8609c1..e867454 100644
> --- a/utils/nfsd/nfssvc.c
> +++ b/utils/nfsd/nfssvc.c
> @@ -15,6 +15,7 @@
> #include <netdb.h>
> #include <netinet/in.h>
> #include <arpa/inet.h>
> +#include <linux/vm_sockets.h>
> #include <sys/stat.h>
> #include <unistd.h>
> #include <fcntl.h>
> @@ -304,6 +305,67 @@ nfssvc_set_rdmaport(const char *port)
> return ret;
> }
>
> +int
> +nfssvc_set_vsock(const char *port)
> +{
> + struct sockaddr_vm svm;
> + int nport;
> + char buf[20];
> + int rc = 1;
> + int sockfd = -1;
> + int fd = -1;
> + char *ep;
> +
> + nport = strtol(port, &ep, 10);
> + if (!*port || *ep) {
> + xlog(L_ERROR, "unable to interpret port name %s",
> + port);
> + goto out;
> + }
> +
> + sockfd = socket(AF_VSOCK, SOCK_STREAM, 0);
> + if (sockfd < 0) {
> + xlog(L_ERROR, "unable to create AF_VSOCK socket: "
> + "errno %d (%m)", errno);
> + goto out;
> + }
> +
> + svm.svm_family = AF_VSOCK;
> + svm.svm_port = nport;
> + svm.svm_cid = VMADDR_CID_ANY;
> +
> + if (bind(sockfd, (struct sockaddr*)&svm, sizeof(svm))) {
> + xlog(L_ERROR, "unable to bind AF_VSOCK socket: "
> + "errno %d (%m)", errno);
> + goto out;
> + }
> +
> + if (listen(sockfd, 64)) {
> + xlog(L_ERROR, "unable to create listening socket: "
> + "errno %d (%m)", errno);
> + goto out;
> + }
> +
> + fd = open(NFSD_PORTS_FILE, O_WRONLY);
> + if (fd < 0) {
> + xlog(L_ERROR, "couldn't open ports file: errno "
> + "%d (%m)", errno);
> + goto out;
> + }
> + snprintf(buf, sizeof(buf), "%d\n", sockfd);
> + if (write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf)) {
> + xlog(L_ERROR, "unable to request vsock services: %m");
> + goto out;
> + }
> + rc = 0;
> +out:
> + if (fd != -1)
> + close(fd);
> + if (sockfd != -1)
> + close(sockfd);
> + return rc;
> +}
> +
> void
> nfssvc_set_time(const char *type, const int seconds)
> {
>

2017-06-30 15:52:25

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

Hi Stefan-


> On Jun 30, 2017, at 9:21 AM, Stefan Hajnoczi <[email protected]> wrote:
>
> Neither libtirpc nor getprotobyname(3) know about AF_VSOCK.

Why?

Basically you are building a lot of specialized
awareness in applications and leaving the
network layer alone. That seems backwards to me.


> For similar
> reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c.

rdma/rdma6 are specified by standards, and appear
in the IANA Network Identifiers database:

https://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml

Is there a standard netid for vsock? If not,
there needs to be some discussion with the nfsv4
Working Group to get this worked out.

Because AF_VSOCK is an address family and the RPC
framing is the same as TCP, the netid should be
something like "tcpv" and not "vsock". I've
complained about this before and there has been
no response of any kind.

I'll note that rdma/rdma6 do not use alternate
address families: an IP address is specified and
mapped to a GUID by the underlying transport.
We purposely did not expose GUIDs to NFS, which
is based on AF_INET/AF_INET6.

rdma co-exists with IP. vsock doesn't have this
fallback.

It might be a better approach to use well-known
(say, link-local or loopback) addresses and let
the underlying network layer figure it out.

Then hide all this stuff with DNS and let the
client mount the server by hostname and use
normal sockaddr's and "proto=tcp". Then you don't
need _any_ application layer changes.

Without hostnames, how does a client pick a
Kerberos service principal for the server?

Does rpcbind implement "vsock" netids?

Does the NFSv4.0 client advertise "vsock" in
SETCLIENTID, and provide a "vsock" callback
service?


> It is now possible to mount a file system from the host (hypervisor)
> over AF_VSOCK like this:
>
> (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock
>
> The VM's cid address is 3 and the hypervisor is 2.

The mount command is supposed to supply "clientaddr"
automatically. This mount option is exposed only for
debugging purposes or very special cases (like
disabling NFSv4 callback operations).

I mean the whole point of this exercise is to get
rid of network configuration, but here you're
adding the need to additionally specify both the
proto option and the clientaddr option to get this
to work. Seems like that isn't zero-configuration
at all.

Wouldn't it be nicer if it worked like this:

(guest)$ cat /etc/hosts
129.0.0.2 localhyper
(guest)$ mount.nfs localhyper:/export /mnt

And the result was a working NFS mount of the
local hypervisor, using whatever NFS version the
two both support, with no changes needed to the
NFS implementation or the understanding of the
system administrator?


> Signed-off-by: Stefan Hajnoczi <[email protected]>
> ---
> support/nfs/getport.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/support/nfs/getport.c b/support/nfs/getport.c
> index 081594c..0b857af 100644
> --- a/support/nfs/getport.c
> +++ b/support/nfs/getport.c
> @@ -217,8 +217,7 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
> struct protoent *proto;
>
> /*
> - * IANA does not define a protocol number for rdma netids,
> - * since "rdma" is not an IP protocol.
> + * IANA does not define protocol numbers for non-IP netids.
> */
> if (strcmp(netid, "rdma") == 0) {
> *family = AF_INET;
> @@ -230,6 +229,11 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
> *protocol = NFSPROTO_RDMA;
> return 1;
> }
> + if (strcmp(netid, "vsock") == 0) {
> + *family = AF_VSOCK;
> + *protocol = 0;
> + return 1;
> + }
>
> nconf = getnetconfigent(netid);
> if (nconf == NULL)
> @@ -258,14 +262,18 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
> struct protoent *proto;
>
> /*
> - * IANA does not define a protocol number for rdma netids,
> - * since "rdma" is not an IP protocol.
> + * IANA does not define protocol numbers for non-IP netids.
> */
> if (strcmp(netid, "rdma") == 0) {
> *family = AF_INET;
> *protocol = NFSPROTO_RDMA;
> return 1;
> }
> + if (strcmp(netid, "vsock") == 0) {
> + *family = AF_VSOCK;
> + *protocol = 0;
> + return 1;
> + }
>
> proto = getprotobyname(netid);
> if (proto == NULL)
> --
> 2.9.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever




2017-07-03 08:55:14

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 01/12] mount: don't use IPPROTO_UDP for address resolution

On Fri, Jun 30, 2017 at 10:34:54AM -0400, Steve Dickson wrote:
>
>
> On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> > Although getaddrinfo(3) with IPPROTO_UDP works fine for AF_INET and
> > AF_INET6, the AF_VSOCK address family does not support IPPROTO_UDP and
> > produces an error.
> >
> > Drop IPPROTO_UDP and use the 0 default (TCP) which works for all address
> > families. Modern NFS uses TCP anyway so it's strange to specify UDP.
> >
> > Signed-off-by: Stefan Hajnoczi <[email protected]>
> > Reviewed-by: Jeff Layton <[email protected]>
> > ---
> > utils/mount/stropts.c | 4 +---
> > 1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
> > index c2a739b..99656dd 100644
> > --- a/utils/mount/stropts.c
> > +++ b/utils/mount/stropts.c
> > @@ -909,9 +909,7 @@ static int nfs_try_mount(struct nfsmount_info *mi)
> > int result = 0;
> >
> > if (mi->address == NULL) {
> > - struct addrinfo hint = {
> > - .ai_protocol = (int)IPPROTO_UDP,
> > - };
> > + struct addrinfo hint = {};
> Just curious as to why not simply pass a NULL hints parameter
> verses an empty hints structure?

It's not clear from the surrounding unified diff context but the code
does set .ai_family later on:

hint.ai_family = (int)mi->family;

Would you prefer it if I move that up into the variable definition?

struct addrinfo hint = {
.ai_family = (int)mi->family,
};


Attachments:
(No filename) (1.42 kB)
signature.asc (455.00 B)
Download all attachments

2017-07-03 09:00:56

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 03/12] mount: present AF_VSOCK addresses

On Fri, Jun 30, 2017 at 10:40:49AM -0400, Steve Dickson wrote:
> On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> > Format vsock hosts as "vsock:<cid>" so the addresses can be easily
> > distinguished from IPv4 and IPv6 addresses.
> >
> > Signed-off-by: Stefan Hajnoczi <[email protected]>
> > ---
> > utils/mount/network.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/utils/mount/network.c b/utils/mount/network.c
> > index 281e935..b5dcaa5 100644
> > --- a/utils/mount/network.c
> > +++ b/utils/mount/network.c
> > @@ -45,6 +45,8 @@
> > #include <rpc/pmap_prot.h>
> > #include <rpc/pmap_clnt.h>
> >
> > +#include <linux/vm_sockets.h>
> In the previous patch you had this surrounded by #ifdef AF_VSOCK
> I'm not keen on sprinkling a bunch ifdefs around since
> I think it makes the code harder to read. So my question
> is why is the ifdef need in the previous patch and
> not needed in this patch and are they needed in the
> previous patch?

The lack of #ifdef is my mistake.

My impression of nfs-utils is that the code is written to work in a
variety of configurations and still support older kernels. So I am
wrapping AF_VSOCK logic with an #ifdef.

AF_VSOCK has been in Linux since v3.9 in commit
d021c344051af91f42c5ba9fdedc176740cbd238 ("VSOCK: Introduce VM
Sockets").

I'd love to eliminate the #ifdefs, but would it be acceptable to simply
drop them?


Attachments:
(No filename) (1.37 kB)
signature.asc (455.00 B)
Download all attachments

2017-07-03 16:35:49

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 01/12] mount: don't use IPPROTO_UDP for address resolution



On 07/03/2017 04:55 AM, Stefan Hajnoczi wrote:
> On Fri, Jun 30, 2017 at 10:34:54AM -0400, Steve Dickson wrote:
>>
>>
>> On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
>>> Although getaddrinfo(3) with IPPROTO_UDP works fine for AF_INET and
>>> AF_INET6, the AF_VSOCK address family does not support IPPROTO_UDP and
>>> produces an error.
>>>
>>> Drop IPPROTO_UDP and use the 0 default (TCP) which works for all address
>>> families. Modern NFS uses TCP anyway so it's strange to specify UDP.
>>>
>>> Signed-off-by: Stefan Hajnoczi <[email protected]>
>>> Reviewed-by: Jeff Layton <[email protected]>
>>> ---
>>> utils/mount/stropts.c | 4 +---
>>> 1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
>>> index c2a739b..99656dd 100644
>>> --- a/utils/mount/stropts.c
>>> +++ b/utils/mount/stropts.c
>>> @@ -909,9 +909,7 @@ static int nfs_try_mount(struct nfsmount_info *mi)
>>> int result = 0;
>>>
>>> if (mi->address == NULL) {
>>> - struct addrinfo hint = {
>>> - .ai_protocol = (int)IPPROTO_UDP,
>>> - };
>>> + struct addrinfo hint = {};
>> Just curious as to why not simply pass a NULL hints parameter
>> verses an empty hints structure?
>
> It's not clear from the surrounding unified diff context but the code
> does set .ai_family later on:
>
> hint.ai_family = (int)mi->family;
Ah... I did miss this...

>
> Would you prefer it if I move that up into the variable definition?
>
> struct addrinfo hint = {
> .ai_family = (int)mi->family,
> };
>
It's good as is.

steved.


2017-07-03 16:51:12

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 03/12] mount: present AF_VSOCK addresses



On 07/03/2017 05:00 AM, Stefan Hajnoczi wrote:
> On Fri, Jun 30, 2017 at 10:40:49AM -0400, Steve Dickson wrote:
>> On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
>>> Format vsock hosts as "vsock:<cid>" so the addresses can be easily
>>> distinguished from IPv4 and IPv6 addresses.
>>>
>>> Signed-off-by: Stefan Hajnoczi <[email protected]>
>>> ---
>>> utils/mount/network.c | 8 ++++++++
>>> 1 file changed, 8 insertions(+)
>>>
>>> diff --git a/utils/mount/network.c b/utils/mount/network.c
>>> index 281e935..b5dcaa5 100644
>>> --- a/utils/mount/network.c
>>> +++ b/utils/mount/network.c
>>> @@ -45,6 +45,8 @@
>>> #include <rpc/pmap_prot.h>
>>> #include <rpc/pmap_clnt.h>
>>>
>>> +#include <linux/vm_sockets.h>
>> In the previous patch you had this surrounded by #ifdef AF_VSOCK
>> I'm not keen on sprinkling a bunch ifdefs around since
>> I think it makes the code harder to read. So my question
>> is why is the ifdef need in the previous patch and
>> not needed in this patch and are they needed in the
>> previous patch?
>
> The lack of #ifdef is my mistake.
Fair enough.

>
> My impression of nfs-utils is that the code is written to work in a
> variety of configurations and still support older kernels. So I am
> wrapping AF_VSOCK logic with an #ifdef.
>
> AF_VSOCK has been in Linux since v3.9 in commit
> d021c344051af91f42c5ba9fdedc176740cbd238 ("VSOCK: Introduce VM
> Sockets").
>
> I'd love to eliminate the #ifdefs, but would it be acceptable to simply
> drop them?
Very good question...

CC-ing Felix... Would not ifdef-ing AF_VSOCK break compiling
with the musl libc?

Are there other implementations out there that would cause breakage?
I'm pretty sure nfs-utils is only used in Linux environments, right?

steved.

>

2017-07-03 21:05:40

by Felix Janda

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 03/12] mount: present AF_VSOCK addresses

Steve Dickson wrote:
>
>
> On 07/03/2017 05:00 AM, Stefan Hajnoczi wrote:
> > On Fri, Jun 30, 2017 at 10:40:49AM -0400, Steve Dickson wrote:
> >> On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> >>> Format vsock hosts as "vsock:<cid>" so the addresses can be easily
> >>> distinguished from IPv4 and IPv6 addresses.
> >>>
> >>> Signed-off-by: Stefan Hajnoczi <[email protected]>
> >>> ---
> >>> utils/mount/network.c | 8 ++++++++
> >>> 1 file changed, 8 insertions(+)
> >>>
> >>> diff --git a/utils/mount/network.c b/utils/mount/network.c
> >>> index 281e935..b5dcaa5 100644
> >>> --- a/utils/mount/network.c
> >>> +++ b/utils/mount/network.c
> >>> @@ -45,6 +45,8 @@
> >>> #include <rpc/pmap_prot.h>
> >>> #include <rpc/pmap_clnt.h>
> >>>
> >>> +#include <linux/vm_sockets.h>
> >> In the previous patch you had this surrounded by #ifdef AF_VSOCK
> >> I'm not keen on sprinkling a bunch ifdefs around since
> >> I think it makes the code harder to read. So my question
> >> is why is the ifdef need in the previous patch and
> >> not needed in this patch and are they needed in the
> >> previous patch?
> >
> > The lack of #ifdef is my mistake.
> Fair enough.
>
> >
> > My impression of nfs-utils is that the code is written to work in a
> > variety of configurations and still support older kernels. So I am
> > wrapping AF_VSOCK logic with an #ifdef.
> >
> > AF_VSOCK has been in Linux since v3.9 in commit
> > d021c344051af91f42c5ba9fdedc176740cbd238 ("VSOCK: Introduce VM
> > Sockets").
> >
> > I'd love to eliminate the #ifdefs, but would it be acceptable to simply
> > drop them?
> Very good question...
>
> CC-ing Felix... Would not ifdef-ing AF_VSOCK break compiling
> with the musl libc?

musl has AF_VSOCK since 2013 (v0.9.12):

http://git.musl-libc.org/cgit/musl/commit/?id=3d4583c3fba8989a596506619277ecd68768d9ab

I doubt that many people are using a version of musl older than that.

Felix

> Are there other implementations out there that would cause breakage?
> I'm pretty sure nfs-utils is only used in Linux environments, right?
>
> steved.

2017-07-06 17:16:27

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 03/12] mount: present AF_VSOCK addresses

On Mon, Jul 03, 2017 at 10:00:48AM +0100, Stefan Hajnoczi wrote:
> On Fri, Jun 30, 2017 at 10:40:49AM -0400, Steve Dickson wrote:
> > On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> > > Format vsock hosts as "vsock:<cid>" so the addresses can be easily
> > > distinguished from IPv4 and IPv6 addresses.
> > >
> > > Signed-off-by: Stefan Hajnoczi <[email protected]>
> > > ---
> > > utils/mount/network.c | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > > diff --git a/utils/mount/network.c b/utils/mount/network.c
> > > index 281e935..b5dcaa5 100644
> > > --- a/utils/mount/network.c
> > > +++ b/utils/mount/network.c
> > > @@ -45,6 +45,8 @@
> > > #include <rpc/pmap_prot.h>
> > > #include <rpc/pmap_clnt.h>
> > >
> > > +#include <linux/vm_sockets.h>
> > In the previous patch you had this surrounded by #ifdef AF_VSOCK
> > I'm not keen on sprinkling a bunch ifdefs around since
> > I think it makes the code harder to read. So my question
> > is why is the ifdef need in the previous patch and
> > not needed in this patch and are they needed in the
> > previous patch?
>
> The lack of #ifdef is my mistake.
>
> My impression of nfs-utils is that the code is written to work in a
> variety of configurations and still support older kernels. So I am
> wrapping AF_VSOCK logic with an #ifdef.

It needs to be able to support older kernels at run-time, and I don't
understand how #ifdefs would help with that.

--b.

2017-07-07 03:18:05

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Fri, Jun 30 2017, Chuck Lever wrote:
>
> Wouldn't it be nicer if it worked like this:
>
> (guest)$ cat /etc/hosts
> 129.0.0.2 localhyper
> (guest)$ mount.nfs localhyper:/export /mnt
>
> And the result was a working NFS mount of the
> local hypervisor, using whatever NFS version the
> two both support, with no changes needed to the
> NFS implementation or the understanding of the
> system administrator?

Yes. Yes. Definitely Yes.
Though I suspect you mean "127.0.0.2", not "129..."??

There must be some way to redirect TCP connections to some address
transparently through to the vsock protocol.
The "sshuttle" program does this to transparently forward TCP connections
over an ssh connection. Using a similar technique to forward
connections over vsock shouldn't be hard.

Or is performance really critical, and you get too much copying when you
try forwarding connections? I suspect that is fixable, but it would be
a little less straight forward.

I would really *not* like to see vsock support being bolted into one
network tool after another.

NeilBrown


Attachments:
signature.asc (832.00 B)

2017-07-07 04:13:54

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Fri, Jul 07 2017, NeilBrown wrote:

> On Fri, Jun 30 2017, Chuck Lever wrote:
>>
>> Wouldn't it be nicer if it worked like this:
>>
>> (guest)$ cat /etc/hosts
>> 129.0.0.2 localhyper
>> (guest)$ mount.nfs localhyper:/export /mnt
>>
>> And the result was a working NFS mount of the
>> local hypervisor, using whatever NFS version the
>> two both support, with no changes needed to the
>> NFS implementation or the understanding of the
>> system administrator?
>
> Yes. Yes. Definitely Yes.
> Though I suspect you mean "127.0.0.2", not "129..."??
>
> There must be some way to redirect TCP connections to some address
> transparently through to the vsock protocol.
> The "sshuttle" program does this to transparently forward TCP connections
> over an ssh connection. Using a similar technique to forward
> connections over vsock shouldn't be hard.
>
> Or is performance really critical, and you get too much copying when you
> try forwarding connections? I suspect that is fixable, but it would be
> a little less straight forward.
>
> I would really *not* like to see vsock support being bolted into one
> network tool after another.

I've been digging into this a big more. I came across
https://vmsplice.net/~stefan/stefanha-kvm-forum-2015.pdf

which (on page 7) lists some reasons not to use TCP/IP between guest
and host.

. Adding & configuring guest interfaces is invasive

That is possibly true. But adding support for a new address family to
NFS, NFSD, and nfs-utils is also very invasive. You would need to
install this software on the guest. I suggest you install different
software on the guest which solves the problem better.

. Prone to break due to config changes inside guest

This is, I suspect, a key issue. With vsock, the address of the
guest-side interface is defined by options passed to qemu. With
normal IP addressing, the guest has to configure the address.

However I think that IPv6 autoconfig makes this work well without vsock.
If I create a bridge interface on the host, run
ip -6 addr add fe80::1 dev br0
then run a guest with
-net nic,macaddr=Ch:oo:se:an:ad:dr \
-net bridge,br=br0 \

then the client can
mount [fe80::1%interfacename]:/path /mountpoint

and the host will see a connection from
fe80::ch:oo:se:an:ad:dr

So from the guest side, I have achieved zero-config NFS mounts from the
host.

I don't think the server can filter connections based on which interface
a link-local address came from. If that was a problem that someone
wanted to be fixed, I'm sure we can fix it.

If you need to be sure that clients don't fake their IPv6 address, I'm
sure netfilter is up to the task.


. Creates network interfaces on host that must be managed

What vsock does is effectively create a hidden interface on the host that only the
kernel knows about and so the sysadmin cannot break it. The only
difference between this and an explicit interface on the host is that
the latter requires a competent sysadmin.

If you have other reasons for preferring the use of vsock for NFS, I'd be
happy to hear them. So far I'm not convinced.

Thanks,
NeilBrown


Attachments:
signature.asc (832.00 B)

2017-07-07 04:14:31

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid


> On Jul 6, 2017, at 11:17 PM, NeilBrown <[email protected]> wrote:
>
> On Fri, Jun 30 2017, Chuck Lever wrote:
>>
>> Wouldn't it be nicer if it worked like this:
>>
>> (guest)$ cat /etc/hosts
>> 129.0.0.2 localhyper
>> (guest)$ mount.nfs localhyper:/export /mnt
>>
>> And the result was a working NFS mount of the
>> local hypervisor, using whatever NFS version the
>> two both support, with no changes needed to the
>> NFS implementation or the understanding of the
>> system administrator?
>
> Yes. Yes. Definitely Yes.
> Though I suspect you mean "127.0.0.2", not "129..."??

I meant 129.x. 127.0.0 has well-defined semantics as a
loopback to the same host. The hypervisor is clearly a
network entity that is distinct from the local host.

But maybe you could set up 127.0.0.2, .3 for this purpose?
Someone smarter than me could figure out what is best to
use here. I'm not familiar with all the rules for loopback
and link-local IPv4 addressing.

Loopback is the correct analogy, though. It has predictable
host numbers that can be known in advance, and loopback
networking is set up automatically on a host, without the
need for a physical network interface. These are the stated
goals for vsock.

The benefit for re-using loopback here is that every
application that can speak AF_INET can already use it. For
NFS that means all the traditional features work: rpcbind,
NFSv4.0 callback, IP-based share access control, and Kerberos,
and especially DNS so that you can mount by hostname.


> There must be some way to redirect TCP connections to some address
> transparently through to the vsock protocol.
> The "sshuttle" program does this to transparently forward TCP connections
> over an ssh connection. Using a similar technique to forward
> connections over vsock shouldn't be hard.
>
> Or is performance really critical, and you get too much copying when you
> try forwarding connections? I suspect that is fixable, but it would be
> a little less straight forward.
>
> I would really *not* like to see vsock support being bolted into one
> network tool after another.


--
Chuck Lever




2017-07-10 18:10:00

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 03/12] mount: present AF_VSOCK addresses

On Thu, Jul 06, 2017 at 01:16:26PM -0400, J. Bruce Fields wrote:
> On Mon, Jul 03, 2017 at 10:00:48AM +0100, Stefan Hajnoczi wrote:
> > On Fri, Jun 30, 2017 at 10:40:49AM -0400, Steve Dickson wrote:
> > > On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> > > > Format vsock hosts as "vsock:<cid>" so the addresses can be easily
> > > > distinguished from IPv4 and IPv6 addresses.
> > > >
> > > > Signed-off-by: Stefan Hajnoczi <[email protected]>
> > > > ---
> > > > utils/mount/network.c | 8 ++++++++
> > > > 1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/utils/mount/network.c b/utils/mount/network.c
> > > > index 281e935..b5dcaa5 100644
> > > > --- a/utils/mount/network.c
> > > > +++ b/utils/mount/network.c
> > > > @@ -45,6 +45,8 @@
> > > > #include <rpc/pmap_prot.h>
> > > > #include <rpc/pmap_clnt.h>
> > > >
> > > > +#include <linux/vm_sockets.h>
> > > In the previous patch you had this surrounded by #ifdef AF_VSOCK
> > > I'm not keen on sprinkling a bunch ifdefs around since
> > > I think it makes the code harder to read. So my question
> > > is why is the ifdef need in the previous patch and
> > > not needed in this patch and are they needed in the
> > > previous patch?
> >
> > The lack of #ifdef is my mistake.
> >
> > My impression of nfs-utils is that the code is written to work in a
> > variety of configurations and still support older kernels. So I am
> > wrapping AF_VSOCK logic with an #ifdef.
>
> It needs to be able to support older kernels at run-time, and I don't
> understand how #ifdefs would help with that.

I will test the nfs-utils binaries against a kernel without AF_VSOCK
support. What should happen is that an error is returned when creating
a socket, adding an export, etc fails. I'll make sure the errors are
graceful/meaningful.

Stefan


Attachments:
(No filename) (1.77 kB)
signature.asc (455.00 B)
Download all attachments

2017-07-10 18:14:50

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 03/12] mount: present AF_VSOCK addresses

On Mon, Jul 03, 2017 at 12:51:07PM -0400, Steve Dickson wrote:
> On 07/03/2017 05:00 AM, Stefan Hajnoczi wrote:
> > On Fri, Jun 30, 2017 at 10:40:49AM -0400, Steve Dickson wrote:
> >> On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> Are there other implementations out there that would cause breakage?
> I'm pretty sure nfs-utils is only used in Linux environments, right?

Does nfs-utils have a set of minimum distro versions that are supported?

For example, "the latest nfs-utils release is supported on RHEL 6, SLES
11, and Debian 8 jessie".

If yes, then I could go check that these distros ship AF_VSOCK and
<linux/vm_sockets.h>.

Stefan


Attachments:
(No filename) (644.00 B)
signature.asc (455.00 B)
Download all attachments

2017-07-10 18:35:12

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Fri, Jun 30, 2017 at 11:01:13AM -0400, Steve Dickson wrote:
> On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> > Neither libtirpc nor getprotobyname(3) know about AF_VSOCK. For similar
> > reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c.
> >
> > It is now possible to mount a file system from the host (hypervisor)
> > over AF_VSOCK like this:
> >
> > (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock
> >
> > The VM's cid address is 3 and the hypervisor is 2.
> So this is how vsocks are going to look...
> There is not going to be a way to lookup an vsock address?
> Since the format of the clientaddr parameter shouldn't
> that be documented in the man page?

AF_VSOCK does not have name resolution. The scope of the CID addresses
is just the hypervisor that the VMs are running on. Inter-VM
communication is not allowed. The virtualization software has the CIDs
so there's not much use for name resolution.

> I guess a general question, is this new mount type
> documented anywhere?

Thanks for pointing this out. I'll update the man pages in the next
revision of this patch series.


Attachments:
(No filename) (1.11 kB)
signature.asc (455.00 B)
Download all attachments

2017-07-10 18:40:01

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 11/12] nfsd: add --vsock (-v) option to nfsd

On Fri, Jun 30, 2017 at 11:25:31AM -0400, Steve Dickson wrote:
> On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> > The following command-line serves NFSv4.1 over AF_VSOCK:
> >
> > nfsd -TU -N3 -V4.1 -v 2049
> So this specifying no-tcp, no-udp (which is no longer needed since
> udp is off by default), no-v3, only-v4.1, vsock listiner,
> and you are missing a -p because I don't think you want to
> start up 2049 process.

-v takes an argument (the AF_VSOCK port number to listen on), so the
2049 is consumed and not interpreted as the number of processes.

> How does this work with the standard NFS server? Are the
> co-compatible?

I'll test tcp + vsock in a single nfsd and make sure it works for the
next revision of this patch series. Nothing in the kernel or nfs-utils
code intentionally prevents them from being used at the same time. My
intention is that /etc/exports dictates which exports are visible over
TCP, vsock, or both.

> Also the --vsock needs to be documented. Also can the
> --vsock flag just mean all of these specifics... Meaning
> so they don't have to be specified on the command line?

Thanks, will document the --vsock <port> option.


Attachments:
(No filename) (1.14 kB)
signature.asc (455.00 B)
Download all attachments

2017-07-12 14:26:10

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 03/12] mount: present AF_VSOCK addresses



On 07/10/2017 02:14 PM, Stefan Hajnoczi wrote:
> On Mon, Jul 03, 2017 at 12:51:07PM -0400, Steve Dickson wrote:
>> On 07/03/2017 05:00 AM, Stefan Hajnoczi wrote:
>>> On Fri, Jun 30, 2017 at 10:40:49AM -0400, Steve Dickson wrote:
>>>> On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
>> Are there other implementations out there that would cause breakage?
>> I'm pretty sure nfs-utils is only used in Linux environments, right?
>
> Does nfs-utils have a set of minimum distro versions that are supported?
Not that I'm aware of... Now there are certain package versions nfs-utils
is dependent on but not particular distro.

steved.

>
> For example, "the latest nfs-utils release is supported on RHEL 6, SLES
> 11, and Debian 8 jessie".
>
> If yes, then I could go check that these distros ship AF_VSOCK and
> <linux/vm_sockets.h>.
>
> Stefan
>

2017-07-19 15:11:48

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Fri, Jun 30, 2017 at 11:52:15AM -0400, Chuck Lever wrote:
> > On Jun 30, 2017, at 9:21 AM, Stefan Hajnoczi <[email protected]> wrote:
> >
> > Neither libtirpc nor getprotobyname(3) know about AF_VSOCK.
>
> Why?
>
> Basically you are building a lot of specialized
> awareness in applications and leaving the
> network layer alone. That seems backwards to me.

Yes. I posted glibc patches but there were concerns that getaddrinfo(3)
is IPv4/IPv6 only and applications need to be ported to AF_VSOCK anyway,
so there's not much to gain by adding it:
https://cygwin.com/ml/libc-alpha/2016-10/msg00126.html

> > For similar
> > reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c.
>
> rdma/rdma6 are specified by standards, and appear
> in the IANA Network Identifiers database:
>
> https://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml
>
> Is there a standard netid for vsock? If not,
> there needs to be some discussion with the nfsv4
> Working Group to get this worked out.
>
> Because AF_VSOCK is an address family and the RPC
> framing is the same as TCP, the netid should be
> something like "tcpv" and not "vsock". I've
> complained about this before and there has been
> no response of any kind.
>
> I'll note that rdma/rdma6 do not use alternate
> address families: an IP address is specified and
> mapped to a GUID by the underlying transport.
> We purposely did not expose GUIDs to NFS, which
> is based on AF_INET/AF_INET6.
>
> rdma co-exists with IP. vsock doesn't have this
> fallback.

Thanks for explaining the tcp + rdma relationship, that makes sense.

There is no standard netid for vsock yet.

Sorry I didn't ask about "tcpv" when you originally proposed it, I lost
track of that discussion. You said:

If this really is just TCP on a new address family, then "tcpv"
is more in line with previous work, and you can get away with
just an IANA action for a new netid, since RPC-over-TCP is
already specified.

Does "just TCP" mean a "connection-oriented, stream-oriented transport
using RFC 1831 Record Marking"? Or does "TCP" have any other
attributes?

NFS over AF_VSOCK definitely is "connection-oriented, stream-oriented
transport using RFC 1831 Record Marking". I'm just not sure whether
there are any other assumptions beyond this that AF_VSOCK might not meet
because it isn't IP and has 32-bit port numbers.

> It might be a better approach to use well-known
> (say, link-local or loopback) addresses and let
> the underlying network layer figure it out.
>
> Then hide all this stuff with DNS and let the
> client mount the server by hostname and use
> normal sockaddr's and "proto=tcp". Then you don't
> need _any_ application layer changes.
>
> Without hostnames, how does a client pick a
> Kerberos service principal for the server?

I'm not sure Kerberos would be used with AF_VSOCK. The hypervisor knows
about the VMs, addresses cannot be spoofed, and VMs can only communicate
with the hypervisor. This leads to a simple trust relationship.

> Does rpcbind implement "vsock" netids?

I have not modified rpcbind. My understanding is that rpcbind isn't
required for NFSv4. Since this is a new transport there is no plan for
it to run old protocol versions.

> Does the NFSv4.0 client advertise "vsock" in
> SETCLIENTID, and provide a "vsock" callback
> service?

The kernel patches implement backchannel support although I haven't
exercised it.

> > It is now possible to mount a file system from the host (hypervisor)
> > over AF_VSOCK like this:
> >
> > (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock
> >
> > The VM's cid address is 3 and the hypervisor is 2.
>
> The mount command is supposed to supply "clientaddr"
> automatically. This mount option is exposed only for
> debugging purposes or very special cases (like
> disabling NFSv4 callback operations).
>
> I mean the whole point of this exercise is to get
> rid of network configuration, but here you're
> adding the need to additionally specify both the
> proto option and the clientaddr option to get this
> to work. Seems like that isn't zero-configuration
> at all.

Thanks for pointing this out. Will fix in v2, there should be no need
to manually specify the client address, this is a remnant from early
development.

> Wouldn't it be nicer if it worked like this:
>
> (guest)$ cat /etc/hosts
> 129.0.0.2 localhyper
> (guest)$ mount.nfs localhyper:/export /mnt
>
> And the result was a working NFS mount of the
> local hypervisor, using whatever NFS version the
> two both support, with no changes needed to the
> NFS implementation or the understanding of the
> system administrator?

This is an interesting idea, thanks! It would be neat to have AF_INET
access over the loopback interface on both guest and host.


Attachments:
(No filename) (4.68 kB)
signature.asc (455.00 B)
Download all attachments

2017-07-19 15:35:36

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Wed, 2017-07-19 at 16:11 +0100, Stefan Hajnoczi wrote:
> On Fri, Jun 30, 2017 at 11:52:15AM -0400, Chuck Lever wrote:
> > > On Jun 30, 2017, at 9:21 AM, Stefan Hajnoczi <[email protected]> wrote:
> > >
> > > Neither libtirpc nor getprotobyname(3) know about AF_VSOCK.
> >
> > Why?
> >
> > Basically you are building a lot of specialized
> > awareness in applications and leaving the
> > network layer alone. That seems backwards to me.
>
> Yes. I posted glibc patches but there were concerns that getaddrinfo(3)
> is IPv4/IPv6 only and applications need to be ported to AF_VSOCK anyway,
> so there's not much to gain by adding it:
> https://cygwin.com/ml/libc-alpha/2016-10/msg00126.html
>
> > > For similar
> > > reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c.
> >
> > rdma/rdma6 are specified by standards, and appear
> > in the IANA Network Identifiers database:
> >
> > https://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml
> >
> > Is there a standard netid for vsock? If not,
> > there needs to be some discussion with the nfsv4
> > Working Group to get this worked out.
> >
> > Because AF_VSOCK is an address family and the RPC
> > framing is the same as TCP, the netid should be
> > something like "tcpv" and not "vsock". I've
> > complained about this before and there has been
> > no response of any kind.
> >
> > I'll note that rdma/rdma6 do not use alternate
> > address families: an IP address is specified and
> > mapped to a GUID by the underlying transport.
> > We purposely did not expose GUIDs to NFS, which
> > is based on AF_INET/AF_INET6.
> >
> > rdma co-exists with IP. vsock doesn't have this
> > fallback.
>
> Thanks for explaining the tcp + rdma relationship, that makes sense.
>
> There is no standard netid for vsock yet.
>
> Sorry I didn't ask about "tcpv" when you originally proposed it, I lost
> track of that discussion. You said:
>
> If this really is just TCP on a new address family, then "tcpv"
> is more in line with previous work, and you can get away with
> just an IANA action for a new netid, since RPC-over-TCP is
> already specified.
>
> Does "just TCP" mean a "connection-oriented, stream-oriented transport
> using RFC 1831 Record Marking"? Or does "TCP" have any other
> attributes?
>
> NFS over AF_VSOCK definitely is "connection-oriented, stream-oriented
> transport using RFC 1831 Record Marking". I'm just not sure whether
> there are any other assumptions beyond this that AF_VSOCK might not meet
> because it isn't IP and has 32-bit port numbers.
>
> > It might be a better approach to use well-known
> > (say, link-local or loopback) addresses and let
> > the underlying network layer figure it out.
> >
> > Then hide all this stuff with DNS and let the
> > client mount the server by hostname and use
> > normal sockaddr's and "proto=tcp". Then you don't
> > need _any_ application layer changes.
> >
> > Without hostnames, how does a client pick a
> > Kerberos service principal for the server?
>
> I'm not sure Kerberos would be used with AF_VSOCK. The hypervisor knows
> about the VMs, addresses cannot be spoofed, and VMs can only communicate
> with the hypervisor. This leads to a simple trust relationship.
>
> > Does rpcbind implement "vsock" netids?
>
> I have not modified rpcbind. My understanding is that rpcbind isn't
> required for NFSv4. Since this is a new transport there is no plan for
> it to run old protocol versions.
>
> > Does the NFSv4.0 client advertise "vsock" in
> > SETCLIENTID, and provide a "vsock" callback
> > service?
>
> The kernel patches implement backchannel support although I haven't
> exercised it.
>
> > > It is now possible to mount a file system from the host (hypervisor)
> > > over AF_VSOCK like this:
> > >
> > > (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock
> > >
> > > The VM's cid address is 3 and the hypervisor is 2.
> >
> > The mount command is supposed to supply "clientaddr"
> > automatically. This mount option is exposed only for
> > debugging purposes or very special cases (like
> > disabling NFSv4 callback operations).
> >
> > I mean the whole point of this exercise is to get
> > rid of network configuration, but here you're
> > adding the need to additionally specify both the
> > proto option and the clientaddr option to get this
> > to work. Seems like that isn't zero-configuration
> > at all.
>
> Thanks for pointing this out. Will fix in v2, there should be no need
> to manually specify the client address, this is a remnant from early
> development.
>
> > Wouldn't it be nicer if it worked like this:
> >
> > (guest)$ cat /etc/hosts
> > 129.0.0.2 localhyper
> > (guest)$ mount.nfs localhyper:/export /mnt
> >
> > And the result was a working NFS mount of the
> > local hypervisor, using whatever NFS version the
> > two both support, with no changes needed to the
> > NFS implementation or the understanding of the
> > system administrator?
>
> This is an interesting idea, thanks! It would be neat to have AF_INET
> access over the loopback interface on both guest and host.

I too really like this idea better as it seems a lot less invasive.
Existing applications would "just work" without needing to be changed,
and you get name resolution to boot.

Chuck, is 129.0.0.X within some reserved block of addrs such that you
could get a standard range for this? I didn't see that block listed here
during my half-assed web search:

https://en.wikipedia.org/wiki/Reserved_IP_addresses

Maybe you meant 192.0.0.X ? It might be easier and more future proof to
get a chunk of ipv6 addrs carved out though.

--
Jeff Layton <[email protected]>

2017-07-19 15:40:24

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid


> On Jul 19, 2017, at 17:35, Jeff Layton <[email protected]> wrote:
>
> On Wed, 2017-07-19 at 16:11 +0100, Stefan Hajnoczi wrote:
>> On Fri, Jun 30, 2017 at 11:52:15AM -0400, Chuck Lever wrote:
>>>> On Jun 30, 2017, at 9:21 AM, Stefan Hajnoczi <[email protected]> wrote:
>>>>
>>>> Neither libtirpc nor getprotobyname(3) know about AF_VSOCK.
>>>
>>> Why?
>>>
>>> Basically you are building a lot of specialized
>>> awareness in applications and leaving the
>>> network layer alone. That seems backwards to me.
>>
>> Yes. I posted glibc patches but there were concerns that getaddrinfo(3)
>> is IPv4/IPv6 only and applications need to be ported to AF_VSOCK anyway,
>> so there's not much to gain by adding it:
>> https://cygwin.com/ml/libc-alpha/2016-10/msg00126.html
>>
>>>> For similar
>>>> reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c.
>>>
>>> rdma/rdma6 are specified by standards, and appear
>>> in the IANA Network Identifiers database:
>>>
>>> https://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml
>>>
>>> Is there a standard netid for vsock? If not,
>>> there needs to be some discussion with the nfsv4
>>> Working Group to get this worked out.
>>>
>>> Because AF_VSOCK is an address family and the RPC
>>> framing is the same as TCP, the netid should be
>>> something like "tcpv" and not "vsock". I've
>>> complained about this before and there has been
>>> no response of any kind.
>>>
>>> I'll note that rdma/rdma6 do not use alternate
>>> address families: an IP address is specified and
>>> mapped to a GUID by the underlying transport.
>>> We purposely did not expose GUIDs to NFS, which
>>> is based on AF_INET/AF_INET6.
>>>
>>> rdma co-exists with IP. vsock doesn't have this
>>> fallback.
>>
>> Thanks for explaining the tcp + rdma relationship, that makes sense.
>>
>> There is no standard netid for vsock yet.
>>
>> Sorry I didn't ask about "tcpv" when you originally proposed it, I lost
>> track of that discussion. You said:
>>
>> If this really is just TCP on a new address family, then "tcpv"
>> is more in line with previous work, and you can get away with
>> just an IANA action for a new netid, since RPC-over-TCP is
>> already specified.
>>
>> Does "just TCP" mean a "connection-oriented, stream-oriented transport
>> using RFC 1831 Record Marking"? Or does "TCP" have any other
>> attributes?
>>
>> NFS over AF_VSOCK definitely is "connection-oriented, stream-oriented
>> transport using RFC 1831 Record Marking". I'm just not sure whether
>> there are any other assumptions beyond this that AF_VSOCK might not meet
>> because it isn't IP and has 32-bit port numbers.
>>
>>> It might be a better approach to use well-known
>>> (say, link-local or loopback) addresses and let
>>> the underlying network layer figure it out.
>>>
>>> Then hide all this stuff with DNS and let the
>>> client mount the server by hostname and use
>>> normal sockaddr's and "proto=tcp". Then you don't
>>> need _any_ application layer changes.
>>>
>>> Without hostnames, how does a client pick a
>>> Kerberos service principal for the server?
>>
>> I'm not sure Kerberos would be used with AF_VSOCK. The hypervisor knows
>> about the VMs, addresses cannot be spoofed, and VMs can only communicate
>> with the hypervisor. This leads to a simple trust relationship.
>>
>>> Does rpcbind implement "vsock" netids?
>>
>> I have not modified rpcbind. My understanding is that rpcbind isn't
>> required for NFSv4. Since this is a new transport there is no plan for
>> it to run old protocol versions.
>>
>>> Does the NFSv4.0 client advertise "vsock" in
>>> SETCLIENTID, and provide a "vsock" callback
>>> service?
>>
>> The kernel patches implement backchannel support although I haven't
>> exercised it.
>>
>>>> It is now possible to mount a file system from the host (hypervisor)
>>>> over AF_VSOCK like this:
>>>>
>>>> (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock
>>>>
>>>> The VM's cid address is 3 and the hypervisor is 2.
>>>
>>> The mount command is supposed to supply "clientaddr"
>>> automatically. This mount option is exposed only for
>>> debugging purposes or very special cases (like
>>> disabling NFSv4 callback operations).
>>>
>>> I mean the whole point of this exercise is to get
>>> rid of network configuration, but here you're
>>> adding the need to additionally specify both the
>>> proto option and the clientaddr option to get this
>>> to work. Seems like that isn't zero-configuration
>>> at all.
>>
>> Thanks for pointing this out. Will fix in v2, there should be no need
>> to manually specify the client address, this is a remnant from early
>> development.
>>
>>> Wouldn't it be nicer if it worked like this:
>>>
>>> (guest)$ cat /etc/hosts
>>> 129.0.0.2 localhyper
>>> (guest)$ mount.nfs localhyper:/export /mnt
>>>
>>> And the result was a working NFS mount of the
>>> local hypervisor, using whatever NFS version the
>>> two both support, with no changes needed to the
>>> NFS implementation or the understanding of the
>>> system administrator?
>>
>> This is an interesting idea, thanks! It would be neat to have AF_INET
>> access over the loopback interface on both guest and host.
>
> I too really like this idea better as it seems a lot less invasive.
> Existing applications would "just work" without needing to be changed,
> and you get name resolution to boot.
>
> Chuck, is 129.0.0.X within some reserved block of addrs such that you
> could get a standard range for this? I didn't see that block listed here
> during my half-assed web search:
>
> https://en.wikipedia.org/wiki/Reserved_IP_addresses

I thought there would be some range of link-local addresses
that could make this work with IPv4, similar to 192. or 10.
that are "unroutable" site-local addresses.

If there isn't then IPv6 might have what we need.


> Maybe you meant 192.0.0.X ? It might be easier and more future proof to
> get a chunk of ipv6 addrs carved out though.


--
Chuck Lever




2017-07-19 15:50:32

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid


> On Jul 19, 2017, at 17:11, Stefan Hajnoczi <[email protected]> wrote:
>
> On Fri, Jun 30, 2017 at 11:52:15AM -0400, Chuck Lever wrote:
>>> On Jun 30, 2017, at 9:21 AM, Stefan Hajnoczi <[email protected]> wrote:
>>>
>>> Neither libtirpc nor getprotobyname(3) know about AF_VSOCK.
>>
>> Why?
>>
>> Basically you are building a lot of specialized
>> awareness in applications and leaving the
>> network layer alone. That seems backwards to me.
>
> Yes. I posted glibc patches but there were concerns that getaddrinfo(3)
> is IPv4/IPv6 only and applications need to be ported to AF_VSOCK anyway,
> so there's not much to gain by adding it:
> https://cygwin.com/ml/libc-alpha/2016-10/msg00126.html
>
>>> For similar
>>> reasons as for "rdma"/"rmda6", translate "vsock" manually in getport.c.
>>
>> rdma/rdma6 are specified by standards, and appear
>> in the IANA Network Identifiers database:
>>
>> https://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml
>>
>> Is there a standard netid for vsock? If not,
>> there needs to be some discussion with the nfsv4
>> Working Group to get this worked out.
>>
>> Because AF_VSOCK is an address family and the RPC
>> framing is the same as TCP, the netid should be
>> something like "tcpv" and not "vsock". I've
>> complained about this before and there has been
>> no response of any kind.
>>
>> I'll note that rdma/rdma6 do not use alternate
>> address families: an IP address is specified and
>> mapped to a GUID by the underlying transport.
>> We purposely did not expose GUIDs to NFS, which
>> is based on AF_INET/AF_INET6.
>>
>> rdma co-exists with IP. vsock doesn't have this
>> fallback.
>
> Thanks for explaining the tcp + rdma relationship, that makes sense.
>
> There is no standard netid for vsock yet.
>
> Sorry I didn't ask about "tcpv" when you originally proposed it, I lost
> track of that discussion. You said:
>
> If this really is just TCP on a new address family, then "tcpv"
> is more in line with previous work, and you can get away with
> just an IANA action for a new netid, since RPC-over-TCP is
> already specified.
>
> Does "just TCP" mean a "connection-oriented, stream-oriented transport
> using RFC 1831 Record Marking"? Or does "TCP" have any other
> attributes?
>
> NFS over AF_VSOCK definitely is "connection-oriented, stream-oriented
> transport using RFC 1831 Record Marking". I'm just not sure whether
> there are any other assumptions beyond this that AF_VSOCK might not meet
> because it isn't IP and has 32-bit port numbers.

Right, it is TCP in the sense that it is connection-oriented
and so on. It looks like a stream socket to the RPC client.
TI-RPC calls this "tpi_cots_ord".

But it isn't TCP in the sense that you aren't moving TCP
segments over the link.

I think the "IP / 32-bit ports" is handled entirely within
the address variant that your link is using.

>> It might be a better approach to use well-known
>> (say, link-local or loopback) addresses and let
>> the underlying network layer figure it out.
>>
>> Then hide all this stuff with DNS and let the
>> client mount the server by hostname and use
>> normal sockaddr's and "proto=tcp". Then you don't
>> need _any_ application layer changes.
>>
>> Without hostnames, how does a client pick a
>> Kerberos service principal for the server?
>
> I'm not sure Kerberos would be used with AF_VSOCK. The hypervisor knows
> about the VMs, addresses cannot be spoofed, and VMs can only communicate
> with the hypervisor. This leads to a simple trust relationship.

The clients can be exploited if they are exposed in any
way to remote users. Having at least sec=krb5 might be
a way to block attackers from accessing data on the NFS
server from a compromised client.

In any event, NFSv4 will need ID mapping. Do you have a
sense of how the server and clients will determine their
NFSv4 ID mapping domain name? How will the server and
client user ID databases be kept in synchrony? You might
have some issues if there is a "cel" in multiple guests
that are actually different users.


>> Does rpcbind implement "vsock" netids?
>
> I have not modified rpcbind. My understanding is that rpcbind isn't
> required for NFSv4. Since this is a new transport there is no plan for
> it to run old protocol versions.
>
>> Does the NFSv4.0 client advertise "vsock" in
>> SETCLIENTID, and provide a "vsock" callback
>> service?
>
> The kernel patches implement backchannel support although I haven't
> exercised it.
>
>>> It is now possible to mount a file system from the host (hypervisor)
>>> over AF_VSOCK like this:
>>>
>>> (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock
>>>
>>> The VM's cid address is 3 and the hypervisor is 2.
>>
>> The mount command is supposed to supply "clientaddr"
>> automatically. This mount option is exposed only for
>> debugging purposes or very special cases (like
>> disabling NFSv4 callback operations).
>>
>> I mean the whole point of this exercise is to get
>> rid of network configuration, but here you're
>> adding the need to additionally specify both the
>> proto option and the clientaddr option to get this
>> to work. Seems like that isn't zero-configuration
>> at all.
>
> Thanks for pointing this out. Will fix in v2, there should be no need
> to manually specify the client address, this is a remnant from early
> development.
>
>> Wouldn't it be nicer if it worked like this:
>>
>> (guest)$ cat /etc/hosts
>> 129.0.0.2 localhyper
>> (guest)$ mount.nfs localhyper:/export /mnt
>>
>> And the result was a working NFS mount of the
>> local hypervisor, using whatever NFS version the
>> two both support, with no changes needed to the
>> NFS implementation or the understanding of the
>> system administrator?
>
> This is an interesting idea, thanks! It would be neat to have AF_INET
> access over the loopback interface on both guest and host.

--
Chuck Lever




2017-07-25 10:05:15

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
> On Fri, Jul 07 2017, NeilBrown wrote:
>
> > On Fri, Jun 30 2017, Chuck Lever wrote:
> >>
> >> Wouldn't it be nicer if it worked like this:
> >>
> >> (guest)$ cat /etc/hosts
> >> 129.0.0.2 localhyper
> >> (guest)$ mount.nfs localhyper:/export /mnt
> >>
> >> And the result was a working NFS mount of the
> >> local hypervisor, using whatever NFS version the
> >> two both support, with no changes needed to the
> >> NFS implementation or the understanding of the
> >> system administrator?
> >
> > Yes. Yes. Definitely Yes.
> > Though I suspect you mean "127.0.0.2", not "129..."??
> >
> > There must be some way to redirect TCP connections to some address
> > transparently through to the vsock protocol.
> > The "sshuttle" program does this to transparently forward TCP connections
> > over an ssh connection. Using a similar technique to forward
> > connections over vsock shouldn't be hard.
> >
> > Or is performance really critical, and you get too much copying when you
> > try forwarding connections? I suspect that is fixable, but it would be
> > a little less straight forward.
> >
> > I would really *not* like to see vsock support being bolted into one
> > network tool after another.
>
> I've been digging into this a big more. I came across
> https://vmsplice.net/~stefan/stefanha-kvm-forum-2015.pdf
>
> which (on page 7) lists some reasons not to use TCP/IP between guest
> and host.
>
> . Adding & configuring guest interfaces is invasive
>
> That is possibly true. But adding support for a new address family to
> NFS, NFSD, and nfs-utils is also very invasive. You would need to
> install this software on the guest. I suggest you install different
> software on the guest which solves the problem better.

Two different types of "invasive":
1. Requiring guest configuration changes that are likely to cause
conflicts.
2. Requiring changes to the software stack. Once installed there are no
conflicts.

I'm interested and open to a different solution but it must avoid
invasive configuration changes, especially inside the guest.

> . Prone to break due to config changes inside guest
>
> This is, I suspect, a key issue. With vsock, the address of the
> guest-side interface is defined by options passed to qemu. With
> normal IP addressing, the guest has to configure the address.
>
> However I think that IPv6 autoconfig makes this work well without vsock.
> If I create a bridge interface on the host, run
> ip -6 addr add fe80::1 dev br0
> then run a guest with
> -net nic,macaddr=Ch:oo:se:an:ad:dr \
> -net bridge,br=br0 \
>
> then the client can
> mount [fe80::1%interfacename]:/path /mountpoint
>
> and the host will see a connection from
> fe80::ch:oo:se:an:ad:dr
>
> So from the guest side, I have achieved zero-config NFS mounts from the
> host.

It is not zero-configuration since [fe80::1%interfacename] contains a
variable, "interfacename", whose value is unknown ahead of time. This
will make documentation as well as ability to share configuration
between VMs more difficult. In other words, we're back to something
that requires per-guest configuration and doesn't just work everywhere.

> I don't think the server can filter connections based on which interface
> a link-local address came from. If that was a problem that someone
> wanted to be fixed, I'm sure we can fix it.
>
> If you need to be sure that clients don't fake their IPv6 address, I'm
> sure netfilter is up to the task.

Yes, it's common to prevent spoofing on the host using netfilter and I
think it wouldn't be a problem.

> . Creates network interfaces on host that must be managed
>
> What vsock does is effectively create a hidden interface on the host that only the
> kernel knows about and so the sysadmin cannot break it. The only
> difference between this and an explicit interface on the host is that
> the latter requires a competent sysadmin.
>
> If you have other reasons for preferring the use of vsock for NFS, I'd be
> happy to hear them. So far I'm not convinced.

Before working on AF_VSOCK I originally proposed adding dedicated
network interfaces to guests, similar to what you've suggested, but
there was resistance for additional reasons that weren't covered in the
presentation:

Using AF_INET exposes the host's network stack to guests, and through
accidental misconfiguration even external traffic could reach the host's
network stack. AF_VSOCK doesn't do routing or forwarding so we can be
sure that any activity is intentional.

Some virtualization use cases run guests without any network interfaces
as a matter of security policy. One could argue that AF_VSOCK is just
another network channel, but due to it's restricted usage, the attack
surface is much smaller than an AF_INET network interface.


Attachments:
(No filename) (4.73 kB)
signature.asc (455.00 B)
Download all attachments

2017-07-25 12:29:35

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Fri, Jul 07, 2017 at 01:17:54PM +1000, NeilBrown wrote:
> On Fri, Jun 30 2017, Chuck Lever wrote:
> >
> > Wouldn't it be nicer if it worked like this:
> >
> > (guest)$ cat /etc/hosts
> > 129.0.0.2 localhyper
> > (guest)$ mount.nfs localhyper:/export /mnt
> >
> > And the result was a working NFS mount of the
> > local hypervisor, using whatever NFS version the
> > two both support, with no changes needed to the
> > NFS implementation or the understanding of the
> > system administrator?
>
> Yes. Yes. Definitely Yes.
> Though I suspect you mean "127.0.0.2", not "129..."??
>
> There must be some way to redirect TCP connections to some address
> transparently through to the vsock protocol.
> The "sshuttle" program does this to transparently forward TCP connections
> over an ssh connection. Using a similar technique to forward
> connections over vsock shouldn't be hard.

Thanks for the sshuttle reference. I've taken a look at it and the
underlying iptables extensions.

sshuttle does not have the ability to accept incoming connections but
that can be achieved by adding the IP to the loopback device.

Here is how bi-directional TCP connections can be tunnelled without
network interfaces:

host <-> vsock transport <-> guest
129.0.0.2 (lo) 129.0.0.2 (lo)
129.0.0.3 (lo) 129.0.0.3 (lo)

iptables REDIRECT is used to catch 129.0.0.2->.3 connections on the host
and 129.0.0.3->.2 connections in the guest. A "connect" command is then
sent across the tunnel to establish a new TCP connection on the other
side.

Note that this isn't NAT since both sides see the correct IP addresses.

Unlike using a network interface (even tun/tap) this tunnelling approach
is restricted to TCP connections. It doesn't have UDP, etc.

Issues:

1. Adding IPs to dev lo has side effects. For example, firewall rules
on dev lo will affect the traffic. This alone probably prevents the
approach from working without conflicts on existing guests.

2. Is there a safe address range to use? Using IPv6 link-local
addresses as suggested in this thread might work, especially when
using an OUI so we can be sure there are no address collisions.

3. Performance has already been mentioned since a userspace process
tunnels from loopback TCP to the vsock transport. splice(2) can
probably be used.

Stefan


Attachments:
(No filename) (2.33 kB)
signature.asc (455.00 B)
Download all attachments

2017-07-27 05:14:04

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Tue, Jul 25 2017, Stefan Hajnoczi wrote:

> On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
>> On Fri, Jul 07 2017, NeilBrown wrote:
>>
>> > On Fri, Jun 30 2017, Chuck Lever wrote:
>> >>
>> >> Wouldn't it be nicer if it worked like this:
>> >>
>> >> (guest)$ cat /etc/hosts
>> >> 129.0.0.2 localhyper
>> >> (guest)$ mount.nfs localhyper:/export /mnt
>> >>
>> >> And the result was a working NFS mount of the
>> >> local hypervisor, using whatever NFS version the
>> >> two both support, with no changes needed to the
>> >> NFS implementation or the understanding of the
>> >> system administrator?
>> >
>> > Yes. Yes. Definitely Yes.
>> > Though I suspect you mean "127.0.0.2", not "129..."??
>> >
>> > There must be some way to redirect TCP connections to some address
>> > transparently through to the vsock protocol.
>> > The "sshuttle" program does this to transparently forward TCP connections
>> > over an ssh connection. Using a similar technique to forward
>> > connections over vsock shouldn't be hard.
>> >
>> > Or is performance really critical, and you get too much copying when you
>> > try forwarding connections? I suspect that is fixable, but it would be
>> > a little less straight forward.
>> >
>> > I would really *not* like to see vsock support being bolted into one
>> > network tool after another.
>>
>> I've been digging into this a big more. I came across
>> https://vmsplice.net/~stefan/stefanha-kvm-forum-2015.pdf
>>
>> which (on page 7) lists some reasons not to use TCP/IP between guest
>> and host.
>>
>> . Adding & configuring guest interfaces is invasive
>>
>> That is possibly true. But adding support for a new address family to
>> NFS, NFSD, and nfs-utils is also very invasive. You would need to
>> install this software on the guest. I suggest you install different
>> software on the guest which solves the problem better.
>
> Two different types of "invasive":
> 1. Requiring guest configuration changes that are likely to cause
> conflicts.
> 2. Requiring changes to the software stack. Once installed there are no
> conflicts.
>
> I'm interested and open to a different solution but it must avoid
> invasive configuration changes, especially inside the guest.

Sounds fair.


>
>> . Prone to break due to config changes inside guest
>>
>> This is, I suspect, a key issue. With vsock, the address of the
>> guest-side interface is defined by options passed to qemu. With
>> normal IP addressing, the guest has to configure the address.
>>
>> However I think that IPv6 autoconfig makes this work well without vsock.
>> If I create a bridge interface on the host, run
>> ip -6 addr add fe80::1 dev br0
>> then run a guest with
>> -net nic,macaddr=Ch:oo:se:an:ad:dr \
>> -net bridge,br=br0 \
>>
>> then the client can
>> mount [fe80::1%interfacename]:/path /mountpoint
>>
>> and the host will see a connection from
>> fe80::ch:oo:se:an:ad:dr
>>
>> So from the guest side, I have achieved zero-config NFS mounts from the
>> host.
>
> It is not zero-configuration since [fe80::1%interfacename] contains a
> variable, "interfacename", whose value is unknown ahead of time. This
> will make documentation as well as ability to share configuration
> between VMs more difficult. In other words, we're back to something
> that requires per-guest configuration and doesn't just work everywhere.

Maybe. Why isn't the interfacename known ahead of time. Once upon a
time it was always "eth0", but I guess guests can rename it....

You can use a number instead of a name. %1 would always be lo.
%2 seems to always (often?) be the first physical interface. Presumably
the order in which you describe interfaces to qemu directly maps to the
order that Linux sees. Maybe %2 could always work. Maybe we could make
it so that it always works, even if that requires small changes to Linux
(and/or qemu).

>
>> I don't think the server can filter connections based on which interface
>> a link-local address came from. If that was a problem that someone
>> wanted to be fixed, I'm sure we can fix it.
>>
>> If you need to be sure that clients don't fake their IPv6 address, I'm
>> sure netfilter is up to the task.
>
> Yes, it's common to prevent spoofing on the host using netfilter and I
> think it wouldn't be a problem.
>
>> . Creates network interfaces on host that must be managed
>>
>> What vsock does is effectively create a hidden interface on the host that only the
>> kernel knows about and so the sysadmin cannot break it. The only
>> difference between this and an explicit interface on the host is that
>> the latter requires a competent sysadmin.
>>
>> If you have other reasons for preferring the use of vsock for NFS, I'd be
>> happy to hear them. So far I'm not convinced.
>
> Before working on AF_VSOCK I originally proposed adding dedicated
> network interfaces to guests, similar to what you've suggested, but
> there was resistance for additional reasons that weren't covered in the
> presentation:

I would like to suggest that this is critical information for
understanding the design rationale for AF_VSOCK and should be easily
found from http://wiki.qemu.org/Features/VirtioVsock

>
> Using AF_INET exposes the host's network stack to guests, and through
> accidental misconfiguration even external traffic could reach the host's
> network stack. AF_VSOCK doesn't do routing or forwarding so we can be
> sure that any activity is intentional.

If I understand this correctly, the suggested configuration has the host
completely isolated from network traffic, and the guests directly
control the physical network interfaces, so the guests see external
traffic, but neither the guests nor the wider network can communicate
with the host.

Except that sometimes the guests do need to communicate with the host so
we create a whole new protocol just for that.

>
> Some virtualization use cases run guests without any network interfaces
> as a matter of security policy. One could argue that AF_VSOCK is just
> another network channel, but due to it's restricted usage, the attack
> surface is much smaller than an AF_INET network interface.

No network interfaces, but they still want to use NFS. Does anyone
think that sounds rational?

"due to it's restricted usage, the attack surface is much smaller"
or "due to it's niche use-cache, bug are likely to go undetected for
longer". I'm not convinced that is sensible security policy.


I think I see where you are coming from now - thanks. I'm not convinced
though. It feels like someone is paranoid about possible exploits using
protocols that they think they understand, so they ask you to create a
new protocol that they don't understand (and so cannot be afraid of).

Maybe the NFS server should be run in a guest. Surely that would
protects the host's network stack. This would be a rather paranoid
configuration, but it seems to match the paranoia of the requirements.

I'm not against people being paranoid. I am against major code changes
to well established software, just to placate that paranoia.

To achieve zero-config, I think link-local addresses are by far the best
answer. To achieve isolation, some targeted filtering seems like the
best approach.

If you really want traffic between guest and host to go over a vsock,
then some sort of packet redirection should be possible.

NeilBrown


Attachments:
signature.asc (832.00 B)

2017-07-27 10:58:43

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote:
> On Tue, Jul 25 2017, Stefan Hajnoczi wrote:
> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
> >> On Fri, Jul 07 2017, NeilBrown wrote:
> >> > On Fri, Jun 30 2017, Chuck Lever wrote:
> >> I don't think the server can filter connections based on which interface
> >> a link-local address came from. If that was a problem that someone
> >> wanted to be fixed, I'm sure we can fix it.
> >>
> >> If you need to be sure that clients don't fake their IPv6 address, I'm
> >> sure netfilter is up to the task.
> >
> > Yes, it's common to prevent spoofing on the host using netfilter and I
> > think it wouldn't be a problem.
> >
> >> . Creates network interfaces on host that must be managed
> >>
> >> What vsock does is effectively create a hidden interface on the host that only the
> >> kernel knows about and so the sysadmin cannot break it. The only
> >> difference between this and an explicit interface on the host is that
> >> the latter requires a competent sysadmin.
> >>
> >> If you have other reasons for preferring the use of vsock for NFS, I'd be
> >> happy to hear them. So far I'm not convinced.
> >
> > Before working on AF_VSOCK I originally proposed adding dedicated
> > network interfaces to guests, similar to what you've suggested, but
> > there was resistance for additional reasons that weren't covered in the
> > presentation:
>
> I would like to suggest that this is critical information for
> understanding the design rationale for AF_VSOCK and should be easily
> found from http://wiki.qemu.org/Features/VirtioVsock

Thanks, I have updated the wiki.

> To achieve zero-config, I think link-local addresses are by far the best
> answer. To achieve isolation, some targeted filtering seems like the
> best approach.
>
> If you really want traffic between guest and host to go over a vsock,
> then some sort of packet redirection should be possible.

The issue we seem to hit with designs using AF_INET and network
interfaces is that they cannot meet the "it must avoid invasive
configuration changes, especially inside the guest" requirement. It's
very hard to autoconfigure in a way that doesn't conflict with the
user's network configuration inside the guest.

One thought about solving the interface naming problem: if the dedicated
NIC uses a well-known OUI dedicated for this purpose then udev could
assign a persistent name (e.g. "virtguestif"). This gets us one step
closer to non-invasive automatic configuration.

Stefan


Attachments:
(No filename) (2.48 kB)
signature.asc (455.00 B)
Download all attachments

2017-07-27 11:33:50

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Thu, 2017-07-27 at 11:58 +0100, Stefan Hajnoczi wrote:
> On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote:
> > On Tue, Jul 25 2017, Stefan Hajnoczi wrote:
> > > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
> > > > On Fri, Jul 07 2017, NeilBrown wrote:
> > > > > On Fri, Jun 30 2017, Chuck Lever wrote:
> > > >
> > > > I don't think the server can filter connections based on which
> > > > interface
> > > > a link-local address came from. If that was a problem that
> > > > someone
> > > > wanted to be fixed, I'm sure we can fix it.
> > > >
> > > > If you need to be sure that clients don't fake their IPv6
> > > > address, I'm
> > > > sure netfilter is up to the task.
> > >
> > > Yes, it's common to prevent spoofing on the host using netfilter
> > > and I
> > > think it wouldn't be a problem.
> > >
> > > > . Creates network interfaces on host that must be managed
> > > >
> > > > What vsock does is effectively create a hidden interface on the
> > > > host that only the
> > > > kernel knows about and so the sysadmin cannot break it. The
> > > > only
> > > > difference between this and an explicit interface on the host
> > > > is that
> > > > the latter requires a competent sysadmin.
> > > >
> > > > If you have other reasons for preferring the use of vsock for
> > > > NFS, I'd be
> > > > happy to hear them. So far I'm not convinced.
> > >
> > > Before working on AF_VSOCK I originally proposed adding dedicated
> > > network interfaces to guests, similar to what you've suggested,
> > > but
> > > there was resistance for additional reasons that weren't covered
> > > in the
> > > presentation:
> >
> > I would like to suggest that this is critical information for
> > understanding the design rationale for AF_VSOCK and should be
> > easily
> > found from http://wiki.qemu.org/Features/VirtioVsock
>
> Thanks, I have updated the wiki.
>
> > To achieve zero-config, I think link-local addresses are by far the
> > best
> > answer. To achieve isolation, some targeted filtering seems like
> > the
> > best approach.
> >
> > If you really want traffic between guest and host to go over a
> > vsock,
> > then some sort of packet redirection should be possible.
>
> The issue we seem to hit with designs using AF_INET and network
> interfaces is that they cannot meet the "it must avoid invasive
> configuration changes, especially inside the guest"
> requirement. It's
> very hard to autoconfigure in a way that doesn't conflict with the
> user's network configuration inside the guest.
>
> One thought about solving the interface naming problem: if the
> dedicated
> NIC uses a well-known OUI dedicated for this purpose then udev could
> assign a persistent name (e.g. "virtguestif"). This gets us one step
> closer to non-invasive automatic configuration.

Link-local IPv6 addresses are always present once you bring up an IPv6
interface. You can use them to communicate with other hosts on the same
network segment. It's just not routable. That seems entirely fine here
where you're not dealing with routing anyway.

What I would (naively) envision is a new network interface driver that
presents itself as "hvlo0" or soemthing, much like we do with the
loopback interface. You just need the guest to ensure that it plugs in
that driver and brings up the interface for ipv6.

Then the only issue is discovery of addresses. The HV should be able to
figure that out and present it. Maybe roll up a new nsswitch module
that queries the HV directly somehow? The nice thing there is that you
get name resolution "for free", since it's just plain old IPv6 traffic
at that point.

AF_VSOCK just seems like a very invasive solution to this problem
that's going to add a lot of maintenance burden to a lot of different
code.
--
Jeff Layton <[email protected]>

2017-07-27 23:11:34

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Thu, Jul 27 2017, Stefan Hajnoczi wrote:

> On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote:
>> On Tue, Jul 25 2017, Stefan Hajnoczi wrote:
>> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
>> >> On Fri, Jul 07 2017, NeilBrown wrote:
>> >> > On Fri, Jun 30 2017, Chuck Lever wrote:
>> >> I don't think the server can filter connections based on which interface
>> >> a link-local address came from. If that was a problem that someone
>> >> wanted to be fixed, I'm sure we can fix it.
>> >>
>> >> If you need to be sure that clients don't fake their IPv6 address, I'm
>> >> sure netfilter is up to the task.
>> >
>> > Yes, it's common to prevent spoofing on the host using netfilter and I
>> > think it wouldn't be a problem.
>> >
>> >> . Creates network interfaces on host that must be managed
>> >>
>> >> What vsock does is effectively create a hidden interface on the host that only the
>> >> kernel knows about and so the sysadmin cannot break it. The only
>> >> difference between this and an explicit interface on the host is that
>> >> the latter requires a competent sysadmin.
>> >>
>> >> If you have other reasons for preferring the use of vsock for NFS, I'd be
>> >> happy to hear them. So far I'm not convinced.
>> >
>> > Before working on AF_VSOCK I originally proposed adding dedicated
>> > network interfaces to guests, similar to what you've suggested, but
>> > there was resistance for additional reasons that weren't covered in the
>> > presentation:
>>
>> I would like to suggest that this is critical information for
>> understanding the design rationale for AF_VSOCK and should be easily
>> found from http://wiki.qemu.org/Features/VirtioVsock
>
> Thanks, I have updated the wiki.

Thanks. How this one:

Can be used with VMs that have no network interfaces

is really crying out for some sort of justification.

And given that ethernet/tcpip must be some of the most attacked (and
hence hardened" code in Linux, some explanation of why it is thought
that they expose more of an attack surface than some brand new code,
might be helpful.

>
>> To achieve zero-config, I think link-local addresses are by far the best
>> answer. To achieve isolation, some targeted filtering seems like the
>> best approach.
>>
>> If you really want traffic between guest and host to go over a vsock,
>> then some sort of packet redirection should be possible.
>
> The issue we seem to hit with designs using AF_INET and network
> interfaces is that they cannot meet the "it must avoid invasive
> configuration changes, especially inside the guest" requirement. It's
> very hard to autoconfigure in a way that doesn't conflict with the
> user's network configuration inside the guest.
>
> One thought about solving the interface naming problem: if the dedicated
> NIC uses a well-known OUI dedicated for this purpose then udev could
> assign a persistent name (e.g. "virtguestif"). This gets us one step
> closer to non-invasive automatic configuration.

I think this is well worth pursuing. As you say, an OUI allows the
guest to reliably detect the right interface to use a link-local address
on.

Thanks,
NeilBrown


>
> Stefan


Attachments:
signature.asc (832.00 B)

2017-07-28 00:35:33

by Matt Benjamin

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

Hi,

On Fri, Jun 30, 2017 at 11:52 AM, Chuck Lever <[email protected]> wrote:
> Hi Stefan-

>
> Is there a standard netid for vsock? If not,
> there needs to be some discussion with the nfsv4
> Working Group to get this worked out.
>
> Because AF_VSOCK is an address family and the RPC
> framing is the same as TCP, the netid should be
> something like "tcpv" and not "vsock". I've
> complained about this before and there has been
> no response of any kind.

the onc record marking is just the length/end-of-transmission bit, and
the bytes. something is being borrowed, but it isn't tcp

>
> I'll note that rdma/rdma6 do not use alternate
> address families: an IP address is specified and
> mapped to a GUID by the underlying transport.
> We purposely did not expose GUIDs to NFS, which
> is based on AF_INET/AF_INET6.

but, as you state, vsock is an address family.

>
> rdma co-exists with IP. vsock doesn't have this
> fallback.

doesn't appear to be needed.

>
> It might be a better approach to use well-known
> (say, link-local or loopback) addresses and let
> the underlying network layer figure it out.
>
> Then hide all this stuff with DNS and let the
> client mount the server by hostname and use
> normal sockaddr's and "proto=tcp". Then you don't
> need _any_ application layer changes.

the changes in nfs-ganesha and ntirpc along these lines were rather trivial.

>
> Without hostnames, how does a client pick a
> Kerberos service principal for the server?

no mechanism has been proposed

>
> Does rpcbind implement "vsock" netids?

are they needed?

>
> Does the NFSv4.0 client advertise "vsock" in
> SETCLIENTID, and provide a "vsock" callback
> service?

It should at least do the latter; does it need to advertise
differently in SETCLIENTID?

>
>
>> It is now possible to mount a file system from the host (hypervisor)
>> over AF_VSOCK like this:
>>
>> (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock
>>
>> The VM's cid address is 3 and the hypervisor is 2.
>
> The mount command is supposed to supply "clientaddr"
> automatically. This mount option is exposed only for
> debugging purposes or very special cases (like
> disabling NFSv4 callback operations).
>
> I mean the whole point of this exercise is to get
> rid of network configuration, but here you're
> adding the need to additionally specify both the
> proto option and the clientaddr option to get this
> to work. Seems like that isn't zero-configuration
> at all.

This whole line of criticism seems to me kind of off-kilter. The
concept of cross-vm pipes appears pretty classical, and one can see
why it might not need to follow Internet conventions.

I'll give you that I never found the zeroconf or security rationales
as compelling--which is to say, I wouldn't restrict vsock to
guest-host communications, except by policy.

>
> Wouldn't it be nicer if it worked like this:
>
> (guest)$ cat /etc/hosts
> 129.0.0.2 localhyper
> (guest)$ mount.nfs localhyper:/export /mnt
>
> And the result was a working NFS mount of the
> local hypervisor, using whatever NFS version the
> two both support, with no changes needed to the
> NFS implementation or the understanding of the
> system administrator?
>
>

not clear; I can understand 2:/export pretty easily, and I don't
think any minds would be blown if "localhyper:" effected 2:.

>
> --
> Chuck Lever
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Matt

2017-08-03 15:24:47

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Fri, Jul 28, 2017 at 09:11:22AM +1000, NeilBrown wrote:
> On Thu, Jul 27 2017, Stefan Hajnoczi wrote:
> > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote:
> >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote:
> >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
> >> >> On Fri, Jul 07 2017, NeilBrown wrote:
> >> >> > On Fri, Jun 30 2017, Chuck Lever wrote:
> >> To achieve zero-config, I think link-local addresses are by far the best
> >> answer. To achieve isolation, some targeted filtering seems like the
> >> best approach.
> >>
> >> If you really want traffic between guest and host to go over a vsock,
> >> then some sort of packet redirection should be possible.
> >
> > The issue we seem to hit with designs using AF_INET and network
> > interfaces is that they cannot meet the "it must avoid invasive
> > configuration changes, especially inside the guest" requirement. It's
> > very hard to autoconfigure in a way that doesn't conflict with the
> > user's network configuration inside the guest.
> >
> > One thought about solving the interface naming problem: if the dedicated
> > NIC uses a well-known OUI dedicated for this purpose then udev could
> > assign a persistent name (e.g. "virtguestif"). This gets us one step
> > closer to non-invasive automatic configuration.
>
> I think this is well worth pursuing. As you say, an OUI allows the
> guest to reliably detect the right interface to use a link-local address
> on.

IPv6 link-local addressing with a well-known MAC address range solves
address collisions. The presence of a network interface still has the
following issues:

1. Network management tools (e.g. NetworkManager) inside the guest
detect the interface and may auto-configure it (e.g. DHCP). Guest
administrators are confronted with a new interface - this opens up
the possibility that they change its configuration.

2. Default drop firewall policies conflict with the interface. The
guest administrator would have to manually configure exceptions for
their firewall.

3. udev is a Linux-only solution and other OSes do not offer a
configurable interface naming scheme. Manual configuration would
be required.

I still see these as blockers preventing guest<->host file system
sharing. Users can already manually add a NIC and configure NFS today,
but the goal here is to offer this as a feature that works in an
automated way (useful both for GUI-style virtual machine management and
for OpenStack clouds where guest configuration must be simple and
scale).

In contrast, AF_VSOCK works as long as the driver is loaded. There is
no configuration.

The changes required to Linux and nfs-utils are related to the sunrpc
transport and configuration. They do not introduce risks to core NFS or
TCP/IP. I would really like to get patches merged because I currently
have to direct interested users to building Linux and nfs-utils from
source to try this out.

Stefan


Attachments:
(No filename) (2.88 kB)
signature.asc (455.00 B)
Download all attachments

2017-08-03 21:45:34

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Thu, Aug 03 2017, Stefan Hajnoczi wrote:

> On Fri, Jul 28, 2017 at 09:11:22AM +1000, NeilBrown wrote:
>> On Thu, Jul 27 2017, Stefan Hajnoczi wrote:
>> > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote:
>> >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote:
>> >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
>> >> >> On Fri, Jul 07 2017, NeilBrown wrote:
>> >> >> > On Fri, Jun 30 2017, Chuck Lever wrote:
>> >> To achieve zero-config, I think link-local addresses are by far the best
>> >> answer. To achieve isolation, some targeted filtering seems like the
>> >> best approach.
>> >>
>> >> If you really want traffic between guest and host to go over a vsock,
>> >> then some sort of packet redirection should be possible.
>> >
>> > The issue we seem to hit with designs using AF_INET and network
>> > interfaces is that they cannot meet the "it must avoid invasive
>> > configuration changes, especially inside the guest" requirement. It's
>> > very hard to autoconfigure in a way that doesn't conflict with the
>> > user's network configuration inside the guest.
>> >
>> > One thought about solving the interface naming problem: if the dedicated
>> > NIC uses a well-known OUI dedicated for this purpose then udev could
>> > assign a persistent name (e.g. "virtguestif"). This gets us one step
>> > closer to non-invasive automatic configuration.
>>
>> I think this is well worth pursuing. As you say, an OUI allows the
>> guest to reliably detect the right interface to use a link-local address
>> on.
>
> IPv6 link-local addressing with a well-known MAC address range solves
> address collisions. The presence of a network interface still has the
> following issues:
>
> 1. Network management tools (e.g. NetworkManager) inside the guest
> detect the interface and may auto-configure it (e.g. DHCP).

Why would this matter? Auto-configuring may add addresses to the
interface, but will not remove the link-local address.

> Guest
> administrators are confronted with a new interface - this opens up
> the possibility that they change its configuration.

True, the admin might delete the link-local address themselves. They
might also delete /sbin/mount.nfs. Maybe they could even "rm -rf /".
A rogue admin can always shoot themselves in the foot. Trying to
prevent this is pointless.

>
> 2. Default drop firewall policies conflict with the interface. The
> guest administrator would have to manually configure exceptions for
> their firewall.

This gets back to my original point. You are willing to stick
required-configuration in the kernel and in nfs-utils, but you are not
willing to require some fixed configuration which actually addresses
your problem. If you want an easy way to punch a firewall hole for a
particular port on a particular interface, then resolve that by talking
with people who understand firewalls. Not by creating a new protocol
which cannot be firewalled.

>
> 3. udev is a Linux-only solution and other OSes do not offer a
> configurable interface naming scheme. Manual configuration would
> be required.

Not my problem.
If some other OS is lacking important functionality, you do fix it by
adding rubbish to Linux. You fix it by fixing those OSes.
For example, is Linux didn't have udev or anything like it, I might be
open to enhancing mount.nfs so that an address syntax like:
fe80::1%*:xx:yy:xx:*
would mean the that glob pattern should be matched again the MAC address
of each interface and the first such interface used. This would be a
focused change addressed at fixing a specific issue. I might not
actually like it, but if it was the best/simplest mechanism to achieve
the goal, I doubt I would fight it. Fortunately I don't need decide
as we already have udev.
If some of OS doesn't have a way to find the interface for a particular
MAC address, maybe you need to create one.

>
> I still see these as blockers preventing guest<->host file system
> sharing. Users can already manually add a NIC and configure NFS today,
> but the goal here is to offer this as a feature that works in an
> automated way (useful both for GUI-style virtual machine management and
> for OpenStack clouds where guest configuration must be simple and
> scale).
>
> In contrast, AF_VSOCK works as long as the driver is loaded. There is
> no configuration.

I think we all agree that providing something that "just works" is a
worth goal. In only question is about how much new code can be
justified, and where it should be put.

Given that almost everything you need already exists, it seems best to
just tie those pieces together.

NeilBrown


>
> The changes required to Linux and nfs-utils are related to the sunrpc
> transport and configuration. They do not introduce risks to core NFS or
> TCP/IP. I would really like to get patches merged because I currently
> have to direct interested users to building Linux and nfs-utils from
> source to try this out.
>
> Stefan


Attachments:
signature.asc (832.00 B)

2017-08-03 23:53:07

by Matt Benjamin

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

Hi Neil,

On Thu, Aug 3, 2017 at 5:45 PM, NeilBrown <[email protected]> wrote:
> On Thu, Aug 03 2017, Stefan Hajnoczi wrote:

Since the vsock address family is in the tin since 4.8, this argument
appears to be about, precisely, tying existing pieces together. The
ceph developers working on openstack manila did find the nfs over
vsock use case compelling. I appreciate this because it has
encouraged more interest in the cephfs community around using the
standardized NFS protocol for deployment.

Matt

>
> I think we all agree that providing something that "just works" is a
> worth goal. In only question is about how much new code can be
> justified, and where it should be put.
>
> Given that almost everything you need already exists, it seems best to
> just tie those pieces together.
>
> NeilBrown
>
>
>>
>> The changes required to Linux and nfs-utils are related to the sunrpc
>> transport and configuration. They do not introduce risks to core NFS or
>> TCP/IP. I would really like to get patches merged because I currently
>> have to direct interested users to building Linux and nfs-utils from
>> source to try this out.
>>
>> Stefan

2017-08-04 03:25:45

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Thu, Aug 03 2017, Matt Benjamin wrote:

> Hi Neil,
>
> On Thu, Aug 3, 2017 at 5:45 PM, NeilBrown <[email protected]> wrote:
>> On Thu, Aug 03 2017, Stefan Hajnoczi wrote:
>
> Since the vsock address family is in the tin since 4.8, this argument
> appears to be about, precisely, tying existing pieces together.

No, it is about adding new, unnecessary pieces into various places.

> The
> ceph developers working on openstack manila did find the nfs over
> vsock use case compelling. I appreciate this because it has
> encouraged more interest in the cephfs community around using the
> standardized NFS protocol for deployment.

I'm sure the ceph developers find zero-conf NFS a compelling use case.
I would be surprised if they care whether it is over vsock or IPv6.

But I'm losing interest here. I'm not a gate-keeper. If you can
convince Steve/Trond/Anna/Brice to accept your code, then good luck to
you. I don't think a convincing case has been made though.

NeilBrown


>
> Matt
>
>>
>> I think we all agree that providing something that "just works" is a
>> worth goal. In only question is about how much new code can be
>> justified, and where it should be put.
>>
>> Given that almost everything you need already exists, it seems best to
>> just tie those pieces together.
>>
>> NeilBrown
>>
>>
>>>
>>> The changes required to Linux and nfs-utils are related to the sunrpc
>>> transport and configuration. They do not introduce risks to core NFS or
>>> TCP/IP. I would really like to get patches merged because I currently
>>> have to direct interested users to building Linux and nfs-utils from
>>> source to try this out.
>>>
>>> Stefan


Attachments:
signature.asc (832.00 B)

2017-08-04 15:56:59

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Fri, Aug 04, 2017 at 07:45:22AM +1000, NeilBrown wrote:
> On Thu, Aug 03 2017, Stefan Hajnoczi wrote:
> > On Fri, Jul 28, 2017 at 09:11:22AM +1000, NeilBrown wrote:
> >> On Thu, Jul 27 2017, Stefan Hajnoczi wrote:
> >> > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote:
> >> >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote:
> >> >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
> >> >> >> On Fri, Jul 07 2017, NeilBrown wrote:
> >> >> >> > On Fri, Jun 30 2017, Chuck Lever wrote:
> > I still see these as blockers preventing guest<->host file system
> > sharing. Users can already manually add a NIC and configure NFS today,
> > but the goal here is to offer this as a feature that works in an
> > automated way (useful both for GUI-style virtual machine management and
> > for OpenStack clouds where guest configuration must be simple and
> > scale).
> >
> > In contrast, AF_VSOCK works as long as the driver is loaded. There is
> > no configuration.
>
> I think we all agree that providing something that "just works" is a
> worth goal. In only question is about how much new code can be
> justified, and where it should be put.
>
> Given that almost everything you need already exists, it seems best to
> just tie those pieces together.

Neil,
You said downthread you're losing interest but there's a point that I
hope you have time to consider because it's key:

Even if the NFS transport can be set up automatically without
conflicting with the user's system configuration, it needs to stay
available going forward. A network interface is prone to user
configuration changes through network management tools, firewalls, and
other utilities. The risk of it breakage is significant.

That's not really a technical problem - it will be caused by some user
action - but using the existing Linux AF_VSOCK feature that whole class
of issues can be eliminated.

Stefan


Attachments:
(No filename) (1.86 kB)
signature.asc (455.00 B)
Download all attachments

2017-08-04 22:36:03

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Fri, Aug 04 2017, Stefan Hajnoczi wrote:

> On Fri, Aug 04, 2017 at 07:45:22AM +1000, NeilBrown wrote:
>> On Thu, Aug 03 2017, Stefan Hajnoczi wrote:
>> > On Fri, Jul 28, 2017 at 09:11:22AM +1000, NeilBrown wrote:
>> >> On Thu, Jul 27 2017, Stefan Hajnoczi wrote:
>> >> > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote:
>> >> >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote:
>> >> >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
>> >> >> >> On Fri, Jul 07 2017, NeilBrown wrote:
>> >> >> >> > On Fri, Jun 30 2017, Chuck Lever wrote:
>> > I still see these as blockers preventing guest<->host file system
>> > sharing. Users can already manually add a NIC and configure NFS today,
>> > but the goal here is to offer this as a feature that works in an
>> > automated way (useful both for GUI-style virtual machine management and
>> > for OpenStack clouds where guest configuration must be simple and
>> > scale).
>> >
>> > In contrast, AF_VSOCK works as long as the driver is loaded. There is
>> > no configuration.
>>
>> I think we all agree that providing something that "just works" is a
>> worth goal. In only question is about how much new code can be
>> justified, and where it should be put.
>>
>> Given that almost everything you need already exists, it seems best to
>> just tie those pieces together.
>
> Neil,
> You said downthread you're losing interest but there's a point that I
> hope you have time to consider because it's key:
>
> Even if the NFS transport can be set up automatically without
> conflicting with the user's system configuration, it needs to stay
> available going forward. A network interface is prone to user
> configuration changes through network management tools, firewalls, and
> other utilities. The risk of it breakage is significant.

I've already addressed this issue. I wrote:

True, the admin might delete the link-local address themselves. They
might also delete /sbin/mount.nfs. Maybe they could even "rm -rf /".
A rogue admin can always shoot themselves in the foot. Trying to
prevent this is pointless.


>
> That's not really a technical problem - it will be caused by some user
> action - but using the existing Linux AF_VSOCK feature that whole class
> of issues can be eliminated.

I suggest you look up the proverb about making things fool-proof and
learn to apply it.

Meanwhile I have another issue. Is it possible for tcpdump, or some
other tool, to capture all the packets flowing over a vsock? If it
isn't possible to analyse the traffic with wireshark, it will be much
harder to diagnose issues that customers have.

NeilBrown


>
> Stefan


Attachments:
signature.asc (832.00 B)

2017-08-08 14:07:42

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Sat, Aug 05, 2017 at 08:35:52AM +1000, NeilBrown wrote:
> On Fri, Aug 04 2017, Stefan Hajnoczi wrote:
>
> > On Fri, Aug 04, 2017 at 07:45:22AM +1000, NeilBrown wrote:
> >> On Thu, Aug 03 2017, Stefan Hajnoczi wrote:
> >> > On Fri, Jul 28, 2017 at 09:11:22AM +1000, NeilBrown wrote:
> >> >> On Thu, Jul 27 2017, Stefan Hajnoczi wrote:
> >> >> > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote:
> >> >> >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote:
> >> >> >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
> >> >> >> >> On Fri, Jul 07 2017, NeilBrown wrote:
> >> >> >> >> > On Fri, Jun 30 2017, Chuck Lever wrote:
> >> > I still see these as blockers preventing guest<->host file system
> >> > sharing. Users can already manually add a NIC and configure NFS today,
> >> > but the goal here is to offer this as a feature that works in an
> >> > automated way (useful both for GUI-style virtual machine management and
> >> > for OpenStack clouds where guest configuration must be simple and
> >> > scale).
> >> >
> >> > In contrast, AF_VSOCK works as long as the driver is loaded. There is
> >> > no configuration.
> >>
> >> I think we all agree that providing something that "just works" is a
> >> worth goal. In only question is about how much new code can be
> >> justified, and where it should be put.
> >>
> >> Given that almost everything you need already exists, it seems best to
> >> just tie those pieces together.
> >
> > Neil,
> > You said downthread you're losing interest but there's a point that I
> > hope you have time to consider because it's key:
> >
> > Even if the NFS transport can be set up automatically without
> > conflicting with the user's system configuration, it needs to stay
> > available going forward. A network interface is prone to user
> > configuration changes through network management tools, firewalls, and
> > other utilities. The risk of it breakage is significant.
>
> I've already addressed this issue. I wrote:
>
> True, the admin might delete the link-local address themselves. They
> might also delete /sbin/mount.nfs. Maybe they could even "rm -rf /".
> A rogue admin can always shoot themselves in the foot. Trying to
> prevent this is pointless.

These are not things that I'm worried about. I agree that it's
pointless trying to prevent them.

The issue is genuine configuration changes either by the user or by
software they are running that simply interfere with the host<->guest
interface. For example, a default DROP iptables policy.

> Meanwhile I have another issue. Is it possible for tcpdump, or some
> other tool, to capture all the packets flowing over a vsock? If it
> isn't possible to analyse the traffic with wireshark, it will be much
> harder to diagnose issues that customers have.

Yes, packet capture is possible. The vsockmon driver was added in Linux
4.11. Wireshark has a dissector for AF_VSOCK.

Stefan


Attachments:
(No filename) (2.87 kB)
signature.asc (455.00 B)
Download all attachments