by Chuck Lever

Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

Hi Stefan-

> On Jun 30, 2017, at 9:21 AM, Stefan Hajnoczi <[email protected]> wrote:
>
> Neither libtirpc nor getprotobyname(3) know about AF_VSOCK.

Why?

Basically you are building a lot of specialized
awareness in applications and leaving the
network layer alone. That seems backwards to me.

> For similar
> reasons as for "rdma"/"rdma6", translate "vsock" manually in getport.c.

rdma/rdma6 are specified by standards, and appear
in the IANA Network Identifiers database:

https://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml

Is there a standard netid for vsock? If not,
there needs to be some discussion with the nfsv4
Working Group to get this worked out.

Because AF_VSOCK is an address family and the RPC
framing is the same as TCP, the netid should be
something like "tcpv" and not "vsock". I've
complained about this before and there has been
no response of any kind.

I'll note that rdma/rdma6 do not use alternate
address families: an IP address is specified and
mapped to a GUID by the underlying transport.
We purposely did not expose GUIDs to NFS, which
is based on AF_INET/AF_INET6.

rdma co-exists with IP. vsock doesn't have this
fallback.

It might be a better approach to use well-known
(say, link-local or loopback) addresses and let
the underlying network layer figure it out.

Then hide all this stuff with DNS and let the
client mount the server by hostname and use
normal sockaddr's and "proto=tcp". Then you don't
need _any_ application layer changes.

Without hostnames, how does a client pick a
Kerberos service principal for the server?

Does rpcbind implement "vsock" netids?

Does the NFSv4.0 client advertise "vsock" in
SETCLIENTID, and provide a "vsock" callback
service?

> It is now possible to mount a file system from the host (hypervisor)
> over AF_VSOCK like this:
>
> (guest)$ mount.nfs 2:/export /mnt -v -o clientaddr=3,proto=vsock
>
> The VM's cid address is 3 and the hypervisor is 2.

The mount command is supposed to supply "clientaddr"
automatically. This mount option is exposed only for
debugging purposes or very special cases (like
disabling NFSv4 callback operations).

I mean the whole point of this exercise is to get
rid of network configuration, but here you're
adding the need to additionally specify both the
proto option and the clientaddr option to get this
to work. Seems like that isn't zero-configuration
at all.

Wouldn't it be nicer if it worked like this:

(guest)$ cat /etc/hosts
129.0.0.2 localhyper
(guest)$ mount.nfs localhyper:/export /mnt

And the result was a working NFS mount of the
local hypervisor, using whatever NFS version the
two both support, with no changes needed to the
NFS implementation or the understanding of the
system administrator?

> Signed-off-by: Stefan Hajnoczi <[email protected]>
> ---
> support/nfs/getport.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/support/nfs/getport.c b/support/nfs/getport.c
> index 081594c..0b857af 100644
> --- a/support/nfs/getport.c
> +++ b/support/nfs/getport.c
> @@ -217,8 +217,7 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
>      struct protoent *proto;
>
>      /*
> -     * IANA does not define a protocol number for rdma netids,
> -     * since "rdma" is not an IP protocol.
> +     * IANA does not define protocol numbers for non-IP netids.
>       */
>      if (strcmp(netid, "rdma") == 0) {
>          *family = AF_INET;
> @@ -230,6 +229,11 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
>          *protocol = NFSPROTO_RDMA;
>          return 1;
>      }
> +    if (strcmp(netid, "vsock") == 0) {
> +        *family = AF_VSOCK;
> +        *protocol = 0;
> +        return 1;
> +    }
>
>      nconf = getnetconfigent(netid);
>      if (nconf == NULL)
> @@ -258,14 +262,18 @@ nfs_get_proto(const char *netid, sa_family_t *family, unsigned long *protocol)
>      struct protoent *proto;
>
>      /*
> -     * IANA does not define a protocol number for rdma netids,
> -     * since "rdma" is not an IP protocol.
> +     * IANA does not define protocol numbers for non-IP netids.
>       */
>      if (strcmp(netid, "rdma") == 0) {
>          *family = AF_INET;
>          *protocol = NFSPROTO_RDMA;
>          return 1;
>      }
> +    if (strcmp(netid, "vsock") == 0) {
> +        *family = AF_VSOCK;
> +        *protocol = 0;
> +        return 1;
> +    }
>
>      proto = getprotobyname(netid);
>      if (proto == NULL)
> --
> 2.9.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
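
For readers following the quoted patch, the special-casing it adds amounts to a small lookup from a netid string to an address family and protocol number. The sketch below is a hypothetical, table-driven restatement, not the nfs-utils code: `demo_netid_lookup` and `DEMO_PROTO_RDMA` are made-up names (the real `NFSPROTO_RDMA` value is not reproduced here), the "rdma6" row is an assumption based on the hunk context, and "vsock" is the netid proposed by the patch rather than a registered one.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

#ifndef AF_VSOCK
#define AF_VSOCK 40	/* Linux value; older libc headers may not define it */
#endif

/* Hypothetical stand-in for nfs-utils' NFSPROTO_RDMA; the value is illustrative. */
#define DEMO_PROTO_RDMA 1

struct demo_netid {
	const char *netid;
	sa_family_t family;
	unsigned long protocol;
};

/* Netids that getprotobyname(3) cannot translate, per the patch description. */
static const struct demo_netid demo_netids[] = {
	{ "rdma",  AF_INET,  DEMO_PROTO_RDMA },
	{ "rdma6", AF_INET6, DEMO_PROTO_RDMA },
	{ "vsock", AF_VSOCK, 0 },
};

/* Returns 1 and fills *family and *protocol on a match, 0 otherwise. */
static int demo_netid_lookup(const char *netid, sa_family_t *family,
			     unsigned long *protocol)
{
	size_t i;

	assert(netid != NULL && family != NULL && protocol != NULL);
	for (i = 0; i < sizeof(demo_netids) / sizeof(demo_netids[0]); i++) {
		if (strcmp(netid, demo_netids[i].netid) == 0) {
			*family = demo_netids[i].family;
			*protocol = demo_netids[i].protocol;
			return 1;
		}
	}
	return 0;
}
```

A netid that is not in the table, such as "tcp", returns 0 here; a real caller would then fall through to the netconfig or getprotobyname(3) path.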

--
Chuck Lever

2017-07-03 08:55:14

by Stefan Hajnoczi

Subject: Re: [PATCH nfs-utils v2 01/12] mount: don't use IPPROTO_UDP for address resolution

On Fri, Jun 30, 2017 at 10:34:54AM -0400, Steve Dickson wrote:
>
>
> On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
> > Although getaddrinfo(3) with IPPROTO_UDP works fine for AF_INET and
> > AF_INET6, the AF_VSOCK address family does not support IPPROTO_UDP and
> > produces an error.
> >
> > Drop IPPROTO_UDP and use the 0 default (TCP) which works for all address
> > families. Modern NFS uses TCP anyway so it's strange to specify UDP.
> >
> > Signed-off-by: Stefan Hajnoczi <[email protected]>
> > Reviewed-by: Jeff Layton <[email protected]>
> > ---
> > utils/mount/stropts.c | 4 +---
> > 1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
> > index c2a739b..99656dd 100644
> > --- a/utils/mount/stropts.c
> > +++ b/utils/mount/stropts.c
> > @@ -909,9 +909,7 @@ static int nfs_try_mount(struct nfsmount_info *mi)
> > >      int result = 0;
> > >
> > >      if (mi->address == NULL) {
> > > -        struct addrinfo hint = {
> > > -            .ai_protocol = (int)IPPROTO_UDP,
> > > -        };
> > > +        struct addrinfo hint = {};
> Just curious as to why not simply pass a NULL hints parameter
> versus an empty hints structure?

It's not clear from the surrounding unified diff context but the code
does set .ai_family later on:

    hint.ai_family = (int)mi->family;

Would you prefer it if I move that up into the variable definition?

    struct addrinfo hint = {
        .ai_family = (int)mi->family,
    };
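
To make the effect of the hints concrete, here is a minimal standalone sketch, not the mount.nfs code. It assumes a hypothetical helper `demo_resolve` and adds `AI_NUMERICHOST` purely so the example never touches DNS; the point is that only `ai_family` is set and `ai_protocol` stays 0.

```c
#include <assert.h>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Resolve a numeric host string for the given address family, leaving
 * ai_protocol at 0 so the lookup is not restricted to one protocol.
 * Returns 0 on success; the caller frees *res with freeaddrinfo().
 */
static int demo_resolve(const char *host, int family, struct addrinfo **res)
{
	struct addrinfo hint = {
		.ai_family = family,
		.ai_flags = AI_NUMERICHOST,	/* demo only: skip DNS lookups */
	};

	assert(host != NULL && res != NULL);
	return getaddrinfo(host, NULL, &hint, res);
}
```

Passing NULL hints instead would lose the family restriction, which is why an (initially empty) hints structure is still needed.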

2017-07-03 09:00:56

On Fri, Jun 30 2017, Chuck Lever wrote:
>
> Wouldn't it be nicer if it worked like this:
>
> (guest)$ cat /etc/hosts
> 129.0.0.2 localhyper
> (guest)$ mount.nfs localhyper:/export /mnt
>
> And the result was a working NFS mount of the
> local hypervisor, using whatever NFS version the
> two both support, with no changes needed to the
> NFS implementation or the understanding of the
> system administrator?

Yes. Yes. Definitely Yes.
Though I suspect you mean "127.0.0.2", not "129..."??

There must be some way to redirect TCP connections to some address
transparently through to the vsock protocol.
The "sshuttle" program does this to transparently forward TCP connections
over an ssh connection. Using a similar technique to forward
connections over vsock shouldn't be hard.

Or is performance really critical, and you get too much copying when you
try forwarding connections? I suspect that is fixable, but it would be
a little less straightforward.

I would really *not* like to see vsock support being bolted into one
network tool after another.

NeilBrown

2017-07-07 04:13:54

by NeilBrown

Subject: Re: [PATCH nfs-utils v2 05/12] getport: recognize "vsock" netid

On Fri, Jul 07 2017, NeilBrown wrote:

> On Fri, Jun 30 2017, Chuck Lever wrote:
>>
>> Wouldn't it be nicer if it worked like this:
>>
>> (guest)$ cat /etc/hosts
>> 129.0.0.2 localhyper
>> (guest)$ mount.nfs localhyper:/export /mnt
>>
>> And the result was a working NFS mount of the
>> local hypervisor, using whatever NFS version the
>> two both support, with no changes needed to the
>> NFS implementation or the understanding of the
>> system administrator?
>
> Yes. Yes. Definitely Yes.
> Though I suspect you mean "127.0.0.2", not "129..."??
>
> There must be some way to redirect TCP connections to some address
> transparently through to the vsock protocol.
> The "sshuttle" program does this to transparently forward TCP connections
> over an ssh connection. Using a similar technique to forward
> connections over vsock shouldn't be hard.
>
> Or is performance really critical, and you get too much copying when you
> try forwarding connections? I suspect that is fixable, but it would be
> a little less straightforward.
>
> I would really *not* like to see vsock support being bolted into one
> network tool after another.

I've been digging into this a bit more. I came across
https://vmsplice.net/~stefan/stefanha-kvm-forum-2015.pdf

which (on page 7) lists some reasons not to use TCP/IP between guest
and host.

. Adding & configuring guest interfaces is invasive

That is possibly true. But adding support for a new address family to
NFS, NFSD, and nfs-utils is also very invasive. You would need to
install this software on the guest. I suggest you install different
software on the guest which solves the problem better.

. Prone to break due to config changes inside guest

This is, I suspect, a key issue. With vsock, the address of the
guest-side interface is defined by options passed to qemu. With
normal IP addressing, the guest has to configure the address.

However I think that IPv6 autoconfig makes this work well without vsock.
If I create a bridge interface on the host, run
    ip -6 addr add fe80::1 dev br0
then run a guest with
    -net nic,macaddr=Ch:oo:se:an:ad:dr \
    -net bridge,br=br0 \

then the client can
    mount [fe80::1%interfacename]:/path /mountpoint

and the host will see a connection from
    fe80::ch:oo:se:an:ad:dr

So from the guest side, I have achieved zero-config NFS mounts from the
host.
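
The "fe80::ch:oo:se:an:ad:dr" above is shorthand. With classic stateless autoconfiguration the interface identifier is the modified EUI-64 form of the MAC (RFC 4291, Appendix A), which the sketch below computes. `demo_mac_to_linklocal` is a hypothetical helper, and guests configured for stable-privacy or randomized identifiers will pick a different address, so treat this as an illustration of the predictable case only.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/*
 * Build the modified EUI-64 link-local address for a 48-bit MAC:
 * fe80::/64 prefix, then the MAC with ff:fe inserted in the middle
 * and the universal/local bit (0x02 of the first octet) inverted.
 */
static void demo_mac_to_linklocal(const uint8_t mac[6], struct in6_addr *out)
{
	assert(mac != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	out->s6_addr[0] = 0xfe;
	out->s6_addr[1] = 0x80;
	out->s6_addr[8] = mac[0] ^ 0x02;
	out->s6_addr[9] = mac[1];
	out->s6_addr[10] = mac[2];
	out->s6_addr[11] = 0xff;
	out->s6_addr[12] = 0xfe;
	out->s6_addr[13] = mac[3];
	out->s6_addr[14] = mac[4];
	out->s6_addr[15] = mac[5];
}
```

For example, the MAC 52:54:00:12:34:56 maps to fe80::5054:ff:fe12:3456, so a host that chooses the guest's MAC also knows which link-local source address to expect.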

I don't think the server can filter connections based on which interface
a link-local address came from. If that was a problem that someone
wanted to be fixed, I'm sure we can fix it.

If you need to be sure that clients don't fake their IPv6 address, I'm
sure netfilter is up to the task.

. Creates network interfaces on host that must be managed

What vsock does is effectively create a hidden interface on the host that only the
kernel knows about and so the sysadmin cannot break it. The only
difference between this and an explicit interface on the host is that
the latter requires a competent sysadmin.

If you have other reasons for preferring the use of vsock for NFS, I'd be
happy to hear them. So far I'm not convinced.

Thanks,
NeilBrown

2017-07-07 04:14:31

by Chuck Lever

On 07/10/2017 02:14 PM, Stefan Hajnoczi wrote:
> On Mon, Jul 03, 2017 at 12:51:07PM -0400, Steve Dickson wrote:
>> On 07/03/2017 05:00 AM, Stefan Hajnoczi wrote:
>>> On Fri, Jun 30, 2017 at 10:40:49AM -0400, Steve Dickson wrote:
>>>> On 06/30/2017 09:21 AM, Stefan Hajnoczi wrote:
>> Are there other implementations out there that would cause breakage?
>> I'm pretty sure nfs-utils is only used in Linux environments, right?
>
> Does nfs-utils have a set of minimum distro versions that are supported?
Not that I'm aware of... Now there are certain package versions nfs-utils
is dependent on but not particular distro.

steved.

>
> For example, "the latest nfs-utils release is supported on RHEL 6, SLES
> 11, and Debian 8 jessie".
>
> If yes, then I could go check that these distros ship AF_VSOCK and
> <linux/vm_sockets.h>.
>
> Stefan
>

2017-07-19 15:11:48

On Thu, 2017-07-27 at 11:58 +0100, Stefan Hajnoczi wrote:
> On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote:
> > On Tue, Jul 25 2017, Stefan Hajnoczi wrote:
> > > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
> > > > On Fri, Jul 07 2017, NeilBrown wrote:
> > > > > On Fri, Jun 30 2017, Chuck Lever wrote:
> > > >
> > > > I don't think the server can filter connections based on which
> > > > interface
> > > > a link-local address came from. If that was a problem that
> > > > someone
> > > > wanted to be fixed, I'm sure we can fix it.
> > > >
> > > > If you need to be sure that clients don't fake their IPv6
> > > > address, I'm
> > > > sure netfilter is up to the task.
> > >
> > > Yes, it's common to prevent spoofing on the host using netfilter
> > > and I
> > > think it wouldn't be a problem.
> > >
> > > > . Creates network interfaces on host that must be managed
> > > >
> > > > What vsock does is effectively create a hidden interface on the
> > > > host that only the
> > > > kernel knows about and so the sysadmin cannot break it. The
> > > > only
> > > > difference between this and an explicit interface on the host
> > > > is that
> > > > the latter requires a competent sysadmin.
> > > >
> > > > If you have other reasons for preferring the use of vsock for
> > > > NFS, I'd be
> > > > happy to hear them. So far I'm not convinced.
> > >
> > > Before working on AF_VSOCK I originally proposed adding dedicated
> > > network interfaces to guests, similar to what you've suggested,
> > > but
> > > there was resistance for additional reasons that weren't covered
> > > in the
> > > presentation:
> >
> > I would like to suggest that this is critical information for
> > understanding the design rationale for AF_VSOCK and should be
> > easily
> > found from http://wiki.qemu.org/Features/VirtioVsock
>
> Thanks, I have updated the wiki.
>
> > To achieve zero-config, I think link-local addresses are by far the
> > best
> > answer. To achieve isolation, some targeted filtering seems like
> > the
> > best approach.
> >
> > If you really want traffic between guest and host to go over a
> > vsock,
> > then some sort of packet redirection should be possible.
>
> The issue we seem to hit with designs using AF_INET and network
> interfaces is that they cannot meet the "it must avoid invasive
> configuration changes, especially inside the guest"
> requirement. It's
> very hard to autoconfigure in a way that doesn't conflict with the
> user's network configuration inside the guest.
>
> One thought about solving the interface naming problem: if the
> dedicated
> NIC uses a well-known OUI dedicated for this purpose then udev could
> assign a persistent name (e.g. "virtguestif"). This gets us one step
> closer to non-invasive automatic configuration.

Link-local IPv6 addresses are always present once you bring up an IPv6
interface. You can use them to communicate with other hosts on the same
network segment. It's just not routable. That seems entirely fine here
where you're not dealing with routing anyway.

What I would (naively) envision is a new network interface driver that
presents itself as "hvlo0" or something, much like we do with the
loopback interface. You just need the guest to ensure that it plugs in
that driver and brings up the interface for ipv6.

Then the only issue is discovery of addresses. The HV should be able to
figure that out and present it. Maybe roll up a new nsswitch module
that queries the HV directly somehow? The nice thing there is that you
get name resolution "for free", since it's just plain old IPv6 traffic
at that point.

AF_VSOCK just seems like a very invasive solution to this problem
that's going to add a lot of maintenance burden to a lot of different
code.
--
Jeff Layton <[email protected]>

2017-07-27 23:11:34

On Fri, Jul 28, 2017 at 09:11:22AM +1000, NeilBrown wrote:
> On Thu, Jul 27 2017, Stefan Hajnoczi wrote:
> > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote:
> >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote:
> >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
> >> >> On Fri, Jul 07 2017, NeilBrown wrote:
> >> >> > On Fri, Jun 30 2017, Chuck Lever wrote:
> >> To achieve zero-config, I think link-local addresses are by far the best
> >> answer. To achieve isolation, some targeted filtering seems like the
> >> best approach.
> >>
> >> If you really want traffic between guest and host to go over a vsock,
> >> then some sort of packet redirection should be possible.
> >
> > The issue we seem to hit with designs using AF_INET and network
> > interfaces is that they cannot meet the "it must avoid invasive
> > configuration changes, especially inside the guest" requirement. It's
> > very hard to autoconfigure in a way that doesn't conflict with the
> > user's network configuration inside the guest.
> >
> > One thought about solving the interface naming problem: if the dedicated
> > NIC uses a well-known OUI dedicated for this purpose then udev could
> > assign a persistent name (e.g. "virtguestif"). This gets us one step
> > closer to non-invasive automatic configuration.
>
> I think this is well worth pursuing. As you say, an OUI allows the
> guest to reliably detect the right interface to use a link-local address
> on.

IPv6 link-local addressing with a well-known MAC address range solves
address collisions. The presence of a network interface still has the
following issues:

1. Network management tools (e.g. NetworkManager) inside the guest
detect the interface and may auto-configure it (e.g. DHCP). Guest
administrators are confronted with a new interface - this opens up
the possibility that they change its configuration.

2. Default drop firewall policies conflict with the interface. The
guest administrator would have to manually configure exceptions for
their firewall.

3. udev is a Linux-only solution and other OSes do not offer a
configurable interface naming scheme. Manual configuration would
be required.

I still see these as blockers preventing guest<->host file system
sharing. Users can already manually add a NIC and configure NFS today,
but the goal here is to offer this as a feature that works in an
automated way (useful both for GUI-style virtual machine management and
for OpenStack clouds where guest configuration must be simple and
scale).

In contrast, AF_VSOCK works as long as the driver is loaded. There is
no configuration.
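
To show what "no configuration" means at the socket level, here is a minimal sketch of the address a guest would use to reach the host. It assumes `<linux/vm_sockets.h>` is installed; `demo_host_vsock_addr` is a hypothetical helper, and the choice of port is up to the caller (2049 is the conventional NFS port, used here only as an example).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Fill a vsock address for a service on the host (hypervisor) side. */
static void demo_host_vsock_addr(unsigned int port, struct sockaddr_vm *out)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	out->svm_family = AF_VSOCK;
	out->svm_cid = VMADDR_CID_HOST;	/* well-known CID 2: the host */
	out->svm_port = port;
}
```

The result can be passed to connect() on a socket created with socket(AF_VSOCK, SOCK_STREAM, 0). Nothing in the address depends on guest network state, but the connect will fail if the vsock transport driver is not loaded.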

The changes required to Linux and nfs-utils are related to the sunrpc
transport and configuration. They do not introduce risks to core NFS or
TCP/IP. I would really like to get patches merged because I currently
have to direct interested users to building Linux and nfs-utils from
source to try this out.

Stefan

2017-08-03 21:45:34

On Sat, Aug 05, 2017 at 08:35:52AM +1000, NeilBrown wrote:
> On Fri, Aug 04 2017, Stefan Hajnoczi wrote:
>
> > On Fri, Aug 04, 2017 at 07:45:22AM +1000, NeilBrown wrote:
> >> On Thu, Aug 03 2017, Stefan Hajnoczi wrote:
> >> > On Fri, Jul 28, 2017 at 09:11:22AM +1000, NeilBrown wrote:
> >> >> On Thu, Jul 27 2017, Stefan Hajnoczi wrote:
> >> >> > On Thu, Jul 27, 2017 at 03:13:53PM +1000, NeilBrown wrote:
> >> >> >> On Tue, Jul 25 2017, Stefan Hajnoczi wrote:
> >> >> >> > On Fri, Jul 07, 2017 at 02:13:38PM +1000, NeilBrown wrote:
> >> >> >> >> On Fri, Jul 07 2017, NeilBrown wrote:
> >> >> >> >> > On Fri, Jun 30 2017, Chuck Lever wrote:
> >> > I still see these as blockers preventing guest<->host file system
> >> > sharing. Users can already manually add a NIC and configure NFS today,
> >> > but the goal here is to offer this as a feature that works in an
> >> > automated way (useful both for GUI-style virtual machine management and
> >> > for OpenStack clouds where guest configuration must be simple and
> >> > scale).
> >> >
> >> > In contrast, AF_VSOCK works as long as the driver is loaded. There is
> >> > no configuration.
> >>
> >> I think we all agree that providing something that "just works" is a
> >> worthy goal. The only question is about how much new code can be
> >> justified, and where it should be put.
> >>
> >> Given that almost everything you need already exists, it seems best to
> >> just tie those pieces together.
> >
> > Neil,
> > You said downthread you're losing interest but there's a point that I
> > hope you have time to consider because it's key:
> >
> > Even if the NFS transport can be set up automatically without
> > conflicting with the user's system configuration, it needs to stay
> > available going forward. A network interface is prone to user
> > configuration changes through network management tools, firewalls, and
> > other utilities. The risk of breakage is significant.
>
> I've already addressed this issue. I wrote:
>
> True, the admin might delete the link-local address themselves. They
> might also delete /sbin/mount.nfs. Maybe they could even "rm -rf /".
> A rogue admin can always shoot themselves in the foot. Trying to
> prevent this is pointless.

These are not things that I'm worried about. I agree that it's
pointless trying to prevent them.

The issue is genuine configuration changes either by the user or by
software they are running that simply interfere with the host<->guest
interface. For example, a default DROP iptables policy.

> Meanwhile I have another issue. Is it possible for tcpdump, or some
> other tool, to capture all the packets flowing over a vsock? If it
> isn't possible to analyse the traffic with wireshark, it will be much
> harder to diagnose issues that customers have.

Yes, packet capture is possible. The vsockmon driver was added in Linux
4.11. Wireshark has a dissector for AF_VSOCK.

Stefan
