2017-08-17 08:00:53

by Dexuan Cui

[permalink] [raw]
Subject: [PATCH] vsock: only load vmci transport on VMware hypervisor by default


Without the patch, vmw_vsock_vmci_transport.ko can automatically load
when an application creates an AF_VSOCK socket.

This is the expected good behavior on VMware hypervisor, but as we
are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we
should make sure vmw_vsock_vmci_transport.ko can't load on Hyper-V,
otherwise there is a -EBUSY conflict when both vmw_vsock_vmci_transport.ko
and hv_sock.ko try to call vsock_core_init() on Hyper-V.

On the other hand, hv_sock.ko can only load on Hyper-V, because it
depends on hv_vmbus.ko, which detects Hyper-V in hv_acpi_init().

KVM's vsock_virtio_transport doesn't have the issue because it doesn't
define MODULE_ALIAS_NETPROTO(PF_VSOCK).

The patch also adds a module parameter "skip_hypervisor_check" for
vmw_vsock_vmci_transport.ko.

Signed-off-by: Dexuan Cui <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Andy King <[email protected]>
Cc: Adit Ranadive <[email protected]>
Cc: George Zhang <[email protected]>
Cc: Jorgen Hansen <[email protected]>
Cc: K. Y. Srinivasan <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
---
net/vmw_vsock/Kconfig | 2 +-
net/vmw_vsock/vmci_transport.c | 11 +++++++++++
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index a24369d..3f52929 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -17,7 +17,7 @@ config VSOCKETS

config VMWARE_VMCI_VSOCKETS
tristate "VMware VMCI transport for Virtual Sockets"
- depends on VSOCKETS && VMWARE_VMCI
+ depends on VSOCKETS && VMWARE_VMCI && HYPERVISOR_GUEST
help
This module implements a VMCI transport for Virtual Sockets.

diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 10ae782..c068873 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -16,6 +16,7 @@
#include <linux/types.h>
#include <linux/bitops.h>
#include <linux/cred.h>
+#include <linux/hypervisor.h>
#include <linux/init.h>
#include <linux/io.h>
#include <linux/kernel.h>
@@ -73,6 +74,10 @@ struct vmci_transport_recv_pkt_info {
struct vmci_transport_packet pkt;
};

+static bool skip_hypervisor_check;
+module_param(skip_hypervisor_check, bool, 0444);
+MODULE_PARM_DESC(hot_add, "If set, attempt to load on non-VMware platforms");
+
static LIST_HEAD(vmci_transport_cleanup_list);
static DEFINE_SPINLOCK(vmci_transport_cleanup_lock);
static DECLARE_WORK(vmci_transport_cleanup_work, vmci_transport_cleanup);
@@ -2085,6 +2090,12 @@ static int __init vmci_transport_init(void)
{
int err;

+ /* Check if we are running on VMware's hypervisor and bail out
+ * if we are not.
+ */
+ if (!skip_hypervisor_check && x86_hyper != &x86_hyper_vmware)
+ return -ENODEV;
+
/* Create the datagram handle that we will use to send and receive all
* VSocket control messages for this context.
*/
--
2.7.4



2017-08-17 13:56:23

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

On Thu, Aug 17, 2017 at 08:00:29AM +0000, Dexuan Cui wrote:
>
> Without the patch, vmw_vsock_vmci_transport.ko can automatically load
> when an application creates an AF_VSOCK socket.
>
> This is the expected good behavior on VMware hypervisor, but as we
> are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we
> should make sure vmw_vsock_vmci_transport.ko can't load on Hyper-V,
> otherwise there is a -EBUSY conflict when both vmw_vsock_vmci_transport.ko
> and hv_sock.ko try to call vsock_core_init() on Hyper-V.
>
> On the other hand, hv_sock.ko can only load on Hyper-V, because it
> depends on hv_vmbus.ko, which detects Hyper-V in hv_acpi_init().
>
> KVM's vsock_virtio_transport doesn't have the issue because it doesn't
> define MODULE_ALIAS_NETPROTO(PF_VSOCK).

Thanks for sending this patch, vmci's MODULE_ALIAS_NETPROTO(PF_VSOCK) is
a problem for vhost_vsock.ko (the virtio host driver) too. A host
userspace program can create a AF_VSOCK socket before vhost_vsock is
loaded. The vmci transport will be unconditionally loaded and that's
not the right behavior.

Putting aside nested virtualization, I want to load the transport (vmci,
Hyper-V, vsock) for which there is paravirtualized hardware present
inside the guest.

It's a little tricker on the host side (doesn't matter for Hyper-V and
probably also doesn't for VMware) because the host-side driver is a
software device with no hardware backing it. In KVM we assume the
vhost_vsock.ko kernel module will be loaded sufficiently early.

Things get trickier with nested virtualization because the VM might want
to talk to its host but also to its nested VMs. The simple way of
fixing this would be to allow two transports loaded simultaneously and
route traffic destined to CID 2 to the host transport and all other
traffic to the guest transport.

Perhaps we should discuss these cases a bit more to figure out how to
avoid conflicts over MODULE_ALIAS_NETPROTO(PF_VSOCK).

>
> The patch also adds a module parameter "skip_hypervisor_check" for
> vmw_vsock_vmci_transport.ko.
>
> Signed-off-by: Dexuan Cui <[email protected]>
> Cc: Alok Kataria <[email protected]>
> Cc: Andy King <[email protected]>
> Cc: Adit Ranadive <[email protected]>
> Cc: George Zhang <[email protected]>
> Cc: Jorgen Hansen <[email protected]>
> Cc: K. Y. Srinivasan <[email protected]>
> Cc: Haiyang Zhang <[email protected]>
> Cc: Stephen Hemminger <[email protected]>
> ---
> net/vmw_vsock/Kconfig | 2 +-
> net/vmw_vsock/vmci_transport.c | 11 +++++++++++
> 2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
> index a24369d..3f52929 100644
> --- a/net/vmw_vsock/Kconfig
> +++ b/net/vmw_vsock/Kconfig
> @@ -17,7 +17,7 @@ config VSOCKETS
>
> config VMWARE_VMCI_VSOCKETS
> tristate "VMware VMCI transport for Virtual Sockets"
> - depends on VSOCKETS && VMWARE_VMCI
> + depends on VSOCKETS && VMWARE_VMCI && HYPERVISOR_GUEST
> help
> This module implements a VMCI transport for Virtual Sockets.
>
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index 10ae782..c068873 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -16,6 +16,7 @@
> #include <linux/types.h>
> #include <linux/bitops.h>
> #include <linux/cred.h>
> +#include <linux/hypervisor.h>
> #include <linux/init.h>
> #include <linux/io.h>
> #include <linux/kernel.h>
> @@ -73,6 +74,10 @@ struct vmci_transport_recv_pkt_info {
> struct vmci_transport_packet pkt;
> };
>
> +static bool skip_hypervisor_check;
> +module_param(skip_hypervisor_check, bool, 0444);
> +MODULE_PARM_DESC(hot_add, "If set, attempt to load on non-VMware platforms");
> +
> static LIST_HEAD(vmci_transport_cleanup_list);
> static DEFINE_SPINLOCK(vmci_transport_cleanup_lock);
> static DECLARE_WORK(vmci_transport_cleanup_work, vmci_transport_cleanup);
> @@ -2085,6 +2090,12 @@ static int __init vmci_transport_init(void)
> {
> int err;
>
> + /* Check if we are running on VMware's hypervisor and bail out
> + * if we are not.
> + */
> + if (!skip_hypervisor_check && x86_hyper != &x86_hyper_vmware)
> + return -ENODEV;
> +
> /* Create the datagram handle that we will use to send and receive all
> * VSocket control messages for this context.
> */
> --
> 2.7.4
>


Attachments:
(No filename) (4.27 kB)
signature.asc (455.00 B)
Download all attachments

2017-08-17 15:17:06

by Jorgen Hansen

[permalink] [raw]
Subject: Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default


> On Aug 17, 2017, at 3:55 PM, Stefan Hajnoczi <[email protected]> wrote:
>
> On Thu, Aug 17, 2017 at 08:00:29AM +0000, Dexuan Cui wrote:
>>
>> Without the patch, vmw_vsock_vmci_transport.ko can automatically load
>> when an application creates an AF_VSOCK socket.
>>
>> This is the expected good behavior on VMware hypervisor, but as we
>> are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we
>> should make sure vmw_vsock_vmci_transport.ko can't load on Hyper-V,
>> otherwise there is a -EBUSY conflict when both vmw_vsock_vmci_transport.ko
>> and hv_sock.ko try to call vsock_core_init() on Hyper-V.
>>
>> On the other hand, hv_sock.ko can only load on Hyper-V, because it
>> depends on hv_vmbus.ko, which detects Hyper-V in hv_acpi_init().
>>
>> KVM's vsock_virtio_transport doesn't have the issue because it doesn't
>> define MODULE_ALIAS_NETPROTO(PF_VSOCK).
>
> Thanks for sending this patch, vmci's MODULE_ALIAS_NETPROTO(PF_VSOCK) is
> a problem for vhost_vsock.ko (the virtio host driver) too. A host
> userspace program can create a AF_VSOCK socket before vhost_vsock is
> loaded. The vmci transport will be unconditionally loaded and that's
> not the right behavior.
>
> Putting aside nested virtualization, I want to load the transport (vmci,
> Hyper-V, vsock) for which there is paravirtualized hardware present
> inside the guest.

Good points. Completely agree that this is the desired behavior for a guest.


> It's a little tricker on the host side (doesn't matter for Hyper-V and
> probably also doesn't for VMware) because the host-side driver is a
> software device with no hardware backing it. In KVM we assume the
> vhost_vsock.ko kernel module will be loaded sufficiently early.

Since the vmci driver is currently tied to PF_VSOCK it hasn’t been a problem, but on the host side the VMCI driver has no hardware backing it either, so when we move to a more appropriate solution, this will be an issue for VMCI as well. I’ll check our shipped products, but they most likely assume that if an upstreamed vmci module is present, it will be loaded automatically.

>
> Things get trickier with nested virtualization because the VM might want
> to talk to its host but also to its nested VMs. The simple way of
> fixing this would be to allow two transports loaded simultaneously and
> route traffic destined to CID 2 to the host transport and all other
> traffic to the guest transport.

This is close to the routing the VMCI driver does in a nested environment, but that is with the assumption that there is only one type of transport. Having two different transports would require that we delay resolving the transport type until the socket endpoint has been bound to an address. Things get trickier if listening sockets use VMADDR_CID_ANY - if only one transport is present, this would allow the socket to accept connections from both guests and outer host, but with multiple transports that won’t work, since we can’t associate a socket with a transport until the socket is bound.

>
> Perhaps we should discuss these cases a bit more to figure out how to
> avoid conflicts over MODULE_ALIAS_NETPROTO(PF_VSOCK).

Agreed.

>
>>
>> The patch also adds a module parameter "skip_hypervisor_check" for
>> vmw_vsock_vmci_transport.ko.
>>
>> Signed-off-by: Dexuan Cui <[email protected]>
>> Cc: Alok Kataria <[email protected]>
>> Cc: Andy King <[email protected]>
>> Cc: Adit Ranadive <[email protected]>
>> Cc: George Zhang <[email protected]>
>> Cc: Jorgen Hansen <[email protected]>
>> Cc: K. Y. Srinivasan <[email protected]>
>> Cc: Haiyang Zhang <[email protected]>
>> Cc: Stephen Hemminger <[email protected]>
>> ---
>> net/vmw_vsock/Kconfig | 2 +-
>> net/vmw_vsock/vmci_transport.c | 11 +++++++++++
>> 2 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
>> index a24369d..3f52929 100644
>> --- a/net/vmw_vsock/Kconfig
>> +++ b/net/vmw_vsock/Kconfig
>> @@ -17,7 +17,7 @@ config VSOCKETS
>>
>> config VMWARE_VMCI_VSOCKETS
>> tristate "VMware VMCI transport for Virtual Sockets"
>> - depends on VSOCKETS && VMWARE_VMCI
>> + depends on VSOCKETS && VMWARE_VMCI && HYPERVISOR_GUEST
>> help
>> This module implements a VMCI transport for Virtual Sockets.
>>
>> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
>> index 10ae782..c068873 100644
>> --- a/net/vmw_vsock/vmci_transport.c
>> +++ b/net/vmw_vsock/vmci_transport.c
>> @@ -16,6 +16,7 @@
>> #include <linux/types.h>
>> #include <linux/bitops.h>
>> #include <linux/cred.h>
>> +#include <linux/hypervisor.h>
>> #include <linux/init.h>
>> #include <linux/io.h>
>> #include <linux/kernel.h>
>> @@ -73,6 +74,10 @@ struct vmci_transport_recv_pkt_info {
>> struct vmci_transport_packet pkt;
>> };
>>
>> +static bool skip_hypervisor_check;
>> +module_param(skip_hypervisor_check, bool, 0444);
>> +MODULE_PARM_DESC(hot_add, "If set, attempt to load on non-VMware platforms");
>> +
>> static LIST_HEAD(vmci_transport_cleanup_list);
>> static DEFINE_SPINLOCK(vmci_transport_cleanup_lock);
>> static DECLARE_WORK(vmci_transport_cleanup_work, vmci_transport_cleanup);
>> @@ -2085,6 +2090,12 @@ static int __init vmci_transport_init(void)
>> {
>> int err;
>>
>> + /* Check if we are running on VMware's hypervisor and bail out
>> + * if we are not.
>> + */
>> + if (!skip_hypervisor_check && x86_hyper != &x86_hyper_vmware)
>> + return -ENODEV;
>> +

I’ma bit concerned with this. On the host-side, we still want to be able to use the VMCI transport, so to allow that, the above should also allow loading the transport when x86_hyper == NULL. However, this may still cause a conflict with virtio host side support, so it looks like we need to find a better overall way to make the protocols co-exist.

>> /* Create the datagram handle that we will use to send and receive all
>> * VSocket control messages for this context.
>> */
>> --
>> 2.7.4


2017-08-17 17:04:06

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

From: Dexuan Cui <[email protected]>
Date: Thu, 17 Aug 2017 08:00:29 +0000

> @@ -73,6 +74,10 @@ struct vmci_transport_recv_pkt_info {
> struct vmci_transport_packet pkt;
> };
>
> +static bool skip_hypervisor_check;
> +module_param(skip_hypervisor_check, bool, 0444);
> +MODULE_PARM_DESC(hot_add, "If set, attempt to load on non-VMware platforms");
> +

I would avoid module parameters at all costs.

It is the worst possible interface for users of your software.

You really need to fundamentally solve the problems related to making
sure the proper modules for the VM actually present on the system get
loaded when necessary rather than adding hacks like this.

Unlike a proper solution, these hacks are ugly but have to stay around
forever once you put them in place.

2017-08-17 18:27:58

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

> From: David Miller [mailto:[email protected]]
> Sent: Thursday, August 17, 2017 10:04
> I would avoid module parameters at all costs.
>
> It is the worst possible interface for users of your software.
>
> You really need to fundamentally solve the problems related to making
> sure the proper modules for the VM actually present on the system get
> loaded when necessary rather than adding hacks like this.
>
> Unlike a proper solution, these hacks are ugly but have to stay around
> forever once you put them in place.

Sorry for reminding me again, David! :-)

I'll try to figure out the correct solution.

Thanks,
-- Dexuan

2017-08-18 03:07:55

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

> From: Jorgen S. Hansen [mailto:[email protected]]
> Sent: Thursday, August 17, 2017 08:17
> >
> > Putting aside nested virtualization, I want to load the transport (vmci,
> > Hyper-V, vsock) for which there is paravirtualized hardware present
> > inside the guest.
>
> Good points. Completely agree that this is the desired behavior for a guest.
>
>
> > It's a little tricker on the host side (doesn't matter for Hyper-V and
> > probably also doesn't for VMware) because the host-side driver is a
> > software device with no hardware backing it. In KVM we assume the
> > vhost_vsock.ko kernel module will be loaded sufficiently early.
>
> Since the vmci driver is currently tied to PF_VSOCK it hasn’t been a problem,
> but on the host side the VMCI driver has no hardware backing it either, so
> when we move to a more appropriate solution, this will be an issue for VMCI as
> well. I’ll check our shipped products, but they most likely assume that if an
> upstreamed vmci module is present, it will be loaded automatically.

Hyper-V Sockets is a standard feature of VMBus v4.0, so we can easily know
we can and should load iff vmbus_proto_version >= VERSION_WIN10.

> > Things get trickier with nested virtualization because the VM might want
> > to talk to its host but also to its nested VMs. The simple way of
> > fixing this would be to allow two transports loaded simultaneously and
> > route traffic destined to CID 2 to the host transport and all other
> > traffic to the guest transport.

This sounds like a little tricky to me.
CID is not really used by us, because we only support guest<->host communication,
and don't support guest<->guest communication. The Hyper-V host references
every VM by VmID (which is invisible to the VM), and a VM can only talk to the
host via this feature.

> This is close to the routing the VMCI driver does in a nested environment, but
> that is with the assumption that there is only one type of transport. Having two
> different transports would require that we delay resolving the transport type
> until the socket endpoint has been bound to an address. Things get trickier if
> listening sockets use VMADDR_CID_ANY - if only one transport is present, this
> would allow the socket to accept connections from both guests and outer host,
> but with multiple transports that won’t work, since we can’t associate a socket
> with a transport until the socket is bound.
>
> >
> > Perhaps we should discuss these cases a bit more to figure out how to
> > avoid conflicts over MODULE_ALIAS_NETPROTO(PF_VSOCK).
>
> Agreed.

Can we use the 'protocol' parameter in the socket() function:
int socket(int domain, int type, int protocol)

IMO currently the 'protocol' is not really used.
I think we can modify __vsock_core_init() to allow multiple transport layers to
be registered, and we can define different 'protocol' numbers for
VMware/KVM/Hyper-V, and ask the application to explicitly specify what should
be used. Considering compatibility, we can use the default transport in a given
VM depending on the underlying hypervisor.

-- Dexuan

2017-08-18 15:37:31

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

On Fri, Aug 18, 2017 at 03:07:30AM +0000, Dexuan Cui wrote:
> > From: Jorgen S. Hansen [mailto:[email protected]]
> > Sent: Thursday, August 17, 2017 08:17
> > >
> > > Putting aside nested virtualization, I want to load the transport (vmci,
> > > Hyper-V, vsock) for which there is paravirtualized hardware present
> > > inside the guest.
> >
> > Good points. Completely agree that this is the desired behavior for a guest.
> >
> >
> > > It's a little tricker on the host side (doesn't matter for Hyper-V and
> > > probably also doesn't for VMware) because the host-side driver is a
> > > software device with no hardware backing it. In KVM we assume the
> > > vhost_vsock.ko kernel module will be loaded sufficiently early.
> >
> > Since the vmci driver is currently tied to PF_VSOCK it hasn’t been a problem,
> > but on the host side the VMCI driver has no hardware backing it either, so
> > when we move to a more appropriate solution, this will be an issue for VMCI as
> > well. I’ll check our shipped products, but they most likely assume that if an
> > upstreamed vmci module is present, it will be loaded automatically.
>
> Hyper-V Sockets is a standard feature of VMBus v4.0, so we can easily know
> we can and should load iff vmbus_proto_version >= VERSION_WIN10.
>
> > > Things get trickier with nested virtualization because the VM might want
> > > to talk to its host but also to its nested VMs. The simple way of
> > > fixing this would be to allow two transports loaded simultaneously and
> > > route traffic destined to CID 2 to the host transport and all other
> > > traffic to the guest transport.
>
> This sounds like a little tricky to me.
> CID is not really used by us, because we only support guest<->host communication,
> and don't support guest<->guest communication. The Hyper-V host references
> every VM by VmID (which is invisible to the VM), and a VM can only talk to the
> host via this feature.

Applications running inside the guest should use VMADDR_CID_HOST (2) to
connect to the host, even on Hyper-V.

By the way, we should collaborate on a test suite and a vsock(7) man
page that documents the semantics of AF_VSOCK sockets. This way our
transports will have the same behavior and AF_VSOCK applications will
work on all 3 hypervisors.

Not all features need to be supported. For example, VMCI supports
SOCK_DGRAM while Hyper-V and virtio do not. But features that are
available should behave identically.

> > This is close to the routing the VMCI driver does in a nested environment, but
> > that is with the assumption that there is only one type of transport. Having two
> > different transports would require that we delay resolving the transport type
> > until the socket endpoint has been bound to an address. Things get trickier if
> > listening sockets use VMADDR_CID_ANY - if only one transport is present, this
> > would allow the socket to accept connections from both guests and outer host,
> > but with multiple transports that won’t work, since we can’t associate a socket
> > with a transport until the socket is bound.
> >
> > >
> > > Perhaps we should discuss these cases a bit more to figure out how to
> > > avoid conflicts over MODULE_ALIAS_NETPROTO(PF_VSOCK).
> >
> > Agreed.
>
> Can we use the 'protocol' parameter in the socket() function:
> int socket(int domain, int type, int protocol)
>
> IMO currently the 'protocol' is not really used.
> I think we can modify __vsock_core_init() to allow multiple transport layers to
> be registered, and we can define different 'protocol' numbers for
> VMware/KVM/Hyper-V, and ask the application to explicitly specify what should
> be used. Considering compatibility, we can use the default transport in a given
> VM depending on the underlying hypervisor.

I think AF_VSOCK should hide the transport from users/applications.
Think of same-on-same nested virtualization: VMware-on-VMware or
KVM-on-KVM. In that case specifying VMCI or virtio doesn't help.

We'd still need to distinguish between "to guest" and "to host"
(currently VMCI has code to do this but virtio does not).

The natural place to distinguish the destination is when dealing with
the sockaddr in connect(), bind(), etc.

Stefan


Attachments:
(No filename) (4.13 kB)
signature.asc (455.00 B)
Download all attachments

2017-08-18 23:08:02

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

> From: Stefan Hajnoczi [mailto:[email protected]]
> > CID is not really used by us, because we only support guest<->host
> communication,
> > and don't support guest<->guest communication. The Hyper-V host
> references
> > every VM by VmID (which is invisible to the VM), and a VM can only talk to
> the
> > host via this feature.
>
> Applications running inside the guest should use VMADDR_CID_HOST (2) to
> connect to the host, even on Hyper-V.
I have no objection, and this patch does support this usage of the
user-space applications.

> By the way, we should collaborate on a test suite and a vsock(7) man
> page that documents the semantics of AF_VSOCK sockets. This way our
> transports will have the same behavior and AF_VSOCK applications will
> work on all 3 hypervisors.
I can't agree more. :-)
BTW, I have been using Rolf's test suite to test my patch:
https://github.com/rn/virtsock/tree/master/c
Maybe this can be a good starting point.

> Not all features need to be supported. For example, VMCI supports
> SOCK_DGRAM while Hyper-V and virtio do not. But features that are
> available should behave identically.
I totally agree, though I'm afraid Hyper-V may have a little more limitations
compared to VMware/KVM duo to the <VM_ID, ServiceID> <--> <cid, port>
mapping.

> > Can we use the 'protocol' parameter in the socket() function:
> > int socket(int domain, int type, int protocol)
> >
> > IMO currently the 'protocol' is not really used.
> > I think we can modify __vsock_core_init() to allow multiple transport layers
> to
> > be registered, and we can define different 'protocol' numbers for
> > VMware/KVM/Hyper-V, and ask the application to explicitly specify what
> should
> > be used. Considering compatibility, we can use the default transport in a
> given
> > VM depending on the underlying hypervisor.
>
> I think AF_VSOCK should hide the transport from users/applications.
Ideally yes, but let's consider the KVM-on-KVM nested scenario: when
an application in the Level-1 VM creates an AF_VSOCK socket and call
connect() for it, how can we know if the app is trying to connect to
the Level-0 host, or connect to the Level-2 VM? We can't. That's why
I propose we should use the 'protocol' parameter to distinguish between
"to guest" and "to host".

With my proposal, in the above scenario, by default (the 'protocol' is 0),
we choose the "to host" transport layer when socket() is called; if the
userspace app explicitly specifies "to guest", we choose the "to guest"
transport layer when socket() is called. This way, the connect(), bind(), etc.
can work automatically.
(Of course, the default transport for a give VM can be better chosen
if we detect which nested level the app is running on.)

> Think of same-on-same nested virtualization: VMware-on-VMware or
> KVM-on-KVM. In that case specifying VMCI or virtio doesn't help.
>
> We'd still need to distinguish between "to guest" and "to host"
> (currently VMCI has code to do this but virtio does not).
>
> The natural place to distinguish the destination is when dealing with
> the sockaddr in connect(), bind(), etc.
>
> Stefan

Thanks,
-- Dexuan

2017-08-22 09:54:45

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

On Fri, Aug 18, 2017 at 11:07:37PM +0000, Dexuan Cui wrote:
> > From: Stefan Hajnoczi [mailto:[email protected]]
> > > CID is not really used by us, because we only support guest<->host
> > communication,
> > > and don't support guest<->guest communication. The Hyper-V host
> > references
> > > every VM by VmID (which is invisible to the VM), and a VM can only talk to
> > the
> > > host via this feature.
> >
> > Applications running inside the guest should use VMADDR_CID_HOST (2) to
> > connect to the host, even on Hyper-V.
> I have no objection, and this patch does support this usage of the
> user-space applications.
>
> > By the way, we should collaborate on a test suite and a vsock(7) man
> > page that documents the semantics of AF_VSOCK sockets. This way our
> > transports will have the same behavior and AF_VSOCK applications will
> > work on all 3 hypervisors.
> I can't agree more. :-)
> BTW, I have been using Rolf's test suite to test my patch:
> https://github.com/rn/virtsock/tree/master/c
> Maybe this can be a good starting point.

Thanks for sharing this, I will try it with virtio-vsock.

I have a netcat-like utility here:
https://github.com/stefanha/linux/blob/vsock-extras/nc-vsock.c

> > Not all features need to be supported. For example, VMCI supports
> > SOCK_DGRAM while Hyper-V and virtio do not. But features that are
> > available should behave identically.
> I totally agree, though I'm afraid Hyper-V may have a little more limitations
> compared to VMware/KVM duo to the <VM_ID, ServiceID> <--> <cid, port>
> mapping.
>
> > > Can we use the 'protocol' parameter in the socket() function:
> > > int socket(int domain, int type, int protocol)
> > >
> > > IMO currently the 'protocol' is not really used.
> > > I think we can modify __vsock_core_init() to allow multiple transport layers
> > to
> > > be registered, and we can define different 'protocol' numbers for
> > > VMware/KVM/Hyper-V, and ask the application to explicitly specify what
> > should
> > > be used. Considering compatibility, we can use the default transport in a
> > given
> > > VM depending on the underlying hypervisor.
> >
> > I think AF_VSOCK should hide the transport from users/applications.
> Ideally yes, but let's consider the KVM-on-KVM nested scenario: when
> an application in the Level-1 VM creates an AF_VSOCK socket and call
> connect() for it, how can we know if the app is trying to connect to
> the Level-0 host, or connect to the Level-2 VM? We can't.

We *can* by looking at the destination CID. Please take a look at
drivers/misc/vmw_vmci/vmci_route.c:vmci_route() to see how VMCI handles
nested virt.

It boils down to something like this:

static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
int addr_len, int flags)
{
...
if (remote_addr.svm_cid == VMADDR_CID_HOST)
transport = host_transport;
else
transport = guest_transport;

It's easy for connect(2) but Jorgen mentioned it's harder for listen(2)
because the socket would need to listen on both transports. We define
two new constants VMADDR_CID_LISTEN_FROM_GUEST and
VMADDR_CID_LISTEN_FROM_HOST for bind(2) so that applications can decide
which side to listen on. Or the listen socket could simply listen to
both sides.

Stefan

2017-08-22 13:07:09

by Jorgen Hansen

[permalink] [raw]
Subject: Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default


> On Aug 22, 2017, at 11:54 AM, Stefan Hajnoczi <[email protected]> wrote:
>
> On Fri, Aug 18, 2017 at 11:07:37PM +0000, Dexuan Cui wrote:
>>> Not all features need to be supported. For example, VMCI supports
>>> SOCK_DGRAM while Hyper-V and virtio do not. But features that are
>>> available should behave identically.
>> I totally agree, though I'm afraid Hyper-V may have a little more limitations
>> compared to VMware/KVM duo to the <VM_ID, ServiceID> <--> <cid, port>
>> mapping.
>>
>>>> Can we use the 'protocol' parameter in the socket() function:
>>>> int socket(int domain, int type, int protocol)
>>>>
>>>> IMO currently the 'protocol' is not really used.
>>>> I think we can modify __vsock_core_init() to allow multiple transport layers
>>> to
>>>> be registered, and we can define different 'protocol' numbers for
>>>> VMware/KVM/Hyper-V, and ask the application to explicitly specify what
>>> should
>>>> be used. Considering compatibility, we can use the default transport in a
>>> given
>>>> VM depending on the underlying hypervisor.
>>>
>>> I think AF_VSOCK should hide the transport from users/applications.
>> Ideally yes, but let's consider the KVM-on-KVM nested scenario: when
>> an application in the Level-1 VM creates an AF_VSOCK socket and call
>> connect() for it, how can we know if the app is trying to connect to
>> the Level-0 host, or connect to the Level-2 VM? We can't.
>
> We *can* by looking at the destination CID. Please take a look at
> drivers/misc/vmw_vmci/vmci_route.c:vmci_route() to see how VMCI handles
> nested virt.
>
> It boils down to something like this:
>
> static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
> int addr_len, int flags)
> {
> ...
> if (remote_addr.svm_cid == VMADDR_CID_HOST)
> transport = host_transport;
> else
> transport = guest_transport;
>
> It's easy for connect(2) but Jorgen mentioned it's harder for listen(2)
> because the socket would need to listen on both transports. We define
> two new constants VMADDR_CID_LISTEN_FROM_GUEST and
> VMADDR_CID_LISTEN_FROM_HOST for bind(2) so that applications can decide
> which side to listen on.

If a socket is bound to VMADDR_CID_HOST, we would consider that socket as bound to the host side transport, so that would be the same as VMADDR_CID_LISTEN_FROM_GUEST. For the guest, we have IOCTL_VM_SOCKETS_GET_LOCAL_CID, so that could be used to get and bind a socket to the guest transport (VMCI will always return the guest CID as the local one, if the VMCI driver is used in a guest, and it looks like virtio will do the same). We could treat VMADDR_CID_ANY as always being the guest transport, since that is the use case where you don’t know upfront what your CID is, if we don’t want to listen on all transports. So we would use the host transport, if a socket is bound to VMADDR_CID_HOST, or if there is no guest transport, and in all other cases use the guest transport. However, having a couple of symbolic names like you suggest certainly makes it more obvious, and could be used in combination with this. It would be a plus if existing applications would function as intended in most cases.

> Or the listen socket could simply listen to
> both sides.

The only problem here would be the potential for a guest and a host app to have a conflict wrt port numbers, even though they would be able to operate fine, if restricted to their appropriate transport.

Thanks,
Jorgen

2017-08-23 04:21:50

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

> From: Jorgen S. Hansen [mailto:[email protected]]
> > On Aug 22, 2017, at 11:54 AM, Stefan Hajnoczi <[email protected]>
> wrote:
> > ...
> > We *can* by looking at the destination CID. Please take a look at
> > drivers/misc/vmw_vmci/vmci_route.c:vmci_route() to see how VMCI
> handles
> > nested virt.
> >
> > It boils down to something like this:
> >
> > static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
> > int addr_len, int flags)
> > {
> > ...
> > if (remote_addr.svm_cid == VMADDR_CID_HOST)
> > transport = host_transport;
> > else
> > transport = guest_transport;
> >
> > It's easy for connect(2) but Jorgen mentioned it's harder for listen(2)
> > because the socket would need to listen on both transports. We define
> > two new constants VMADDR_CID_LISTEN_FROM_GUEST and
> > VMADDR_CID_LISTEN_FROM_HOST for bind(2) so that applications can
> decide
> > which side to listen on.
>
> If a socket is bound to VMADDR_CID_HOST, we would consider that socket as
> bound to the host side transport, so that would be the same as
> VMADDR_CID_LISTEN_FROM_GUEST. For the guest, we have
> IOCTL_VM_SOCKETS_GET_LOCAL_CID, so that could be used to get and bind
> a socket to the guest transport (VMCI will always return the guest CID as the
> local one, if the VMCI driver is used in a guest, and it looks like virtio will do
> the same). We could treat VMADDR_CID_ANY as always being the guest
> transport, since that is the use case where you don’t know upfront what
> your CID is, if we don’t want to listen on all transports. So we would use the
> host transport, if a socket is bound to VMADDR_CID_HOST, or if there is no
> guest transport, and in all other cases use the guest transport. However,
> having a couple of symbolic names like you suggest certainly makes it more
> obvious, and could be used in combination with this. It would be a plus if
> existing applications would function as intended in most cases.
>
> > Or the listen socket could simply listen to
> > both sides.
>
> The only problem here would be the potential for a guest and a host app to
> have a conflict wrt port numbers, even though they would be able to
> operate fine, if restricted to their appropriate transport.
>
> Thanks,
> Jorgen

Hi Jorgen, Stefan,
Thank you for the detailed analysis!
You have a much better understanding than me about the complex
scenarios. Can you please work out a patch? :-)

IMO Linux driver of Hyper-V sockets is the simplest case, as we only have
the "to host" option (the host side driver of Hyper-V sockets runs on
Windows kernel and I don't think the other hypervisors emulate
the full Hyper-V VMBus 4.0, which is required to support Hyper-V sockets).

-- Dexuan


2017-08-29 02:36:51

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

> From: Dexuan Cui
> Sent: Tuesday, August 22, 2017 21:21
> > ...
> > ...
> > The only problem here would be the potential for a guest and a host app
> to
> > have a conflict wrt port numbers, even though they would be able to
> > operate fine, if restricted to their appropriate transport.
> >
> > Thanks,
> > Jorgen
>
> Hi Jorgen, Stefan,
> Thank you for the detailed analysis!
> You have a much better understanding than me about the complex
> scenarios. Can you please work out a patch? :-)

Hi Jorgen, Stefan,
May I know your plan for this?

> IMO Linux driver of Hyper-V sockets is the simplest case, as we only have
> the "to host" option (the host side driver of Hyper-V sockets runs on
> Windows kernel and I don't think the other hypervisors emulate
> the full Hyper-V VMBus 4.0, which is required to support Hyper-V sockets).
>
> -- Dexuan

Thanks,
-- Dexuan

2017-08-29 15:37:14

by Jorgen Hansen

[permalink] [raw]
Subject: Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default


> On Aug 29, 2017, at 4:36 AM, Dexuan Cui <[email protected]> wrote:
>
>> From: Dexuan Cui
>> Sent: Tuesday, August 22, 2017 21:21
>>> ...
>>> ...
>>> The only problem here would be the potential for a guest and a host app
>> to
>>> have a conflict wrt port numbers, even though they would be able to
>>> operate fine, if restricted to their appropriate transport.
>>>
>>> Thanks,
>>> Jorgen
>>
>> Hi Jorgen, Stefan,
>> Thank you for the detailed analysis!
>> You have a much better understanding than me about the complex
>> scenarios. Can you please work out a patch? :-)
>
> Hi Jorgen, Stefan,
> May I know your plan for this?

I’d be happy to discuss this now, but I don’t have time to work on the
actual implementation in the next couple of weeks.

For the guest, we agree that registering a guest transport should be
tied to either the hypervisor running the guest, or the existence of
the appropriate virtual hardware.

My main concern is that our existing software relies on the current
behavior of auto-loading the VMCI transport on the host side. So
changing that behavior could cause trouble for our existing Linux
users.

I’m wondering whether it would be possible to support multiple host
side transports. Since it is theoretically possible to run multiple
hypervisors at the same time, there would even be a use case for this.

If the assignment of CIDs is made unique across all transports, a
connection initiated by the host side will get directed to the
appropriate transport based on the CID. Any traffic coming from a guest
will naturally be using the appropriate transport. Host side applications
will have to share the port space as well. This would require tracking
CIDs as a common resource across all transports, but make CIDs
unique VM addresses on a given host.

If we allow multiple host side transports, virtio host side support and
vmci should be able to coexist regardless of the order of initialization.

Thanks,
Jorgen


2017-08-31 11:55:36

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

On Tue, Aug 29, 2017 at 03:37:07PM +0000, Jorgen S. Hansen wrote:
> > On Aug 29, 2017, at 4:36 AM, Dexuan Cui <[email protected]> wrote:
> If we allow multiple host side transports, virtio host side support and
> vmci should be able to coexist regardless of the order of initialization.

That sounds good to me.

This means af_vsock.c needs to be aware of CID allocation. Currently the
vhost_vsock.ko driver handles this itself (it keeps a list of CIDs and
checks that they are not used twice). It should be possible to move
that state into af_vsock.c so we have <cid, host_transport> pairs.

I'm currently working on NFS over AF_VSOCK and sock_diag support (for
ss(8) and netstat-like tools).

Multi-transport support is lower priority for me at the moment. I'm
happy to review patches though. If there is no progress on this by the
end of the year then I will have time to work on it.

Are either of you are in Prague, Czech Republic on October 25-27 for
Linux Kernel Summit, Open Source Summit Europe, Embedded Linux
Conference Europe, KVM Forum, or MesosCon Europe?

Stefan

2017-09-02 06:25:54

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

> From: Stefan Hajnoczi [mailto:[email protected]]
> Sent: Thursday, August 31, 2017 4:55 AM
> ...
> On Tue, Aug 29, 2017 at 03:37:07PM +0000, Jorgen S. Hansen wrote:
> > > On Aug 29, 2017, at 4:36 AM, Dexuan Cui <[email protected]> wrote:
> > If we allow multiple host side transports, virtio host side support and
> > vmci should be able to coexist regardless of the order of initialization.
>
> That sounds good to me.
>
> This means af_vsock.c needs to be aware of CID allocation. Currently the
> vhost_vsock.ko driver handles this itself (it keeps a list of CIDs and
> checks that they are not used twice). It should be possible to move
> that state into af_vsock.c so we have <cid, host_transport> pairs.
>
> I'm currently working on NFS over AF_VSOCK and sock_diag support (for
> ss(8) and netstat-like tools).
>
> Multi-transport support is lower priority for me at the moment. I'm
> happy to review patches though. If there is no progress on this by the
> end of the year then I will have time to work on it.
I understand. Thank you both for sharing the details about the plan!

> Are either of you are in Prague, Czech Republic on October 25-27 for
> Linux Kernel Summit, Open Source Summit Europe, Embedded Linux
> Conference Europe, KVM Forum, or MesosCon Europe?
>
> Stefan
I regret I won't be there this year.

Thanks,
-- Dexuan

2017-09-06 14:11:31

by Jorgen Hansen

[permalink] [raw]
Subject: Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default


> On Aug 31, 2017, at 1:54 PM, Stefan Hajnoczi <[email protected]> wrote:
>
> On Tue, Aug 29, 2017 at 03:37:07PM +0000, Jorgen S. Hansen wrote:
>>> On Aug 29, 2017, at 4:36 AM, Dexuan Cui <[email protected]> wrote:
>> If we allow multiple host side transports, virtio host side support and
>> vmci should be able to coexist regardless of the order of initialization.
>
> That sounds good to me.
>
> This means af_vsock.c needs to be aware of CID allocation. Currently the
> vhost_vsock.ko driver handles this itself (it keeps a list of CIDs and
> checks that they are not used twice). It should be possible to move
> that state into af_vsock.c so we have <cid, host_transport> pairs.
>

Yes, that was my thinking as well.

> I'm currently working on NFS over AF_VSOCK and sock_diag support (for
> ss(8) and netstat-like tools).
>
> Multi-transport support is lower priority for me at the moment. I'm
> happy to review patches though. If there is no progress on this by the
> end of the year then I will have time to work on it.
>

I’ll try to find time to write a more coherent proposal in the coming weeks,
and we can discuss that.

> Are either of you are in Prague, Czech Republic on October 25-27 for
> Linux Kernel Summit, Open Source Summit Europe, Embedded Linux
> Conference Europe, KVM Forum, or MesosCon Europe?

No, I won’t be there.

Thanks,
Jorgen


2017-09-06 19:40:17

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

> From: Jorgen S. Hansen [mailto:[email protected]]
> Sent: Wednesday, September 6, 2017 7:11 AM
>> ...
> > I'm currently working on NFS over AF_VSOCK and sock_diag support (for
> > ss(8) and netstat-like tools).
> >
> > Multi-transport support is lower priority for me at the moment. I'm
> > happy to review patches though. If there is no progress on this by the
> > end of the year then I will have time to work on it.
> >
>
> I’ll try to find time to write a more coherent proposal in the coming weeks,
> and we can discuss that.
>
> Jorgen

Thank you!

Thanks,
-- Dexuan