Date: Tue, 22 Aug 2017 10:54:37 +0100
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Dexuan Cui
Cc: "Jorgen S. Hansen", davem@davemloft.net, netdev@vger.kernel.org,
	gregkh@linuxfoundation.org, devel@linuxdriverproject.org,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, George Zhang,
	Michal Kubecek, Asias He, Vitaly Kuznetsov, Cathy Avery,
	jasowang@redhat.com, Rolf Neugebauer, Dave Scott, Marcelo Cerri,
	apw@canonical.com, olaf@aepfle.de, joe@perches.com,
	linux-kernel@vger.kernel.org, Dan Carpenter
Subject: Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
Message-ID: <20170822095437.GB16799@stefanha-x1.localdomain>
References: <20170817135559.GG5539@stefanha-x1.localdomain>
	<04460E3B-B213-4090-96CD-00CEEBE6AC32@vmware.com>
	<20170818153716.GB17572@stefanha-x1.localdomain>

On Fri, Aug 18, 2017 at 11:07:37PM +0000, Dexuan Cui wrote:
> > From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> > > CID is not really used by us, because we only support guest<->host
> > > communication, and don't support guest<->guest communication. The
> > > Hyper-V host references every VM by VmID (which is invisible to the
> > > VM), and a VM can only talk to the host via this feature.
> >
> > Applications running inside the guest should use VMADDR_CID_HOST (2)
> > to connect to the host, even on Hyper-V.
> I have no objection, and this patch does support this usage by
> user-space applications.
>
> > By the way, we should collaborate on a test suite and a vsock(7) man
> > page that documents the semantics of AF_VSOCK sockets. This way our
> > transports will have the same behavior and AF_VSOCK applications will
> > work on all 3 hypervisors.
> I can't agree more. :-)
> BTW, I have been using Rolf's test suite to test my patch:
> https://github.com/rn/virtsock/tree/master/c
> Maybe this can be a good starting point.

Thanks for sharing this, I will try it with virtio-vsock.

I have a netcat-like utility here:
https://github.com/stefanha/linux/blob/vsock-extras/nc-vsock.c

> > Not all features need to be supported. For example, VMCI supports
> > SOCK_DGRAM while Hyper-V and virtio do not. But features that are
> > available should behave identically.
> I totally agree, though I'm afraid Hyper-V may have a few more
> limitations compared to VMware/KVM due to the <--> mapping.
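As a concrete illustration of the VMADDR_CID_HOST usage above, a minimal
guest-side client could look roughly like the sketch below. The port
number is an arbitrary example and the error handling is only
illustrative, not something taken from the patch under discussion:

  /* Guest-side client sketch: connect to the host via VMADDR_CID_HOST.
   * Port 1234 is an arbitrary example value. */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <linux/vm_sockets.h>

  int main(void)
  {
          struct sockaddr_vm addr;
          int fd;

          fd = socket(AF_VSOCK, SOCK_STREAM, 0);
          if (fd < 0) {
                  perror("socket");
                  return 1;
          }

          memset(&addr, 0, sizeof(addr));
          addr.svm_family = AF_VSOCK;
          addr.svm_cid = VMADDR_CID_HOST;   /* CID 2: always the host */
          addr.svm_port = 1234;             /* example service port */

          if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                  perror("connect");
                  close(fd);
                  return 1;
          }

          /* ... exchange data over fd ... */
          close(fd);
          return 0;
  }

If the transports really do behave identically, the same program should
work unmodified on all three hypervisors.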
> > > Can we use the 'protocol' parameter in the socket() function:
> > >     int socket(int domain, int type, int protocol)
> > > IMO currently the 'protocol' is not really used. I think we can
> > > modify __vsock_core_init() to allow multiple transport layers to be
> > > registered, and we can define different 'protocol' numbers for
> > > VMware/KVM/Hyper-V, and ask the application to explicitly specify
> > > what should be used. Considering compatibility, we can use the
> > > default transport in a given VM depending on the underlying
> > > hypervisor.
> >
> > I think AF_VSOCK should hide the transport from users/applications.
> Ideally yes, but let's consider the KVM-on-KVM nested scenario: when
> an application in the Level-1 VM creates an AF_VSOCK socket and calls
> connect() for it, how can we know if the app is trying to connect to
> the Level-0 host, or connect to the Level-2 VM? We can't.

We *can* by looking at the destination CID.

Please take a look at drivers/misc/vmw_vmci/vmci_route.c:vmci_route()
to see how VMCI handles nested virt.

It boils down to something like this:

  static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
                                  int addr_len, int flags)
  {
          ...
          if (remote_addr.svm_cid == VMADDR_CID_HOST)
                  transport = host_transport;
          else
                  transport = guest_transport;

It's easy for connect(2) but Jorgen mentioned it's harder for listen(2)
because the socket would need to listen on both transports. We could
define two new constants VMADDR_CID_LISTEN_FROM_GUEST and
VMADDR_CID_LISTEN_FROM_HOST for bind(2) so that applications can decide
which side to listen on. Or the listen socket could simply listen on
both sides.

Stefan
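A corresponding listen-side sketch for the bind(2) idea above: it binds
the existing VMADDR_CID_ANY wildcard, which accepts connections
regardless of which side they come from. The VMADDR_CID_LISTEN_FROM_*
names are only proposed in this thread and do not exist in any kernel
today, so they appear below purely as a comment:

  /* Listen-side sketch. VMADDR_CID_ANY listens on any CID; a proposed
   * VMADDR_CID_LISTEN_FROM_GUEST or VMADDR_CID_LISTEN_FROM_HOST would
   * replace it if the application wanted to pick one side.
   * Port 1234 is an arbitrary example value. */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <linux/vm_sockets.h>

  int main(void)
  {
          struct sockaddr_vm addr;
          int fd, conn;

          fd = socket(AF_VSOCK, SOCK_STREAM, 0);
          if (fd < 0) {
                  perror("socket");
                  return 1;
          }

          memset(&addr, 0, sizeof(addr));
          addr.svm_family = AF_VSOCK;
          addr.svm_cid = VMADDR_CID_ANY;   /* or a proposed VMADDR_CID_LISTEN_FROM_* */
          addr.svm_port = 1234;            /* example service port */

          if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
              listen(fd, 1) < 0) {
                  perror("bind/listen");
                  close(fd);
                  return 1;
          }

          conn = accept(fd, NULL, NULL);
          if (conn < 0) {
                  perror("accept");
                  close(fd);
                  return 1;
          }

          /* ... serve the connection on 'conn' ... */
          close(conn);
          close(fd);
          return 0;
  }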