Subject: Re: [RFC v2] vhost: introduce mdev based hardware vhost backend
To: Tiwei Bie, Alex Williamson
Cc: mst@redhat.com, maxime.coquelin@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, dan.daly@intel.com, cunming.liang@intel.com, zhihong.wang@intel.com, idos@mellanox.com, Rob Miller, Ariel Adam
From: Jason Wang
Date: Tue, 9 Jul 2019 10:50:38 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 2019/7/8 2:16 PM, Tiwei Bie wrote:
> On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:
>> On Thu, 4 Jul 2019 14:21:34 +0800
>> Tiwei Bie wrote:
>>> On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
>>>> On 2019/7/3 9:08 PM, Tiwei Bie wrote:
>>>>> On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
>>>>>> On 2019/7/3 7:52 PM, Tiwei Bie wrote:
>>>>>>> On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
>>>>>>>> On 2019/7/3 5:13 PM, Tiwei Bie wrote:
>>>>>>>>> Details about this can be found here:
>>>>>>>>>
>>>>>>>>> https://lwn.net/Articles/750770/
>>>>>>>>>
>>>>>>>>> What's new in this version
>>>>>>>>> ==========================
>>>>>>>>>
>>>>>>>>> A new VFIO device type is introduced - vfio-vhost. This addressed
>>>>>>>>> some comments from here:https://patchwork.ozlabs.org/cover/984763/
>>>>>>>>>
>>>>>>>>> Below is the updated device interface:
>>>>>>>>>
>>>>>>>>> Currently, there are two regions of this device: 1) CONFIG_REGION
>>>>>>>>> (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
>>>>>>>>> device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
>>>>>>>>> can be used to notify the device.
>>>>>>>>>
>>>>>>>>> 1. CONFIG_REGION
>>>>>>>>>
>>>>>>>>> The region described by CONFIG_REGION is the main control interface.
>>>>>>>>> Messages will be written to or read from this region.
>>>>>>>>>
>>>>>>>>> The message type is determined by the `request` field in message
>>>>>>>>> header. The message size is encoded in the message header too.
>>>>>>>>> The message format looks like this:
>>>>>>>>>
>>>>>>>>> struct vhost_vfio_op {
>>>>>>>>>         __u64 request;
>>>>>>>>>         __u32 flags;
>>>>>>>>>         /* Flag values: */
>>>>>>>>> #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
>>>>>>>>>         __u32 size;
>>>>>>>>>         union {
>>>>>>>>>                 __u64 u64;
>>>>>>>>>                 struct vhost_vring_state state;
>>>>>>>>>                 struct vhost_vring_addr addr;
>>>>>>>>>         } payload;
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> The existing vhost-kernel ioctl cmds are reused as the message
>>>>>>>>> requests in above structure.
>>>>>>>> Still a comments like V1. What's the advantage of inventing a new protocol?
>>>>>>> I'm trying to make it work in VFIO's way..
>>>>>>>
>>>>>>>> I believe either of the following should be better:
>>>>>>>>
>>>>>>>> - using vhost ioctl, we can start from SET_VRING_KICK/SET_VRING_CALL and
>>>>>>>> extend it with e.g notify region. The advantages is that all exist userspace
>>>>>>>> program could be reused without modification (or minimal modification). And
>>>>>>>> vhost API hides lots of details that is not necessary to be understood by
>>>>>>>> application (e.g in the case of container).
>>>>>>> Do you mean reusing vhost's ioctl on VFIO device fd directly,
>>>>>>> or introducing another mdev driver (i.e. vhost_mdev instead of
>>>>>>> using the existing vfio_mdev) for mdev device?
>>>>>> Can we simply add them into ioctl of mdev_parent_ops?
>>>>> Right, either way, these ioctls have to be and just need to be
>>>>> added in the ioctl of the mdev_parent_ops. But another thing we
>>>>> also need to consider is that which file descriptor the userspace
>>>>> will do the ioctl() on. So I'm wondering do you mean let the
>>>>> userspace do the ioctl() on the VFIO device fd of the mdev
>>>>> device?
>>>>>
>>>> Yes.
>>> Got it! I'm not sure what's Alex opinion on this. If we all
>>> agree with this, I can do it in this way.
>>>
>>>> Is there any other way btw?
>>> Just a quick thought.. Maybe totally a bad idea. I was thinking
>>> whether it would be odd to do non-VFIO's ioctls on VFIO's device
>>> fd. So I was wondering whether it's possible to allow binding
>>> another mdev driver (e.g. vhost_mdev) to the supported mdev
>>> devices. The new mdev driver, vhost_mdev, can provide similar
>>> ways to let userspace open the mdev device and do the vhost ioctls
>>> on it. To distinguish with the vfio_mdev compatible mdev devices,
>>> the device API of the new vhost_mdev compatible mdev devices
>>> might be e.g. "vhost-net" for net?
>>>
>>> So in VFIO case, the device will be for passthru directly. And
>>> in VHOST case, the device can be used to accelerate the existing
>>> virtualized devices.
>>>
>>> How do you think?
>> VFIO really can't prevent vendor specific ioctls on the device file
>> descriptor for mdevs, but a) we'd want to be sure the ioctl address
>> space can't collide with ioctls we'd use for vfio defined purposes and
>> b) maybe the VFIO user API isn't what you want in the first place if
>> you intend to mostly/entirely ignore the defined ioctl set and replace
>> them with your own. In the case of the latter, you're also not getting
>> the advantages of the existing VFIO userspace code, so why expose a
>> VFIO device at all.
> Yeah, I totally agree.

I guess the original idea was to reuse the VFIO DMA/IOMMU API for this.
That would give us the chance to reuse the vfio code in qemu for dealing
with e.g. vIOMMU.

>
>> The mdev interface does provide a general interface for creating and
>> managing virtual devices, vfio-mdev is just one driver on the mdev
>> bus. Parav (Mellanox) has been doing work on mdev-core to help clean
>> out vfio-isms from the interface, aiui, with the intent of implementing
>> another mdev bus driver for using the devices within the kernel.
> Great to know this! I found below series after some searching:
>
> https://lkml.org/lkml/2019/3/8/821
>
> In above series, the new mlx5_core mdev driver will do the probe
> by calling mlx5_get_core_dev() first on the parent device of the
> mdev device. In vhost_mdev, maybe we can also keep track of all
> the compatible mdev devices and use this info to do the probe.

I don't get why this is needed. My understanding is that if we want to
go this way, there are actually two parts:

1) Vhost mdev, which implements the device management and the vhost
   ioctls.
2) Vhost itself, which can accept an mdev fd as its backend through
   VHOST_NET_SET_BACKEND.

> But we also need a way to allow vfio_mdev driver to distinguish
> and reject the incompatible mdev devices.

One issue with this series is that it doesn't consider DMA isolation at
all.

>
>> It
>> seems like this vhost-mdev driver might be similar, using mdev but not
>> necessarily vfio-mdev to expose devices. Thanks,
> Yeah, I also think so!

I've cced some driver developers for their input. I think we need a
sample parent driver in the next version so that we can understand the
full picture.

Thanks

>
> Thanks!
> Tiwei
>
>> Alex
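
For readers following the thread, here is a rough sketch of how a message in the
vhost_vfio_op format quoted from the RFC might be filled in before being written
to CONFIG_REGION. It is only an illustration under several assumptions, not the
RFC's code: the sketch_* structs are local stand-ins that mirror the kernel's
struct vhost_vring_state and struct vhost_vring_addr so the example builds
without kernel headers, SKETCH_REQ_SET_VRING_NUM is a made-up placeholder (the
RFC reuses the real vhost ioctl command numbers as request codes), and the
`size` field is assumed to hold the payload length, which the quoted text does
not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins mirroring struct vhost_vring_state / struct vhost_vring_addr
 * from linux/vhost.h; defined locally only to keep the sketch portable. */
struct sketch_vring_state {
    unsigned int index;
    unsigned int num;
};

struct sketch_vring_addr {
    unsigned int index;
    unsigned int flags;
    uint64_t desc_user_addr;
    uint64_t used_user_addr;
    uint64_t avail_user_addr;
    uint64_t log_guest_addr;
};

/* Flag value taken from the quoted RFC. */
#define VHOST_VFIO_NEED_REPLY 0x1

/* Message layout as quoted in the RFC, with fixed-width userspace types. */
struct vhost_vfio_op {
    uint64_t request;
    uint32_t flags;
    uint32_t size;
    union {
        uint64_t u64;
        struct sketch_vring_state state;
        struct sketch_vring_addr addr;
    } payload;
};

/* Hypothetical placeholder; the RFC would put a vhost ioctl cmd here. */
#define SKETCH_REQ_SET_VRING_NUM 0x10u

/* Build a "set vring size" message for one virtqueue. */
static struct vhost_vfio_op make_set_vring_num(unsigned int index,
                                               unsigned int num,
                                               int need_reply)
{
    struct vhost_vfio_op op;

    memset(&op, 0, sizeof(op));
    op.request = SKETCH_REQ_SET_VRING_NUM;
    op.flags = need_reply ? VHOST_VFIO_NEED_REPLY : 0;
    /* Assumption: size counts only the payload bytes in use. */
    op.size = (uint32_t)sizeof(op.payload.state);
    op.payload.state.index = index;
    op.payload.state.num = num;
    return op;
}

/* Number of bytes to transfer: fixed header plus the used payload. */
static size_t vhost_vfio_op_len(const struct vhost_vfio_op *op)
{
    assert(op->size <= sizeof(op->payload));
    return offsetof(struct vhost_vfio_op, payload) + op->size;
}
```

A caller would build the message with make_set_vring_num(0, 256, 1) and write
vhost_vfio_op_len(&op) bytes of it to the region. With the vhost-ioctl approach
argued for in this reply, the same state would instead be passed directly via
the existing vhost ioctls, with no separate message header.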