Date: Fri, 5 Jul 2019 10:23:07 +0800
From: Tiwei Bie
To: Jason Wang
Cc: mst@redhat.com, alex.williamson@redhat.com, maxime.coquelin@redhat.com,
    linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
    dan.daly@intel.com, cunming.liang@intel.com, zhihong.wang@intel.com
Subject: Re: [RFC v2] vhost: introduce mdev based hardware vhost backend
Message-ID: <20190705022307.GA8263@___>
In-Reply-To: <513c62ba-3f44-f4cf-3b3d-e0e03b6a6de1@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 05, 2019 at 08:30:00AM +0800, Jason Wang wrote:
> On 2019/7/4 3:02 PM, Tiwei Bie wrote:
> > On Thu, Jul 04, 2019 at 02:35:20PM +0800, Jason Wang wrote:
> > > On 2019/7/4 2:21 PM, Tiwei Bie wrote:
> > > > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 9:08 PM, Tiwei Bie wrote:
> > > > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 7:52 PM, Tiwei Bie wrote:
> > > > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > > > On 2019/7/3 5:13 PM, Tiwei Bie wrote:
> > > > > > > > > > Details about this can be found here:
> > > > > > > > > >
> > > > > > > > > > https://lwn.net/Articles/750770/
> > > > > > > > > >
> > > > > > > > > > What's new in this version
> > > > > > > > > > ==========================
> > > > > > > > > >
> > > > > > > > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > > > > > > > some comments from here: https://patchwork.ozlabs.org/cover/984763/
> > > > > > > > > >
> > > > > > > > > > Below is the updated device interface:
> > > > > > > > > >
> > > > > > > > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > > > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to set up the
> > > > > > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > > > > > can be used to notify the device.
> > > > > > > > > >
> > > > > > > > > > 1. CONFIG_REGION
> > > > > > > > > >
> > > > > > > > > > The region described by CONFIG_REGION is the main control interface.
> > > > > > > > > > Messages will be written to or read from this region.
> > > > > > > > > >
> > > > > > > > > > The message type is determined by the `request` field in the message
> > > > > > > > > > header. The message size is encoded in the message header too.
> > > > > > > > > > The message format looks like this:
> > > > > > > > > >
> > > > > > > > > > struct vhost_vfio_op {
> > > > > > > > > >         __u64 request;
> > > > > > > > > >         __u32 flags;
> > > > > > > > > >         /* Flag values: */
> > > > > > > > > > #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > > > > > > >         __u32 size;
> > > > > > > > > >         union {
> > > > > > > > > >                 __u64 u64;
> > > > > > > > > >                 struct vhost_vring_state state;
> > > > > > > > > >                 struct vhost_vring_addr addr;
> > > > > > > > > >         } payload;
> > > > > > > > > > };
> > > > > > > > > >
> > > > > > > > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > > > > > > > requests in the above structure.
> > > > > > > > > Still the same comment as for V1. What's the advantage of inventing
> > > > > > > > > a new protocol?
> > > > > > > > I'm trying to make it work in VFIO's way..
> > > > > > > >
> > > > > > > > > I believe either of the following should be better:
> > > > > > > > >
> > > > > > > > > - using vhost ioctl, we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > > > > > > > extend it with e.g. a notify region. The advantage is that all existing
> > > > > > > > > userspace programs could be reused without modification (or with minimal
> > > > > > > > > modification). And the vhost API hides lots of details that do not need
> > > > > > > > > to be understood by the application (e.g. in the case of containers).
> > > > > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > > > > using the existing vfio_mdev) for mdev device?
> > > > > > > Can we simply add them into ioctl of mdev_parent_ops?
> > > > > > Right, either way, these ioctls have to be and just need to be
> > > > > > added in the ioctl of the mdev_parent_ops. But another thing we
> > > > > > also need to consider is which file descriptor the userspace
> > > > > > will do the ioctl() on. So I'm wondering, do you mean letting
> > > > > > userspace do the ioctl() on the VFIO device fd of the mdev
> > > > > > device?
> > > > > >
> > > > > Yes.
> > > > Got it! I'm not sure what Alex's opinion on this is. If we all
> > > > agree with this, I can do it in this way.
> > > >
> > > > > Is there any other way btw?
> > > > Just a quick thought.. Maybe totally a bad idea.
> > > >
> > > Not necessarily :)
> > Thanks!
> >
> > >
> > > > I was thinking
> > > > whether it would be odd to do non-VFIO's ioctls on VFIO's device
> > > > fd. So I was wondering whether it's possible to allow binding
> > > > another mdev driver (e.g. vhost_mdev) to the supported mdev
> > > > devices. The new mdev driver, vhost_mdev, can provide similar
> > > > ways to let userspace open the mdev device and do the vhost ioctls
> > > > on it. To distinguish them from the vfio_mdev compatible mdev devices,
> > > > the device API of the new vhost_mdev compatible mdev devices
> > > > might be e.g. "vhost-net" for net?
> > > >
> > > > So in VFIO case, the device will be for passthru directly. And
> > > > in VHOST case, the device can be used to accelerate the existing
> > > > virtualized devices.
> > > >
> > > > What do you think?
> > > >
> > > If my understanding is correct, there will be no VFIO ioctl if we go for
> > > vhost_mdev?
> > Yeah, exactly. If we go for vhost_mdev, we may have some vhost nodes
> > in /dev similar to what /dev/vfio/* does to handle the $UUID and open
> > the device (e.g. similar to VFIO_GROUP_GET_DEVICE_FD in VFIO). And
> > to set up the device, we can try to reuse the ioctls of the existing
> > kernel vhost as much as possible.
> >
> Interesting, actually, I've considered something similar. I think there
> should be no issues other than DMA:

Yeah, that's something we need to optimize to make it more
lightweight and efficient. How about allowing userspace to do
map/unmap operations like what VFIO provides?

>
> - Need to invent new API for DMA mapping other than SET_MEM_TABLE? (Which is
> too heavyweight).
>
> - Need to consider a way to work with both on-chip IOMMU (your proposal
> should be fine) and scalable IOV.

Maybe we can make it possible to let the parent device know the
mappings (mapping events) if it needs them (this would be helpful
for software-based devices as well).

Thanks,
Tiwei

>
> Thanks
>
>
> >
> > Thanks,
> > Tiwei
> >
> > > Thanks
> > >
> > >
> > > > Thanks,
> > > > Tiwei
> > > > > Thanks
> > > > >
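
For illustration, the two suggestions in the reply above (incremental
map/unmap instead of replacing the whole table via SET_MEM_TABLE, and
letting the parent device see mapping events) could be sketched roughly
as below. This is only a userspace model under stated assumptions, not
a proposed ABI: every name here (vhost_dma_range, vhost_dma_table,
vhost_dma_map, vhost_dma_unmap, vhost_dma_count_events) is hypothetical
and does not exist in vhost or VFIO. The iova/vaddr/size fields are
loosely modeled on what VFIO's VFIO_IOMMU_MAP_DMA request carries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request layout (assumption, not a real vhost ABI). */
struct vhost_dma_range {
        uint64_t iova;  /* device-visible address */
        uint64_t vaddr; /* userspace virtual address backing the range */
        uint64_t size;  /* length in bytes */
};

enum vhost_dma_event { VHOST_DMA_EVENT_MAP, VHOST_DMA_EVENT_UNMAP };

/* Hypothetical hook through which a parent device observes mapping events. */
typedef void (*vhost_dma_notify_fn)(enum vhost_dma_event ev,
                                    const struct vhost_dma_range *r,
                                    void *opaque);

#define VHOST_DMA_MAX_RANGES 16

struct vhost_dma_table {
        struct vhost_dma_range ranges[VHOST_DMA_MAX_RANGES];
        size_t nr;
        vhost_dma_notify_fn notify; /* may be NULL */
        void *opaque;
};

static int ranges_overlap(const struct vhost_dma_range *a,
                          const struct vhost_dma_range *b)
{
        return a->iova < b->iova + b->size && b->iova < a->iova + a->size;
}

/* Add one range without touching the others. Returns 0 on success, -1 on error. */
static int vhost_dma_map(struct vhost_dma_table *t,
                         const struct vhost_dma_range *r)
{
        assert(t && r);
        /* Reject empty ranges, IOVA wrap-around, and a full table. */
        if (r->size == 0 || r->iova + r->size < r->iova ||
            t->nr == VHOST_DMA_MAX_RANGES)
                return -1;
        for (size_t i = 0; i < t->nr; i++)
                if (ranges_overlap(&t->ranges[i], r))
                        return -1;
        t->ranges[t->nr++] = *r;
        if (t->notify)
                t->notify(VHOST_DMA_EVENT_MAP, r, t->opaque);
        return 0;
}

/* Remove the range that starts at iova and has exactly this size. */
static int vhost_dma_unmap(struct vhost_dma_table *t, uint64_t iova,
                           uint64_t size)
{
        assert(t);
        for (size_t i = 0; i < t->nr; i++) {
                if (t->ranges[i].iova == iova && t->ranges[i].size == size) {
                        struct vhost_dma_range removed = t->ranges[i];

                        /* Order is not preserved: move the last entry here. */
                        t->ranges[i] = t->ranges[--t->nr];
                        if (t->notify)
                                t->notify(VHOST_DMA_EVENT_UNMAP, &removed,
                                          t->opaque);
                        return 0;
                }
        }
        return -1;
}

/* Example observer: counts events in the int that opaque points to. */
static void vhost_dma_count_events(enum vhost_dma_event ev,
                                   const struct vhost_dma_range *r,
                                   void *opaque)
{
        (void)ev;
        (void)r;
        ++*(int *)opaque;
}
```

Each request in this model touches a single range, so a change to one
mapping does not require resending every region, which is the cost
being attributed to SET_MEM_TABLE in the thread. A real interface
would additionally need page pinning, permission flags and locking,
all of which are left out here.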