Date: Tue, 17 Sep 2019 18:58:02 +0800
From: Tiwei Bie
To: Jason Wang
Cc: mst@redhat.com, alex.williamson@redhat.com, maxime.coquelin@redhat.com,
    linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
    dan.daly@intel.com, cunming.liang@intel.com, zhihong.wang@intel.com,
    lingshan.zhu@intel.com
Subject: Re: [RFC v4 0/3] vhost: introduce mdev based hardware backend
Message-ID: <20190917105801.GA24855@___>
In-Reply-To: <993841ed-942e-c90b-8016-8e7dc76bf13a@redhat.com>
References: <20190917010204.30376-1-tiwei.bie@intel.com>
 <993841ed-942e-c90b-8016-8e7dc76bf13a@redhat.com>
User-Agent: Mutt/1.9.4 (2018-02-28)

On Tue, Sep 17, 2019 at 11:32:03AM +0800, Jason Wang wrote:
> On 2019/9/17 9:02 AM, Tiwei Bie wrote:
> > This RFC demonstrates the following ideas:
> >
> > a) Build vhost-mdev on top of the same abstraction defined in
> >    the virtio-mdev series [1];
> >
> > b) Introduce /dev/vhost-mdev to do vhost ioctls and support
> >    setting an mdev device as the backend;
> >
> > Now the userspace API looks like this:
> >
> > - Userspace generates a compatible mdev device;
> >
> > - Userspace opens this mdev device with the VFIO API (including
> >   doing IOMMU programming for this mdev device with VFIO's
> >   container/group based interface);
> >
> > - Userspace opens /dev/vhost-mdev and gets a vhost fd;
> >
> > - Userspace uses vhost ioctls to set up vhost (userspace should
> >   do the VHOST_MDEV_SET_BACKEND ioctl with the VFIO group fd and
> >   device fd first, before doing other vhost ioctls);
> >
> > Only compile testing has been done on this series so far.
>
> I have thought hard about the architecture:

Thanks a lot! I really appreciate it!

>
> 1) Create a vhost char device and pass the vfio mdev device fd to it
> as a backend, and translate vhost-mdev ioctls to the virtio-mdev
> transport (e.g. read/write). DMA is done through the VFIO DMA mapping
> on the container that is attached.

Yeah, that's what we are doing in this series.

>
> We have two more choices:
>
> 2) Use vfio-mdev but do not create a vhost-mdev device; instead, just
> implement the vhost ioctls in vfio_device_ops, and translate them into
> the virtio-mdev transport or just pass the ioctls to the parent.

Yeah. Instead of introducing a /dev/vhost-mdev char device, do the
vhost ioctls on the VFIO device fd directly. That's what we did in
RFC v3.
>
> 3) Don't use vfio-mdev; create a new vhost-mdev driver, and during
> probe still try to add the dev to a vfio group and talk to the parent
> with device specific ops.

If my understanding is correct, this means we need to introduce a new
VFIO device driver to replace the existing vfio-mdev driver in our
case. Below is a quick draft just to show my understanding:

#include	/* NOTE: the <...> header names were stripped by the mail archive */
#include
#include
#include
#include
#include
#include
#include "mdev_private.h"

/* XXX: we need a proper way to include the below vhost header. */
#include "../../vhost/vhost.h"

static int vfio_vhost_mdev_open(void *device_data)
{
	if (!try_module_get(THIS_MODULE))
		return -ENODEV;

	/* ... */

	vhost_dev_init(...);
	return 0;
}

static void vfio_vhost_mdev_release(void *device_data)
{
	/* ... */
	module_put(THIS_MODULE);
}

static long vfio_vhost_mdev_unlocked_ioctl(void *device_data,
					   unsigned int cmd, unsigned long arg)
{
	struct mdev_device *mdev = device_data;
	struct mdev_parent *parent = mdev->parent;

	/*
	 * Use vhost ioctls.
	 *
	 * We will have a different parent_ops design.
	 * And potentially, we can share the same parent_ops
	 * with virtio_mdev.
	 */
	switch (cmd) {
	case VHOST_GET_FEATURES:
		parent->ops->get_features(mdev, ...);
		break;
	/* ... */
	}

	return 0;
}

static ssize_t vfio_vhost_mdev_read(void *device_data, char __user *buf,
				    size_t count, loff_t *ppos)
{
	/* ... */
	return 0;
}

static ssize_t vfio_vhost_mdev_write(void *device_data,
				     const char __user *buf,
				     size_t count, loff_t *ppos)
{
	/* ... */
	return 0;
}

static int vfio_vhost_mdev_mmap(void *device_data, struct vm_area_struct *vma)
{
	/* ... */
	return 0;
}

static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
	.name		= "vfio-vhost-mdev",
	.open		= vfio_vhost_mdev_open,
	.release	= vfio_vhost_mdev_release,
	.ioctl		= vfio_vhost_mdev_unlocked_ioctl,
	.read		= vfio_vhost_mdev_read,
	.write		= vfio_vhost_mdev_write,
	.mmap		= vfio_vhost_mdev_mmap,
};

static int vfio_vhost_mdev_probe(struct device *dev)
{
	struct mdev_device *mdev = to_mdev_device(dev);

	/* ... */

	return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
}

static void vfio_vhost_mdev_remove(struct device *dev)
{
	/* ... */
	vfio_del_group_dev(dev);
}

static struct mdev_driver vfio_vhost_mdev_driver = {
	.name	= "vfio_vhost_mdev",
	.probe	= vfio_vhost_mdev_probe,
	.remove	= vfio_vhost_mdev_remove,
};

static int __init vfio_vhost_mdev_init(void)
{
	return mdev_register_driver(&vfio_vhost_mdev_driver, THIS_MODULE);
}
module_init(vfio_vhost_mdev_init)

static void __exit vfio_vhost_mdev_exit(void)
{
	mdev_unregister_driver(&vfio_vhost_mdev_driver);
}
module_exit(vfio_vhost_mdev_exit)

>
> So I have some questions:
>
> 1) Compared to method 2, what's the advantage of creating a new vhost
> char device? I guess it's for keeping API compatibility?

One benefit is that we can avoid doing vhost ioctls on a VFIO device fd.

>
> 2) For method 2, is there any easy way for a user/admin to distinguish
> e.g. a vfio-mdev for vhost from an ordinary vfio-mdev?

I think device-api could be a choice.

> I saw you introduced an ops matching helper, but it's not friendly to
> management.

The ops matching helper is just to check whether a given vfio-device
is based on an mdev device.

>
> 3) A drawback of 1) and 2) is that they must follow vfio_device_ops,
> which assumes the parameters come from userspace; this prevents
> supporting kernel virtio drivers.
>
> 4) So comes the idea of method 3: since it registers a new vhost-mdev
> driver, we can use device specific ops instead of the VFIO ones, and
> then we can have a common API between vDPA parents and
> vhost-mdev/virtio-mdev drivers.

As the above draft shows, this requires introducing a new VFIO device
driver. I think Alex's opinion matters here.

Thanks,
Tiwei

>
> What are your thoughts?
>
> Thanks
>
> >
> > RFCv3: https://patchwork.kernel.org/patch/11117785/
> >
> > [1] https://lkml.org/lkml/2019/9/10/135
> >
> > Tiwei Bie (3):
> >   vfio: support getting vfio device from device fd
> >   vfio: support checking vfio driver by device ops
> >   vhost: introduce mdev based hardware backend
> >
> >  drivers/vfio/mdev/vfio_mdev.c    |   3 +-
> >  drivers/vfio/vfio.c              |  32 +++
> >  drivers/vhost/Kconfig            |   9 +
> >  drivers/vhost/Makefile           |   3 +
> >  drivers/vhost/mdev.c             | 462 +++++++++++++++++++++++++++++++
> >  drivers/vhost/vhost.c            |  39 ++-
> >  drivers/vhost/vhost.h            |   6 +
> >  include/linux/vfio.h             |  11 +
> >  include/uapi/linux/vhost.h       |  10 +
> >  include/uapi/linux/vhost_types.h |   5 +
> >  10 files changed, 573 insertions(+), 7 deletions(-)
> >  create mode 100644 drivers/vhost/mdev.c