by Tiwei Bie

Subject: Re: [RFC v4 3/3] vhost: introduce mdev based hardware backend

On Tue, Sep 17, 2019 at 03:26:30PM +0800, Jason Wang wrote:
> On 2019/9/17 9:02 AM, Tiwei Bie wrote:
> > diff --git a/drivers/vhost/mdev.c b/drivers/vhost/mdev.c
> > new file mode 100644
> > index 000000000000..8c6597aff45e
> > --- /dev/null
> > +++ b/drivers/vhost/mdev.c
> > @@ -0,0 +1,462 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2018-2019 Intel Corporation.
> > + */
> > +
> > +#include <linux/compat.h>
> > +#include <linux/kernel.h>
> > +#include <linux/miscdevice.h>
> > +#include <linux/mdev.h>
> > +#include <linux/module.h>
> > +#include <linux/vfio.h>
> > +#include <linux/vhost.h>
> > +#include <linux/virtio_mdev.h>
> > +
> > +#include "vhost.h"
> > +
> > +struct vhost_mdev {
> > +	struct mutex mutex;
> > +	struct vhost_dev dev;
> > +	struct vhost_virtqueue *vqs;
> > +	int nvqs;
> > +	u64 state;
> > +	u64 features;
> > +	u64 acked_features;
> > +	struct vfio_group *vfio_group;
> > +	struct vfio_device *vfio_device;
> > +	struct mdev_device *mdev;
> > +};
> > +
> > +/*
> > + * XXX
> > + * We assume virtio_mdev.ko exposes below symbols for now, as we
> > + * don't have a proper way to access parent ops directly yet.
> > + *
> > + * virtio_mdev_readl()
> > + * virtio_mdev_writel()
> > + */
> > +extern u32 virtio_mdev_readl(struct mdev_device *mdev, loff_t off);
> > +extern void virtio_mdev_writel(struct mdev_device *mdev, loff_t off, u32 val);
>
>
> Need to consider a better approach. I feel we should do it through some kind
> of mdev driver instead of talking to the mdev device directly.

Yeah, a better approach is really needed here.
Besides, we may want a way to access the mdev
device_ops proposed in the series below from outside the
drivers/vfio/mdev/ directory.

https://lkml.org/lkml/2019/9/12/151

I.e. allow putting mdev drivers outside the above directory.

> > +
> > +	for (queue_id = 0; queue_id < m->nvqs; queue_id++) {
> > +		vq = &m->vqs[queue_id];
> > +
> > +		if (!vq->desc || !vq->avail || !vq->used)
> > +			break;
> > +
> > +		virtio_mdev_writel(mdev, VIRTIO_MDEV_QUEUE_NUM, vq->num);
> > +
> > +		if (!vhost_translate_ring_addr(vq, (u64)vq->desc,
> > +					       vhost_get_desc_size(vq, vq->num),
> > +					       &addr))
> > +			return -EINVAL;
>
>
> Interesting, any reason for doing this kind of translation to HVA? I
> believe the address should already be an IOVA that has been mapped by VFIO.

Currently, in the software-based vhost-kernel and vhost-user
backends, QEMU passes ring addresses as HVAs in the SET_VRING_ADDR
ioctl when the IOTLB isn't enabled. If it's OK to let QEMU pass GPAs
in vhost-mdev in this case, then this translation won't be needed.
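
For illustration only (not part of the patch): a minimal userspace
sketch of what an HVA-to-GPA lookup could look like, assuming the
translation is a simple search in a memory-region table similar to
the one set via VHOST_SET_MEM_TABLE. The struct mem_region type and
the hva_to_gpa() helper are hypothetical names; this is not how
vhost_translate_ring_addr() in the patch is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical region descriptor, modelled on the fields of
 * struct vhost_memory_region (guest_phys_addr, memory_size,
 * userspace_addr). Not taken from the patch.
 */
struct mem_region {
	uint64_t gpa;	/* guest physical start address */
	uint64_t size;	/* region length in bytes */
	uint64_t hva;	/* host virtual start address */
};

/*
 * Translate the range [hva, hva + len) to a guest physical address.
 * Returns 0 and stores the result in *gpa on success, or -1 if the
 * range is not fully contained in a single region.
 */
static int hva_to_gpa(const struct mem_region *regions, size_t n,
		      uint64_t hva, uint64_t len, uint64_t *gpa)
{
	size_t i;

	assert(regions != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct mem_region *r = &regions[i];
		uint64_t off;

		if (hva < r->hva)
			continue;
		off = hva - r->hva;
		/* Overflow-safe check that off + len <= r->size. */
		if (off > r->size || len > r->size - off)
			continue;
		*gpa = r->gpa + off;
		return 0;
	}
	return -1;
}
```

If QEMU passes GPAs directly, as suggested above, a lookup like this
would simply not be needed on the ring addresses.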

Thanks,
Tiwei

2019-09-20 21:28:14

by Jason Wang

Subject: Re: [RFC v4 0/3] vhost: introduce mdev based hardware backend

On 2019/9/19 11:45 PM, Tiwei Bie wrote:
> On Thu, Sep 19, 2019 at 09:08:11PM +0800, Jason Wang wrote:
>> On 2019/9/18 10:32 PM, Michael S. Tsirkin wrote:
>>>>>> So I have some questions:
>>>>>>
>>>>>> 1) Compared to method 2, what's the advantage of creating a new vhost char
>>>>>> device? I guess it's to keep API compatibility?
>>>>> One benefit is that we can avoid doing vhost ioctls on
>>>>> VFIO device fd.
>>>> Yes, but any benefit from doing this?
>>> It does seem a bit more modular, but it's certainly not a big deal.
>> Ok, if we go this way, it could be as simple as providing some callbacks to
>> vhost, then vhost can just forward the ioctl through parent_ops.
>>
>>>>>> 2) For method 2, is there any easy way for the user/admin to distinguish e.g.
>>>>>> a vfio-mdev for vhost from an ordinary vfio-mdev?
>>>>> I think device-api could be a choice.
>>>> Ok.
>>>>
>>>>
>>>>>> I saw you introduced an
>>>>>> ops matching helper, but it's not friendly to management.
>>>>> The ops matching helper is just to check whether a given
>>>>> vfio-device is based on an mdev device.
>>>>>
>>>>>> 3) A drawback of 1) and 2) is that they must follow vfio_device_ops, which
>>>>>> assumes the parameters come from userspace; this prevents supporting kernel
>>>>>> virtio drivers.
>>>>>>
>>>>>> 4) So comes the idea of method 3: since it registers a new vhost-mdev driver,
>>>>>> we can use device-specific ops instead of the VFIO ones, and then we can have a
>>>>>> common API between the vDPA parent and the vhost-mdev/virtio-mdev drivers.
>>>>> As the above draft shows, this requires introducing a new
>>>>> VFIO device driver. I think Alex's opinion matters here.
>> Just to clarify, I mean a new type of mdev driver that provides dummy
>> vfio_device_ops for VFIO, to make the container DMA ioctls work.
> I see. Thanks! IIUC, you mean we can provide a very tiny
> VFIO device driver in drivers/vhost/mdev.c, e.g.:
>
> static int vfio_vhost_mdev_open(void *device_data)
> {
> 	if (!try_module_get(THIS_MODULE))
> 		return -ENODEV;
> 	return 0;
> }
>
> static void vfio_vhost_mdev_release(void *device_data)
> {
> 	module_put(THIS_MODULE);
> }
>
> static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
> 	.name		= "vfio-vhost-mdev",
> 	.open		= vfio_vhost_mdev_open,
> 	.release	= vfio_vhost_mdev_release,
> };
>
> static int vhost_mdev_probe(struct device *dev)
> {
> 	struct mdev_device *mdev = to_mdev_device(dev);
>
> 	... Check the mdev device_id proposed in ...
> 	... https://lkml.org/lkml/2019/9/12/151 ...
>
> 	return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
> }
>
> static void vhost_mdev_remove(struct device *dev)
> {
> 	vfio_del_group_dev(dev);
> }
>
> static struct mdev_driver vhost_mdev_driver = {
> 	.name	= "vhost_mdev",
> 	.probe	= vhost_mdev_probe,
> 	.remove	= vhost_mdev_remove,
> };
>
> So we can bind the above mdev driver to virtio-mdev compatible
> mdev devices when we want to use vhost-mdev.
>
> After binding the above driver to the mdev device, we can set up the
> IOMMU via VFIO, get the VFIO device fd of this mdev device, and pass it
> to the vhost fd (/dev/vhost-mdev) with a SET_BACKEND ioctl.
>
> Thanks,
> Tiwei

Yes, something like this.

Thanks

>> Thanks
>>
>>
>>>> Yes, it is.
>>>>
>>>> Thanks
>>>>
>>>>