Changes in V3:
1. Change the device page allocation method of the Intel mic layer in Patch 1 to
align with the vring allocation.
2. Move the vring physical address changes in mmap callback from Prach 3 to 1.
3. Use must_be_zero instead of directly deleting used_address_updated in
Patch 2 to avoid the influence of ABI change.
(I tried to use dma_mmap_coherent api in Patch 4 as Christoph suggested, but
met some issues explained here https://lore.kernel.org/patchwork/patch/1313327/,
so there is no change to Patch 4, and still can't find a better way than
patch 3)
The original vop driver only supports dma coherent device, as it allocates and
maps vring by _get_free_pages and dma_map_single, but not use
dma_sync_single_for_cpu/device to sync the updates of device_page/vring between
EP and RC, which will cause memory synchronization problem for device don't
support hardware dma coherent.
And allocate vrings use dma_alloc_coherent is a common way in kernel, as the
memory interacted between two systems should use consistent memory to avoid
caching effects. So here add noncoherent platform support for vop driver.
Also add some related dma changes to make sure noncoherent platform works
well.
Sherry Sun (4):
misc: vop: change the way of allocating vring and device page
misc: vop: do not allocate and reassign the used ring
misc: vop: simply return the saved dma address instead of virt_to_phys
misc: vop: mapping kernel memory to user space as noncached
drivers/misc/mic/bus/vop_bus.h | 2 +
drivers/misc/mic/host/mic_boot.c | 8 +++
drivers/misc/mic/host/mic_main.c | 15 ++----
drivers/misc/mic/vop/vop_debugfs.c | 2 -
drivers/misc/mic/vop/vop_main.c | 48 +++--------------
drivers/misc/mic/vop/vop_vringh.c | 84 +++++++-----------------------
include/uapi/linux/mic_common.h | 5 +-
7 files changed, 42 insertions(+), 122 deletions(-)
--
2.17.1
Allocate vrings use dma_alloc_coherent is a common way in kernel. As the
memory interacted between two systems should use consistent memory to
avoid caching effects, same as device page memory.
The orginal way use __get_free_pages and dma_map_single to allocate and
map vring, but not use dma_sync_single_for_cpu/device api to sync the
changes of vring between EP and RC, which will cause memory
synchronization problem for those devices which don't support hardware
dma coherent.
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: Sherry Sun <[email protected]>
---
drivers/misc/mic/host/mic_main.c | 15 +++--------
drivers/misc/mic/vop/vop_vringh.c | 43 +++++++++----------------------
2 files changed, 16 insertions(+), 42 deletions(-)
diff --git a/drivers/misc/mic/host/mic_main.c b/drivers/misc/mic/host/mic_main.c
index ea4608527ea0..ebacaa35cb47 100644
--- a/drivers/misc/mic/host/mic_main.c
+++ b/drivers/misc/mic/host/mic_main.c
@@ -10,6 +10,7 @@
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/poll.h>
+#include <linux/dma-mapping.h>
#include <linux/mic_common.h>
#include "../common/mic_dev.h"
@@ -48,18 +49,11 @@ static struct ida g_mic_ida;
/* Initialize the device page */
static int mic_dp_init(struct mic_device *mdev)
{
- mdev->dp = kzalloc(MIC_DP_SIZE, GFP_KERNEL);
+ mdev->dp = dma_alloc_coherent(&mdev->pdev->dev, MIC_DP_SIZE,
+ &mdev->dp_dma_addr, GFP_KERNEL);
if (!mdev->dp)
return -ENOMEM;
- mdev->dp_dma_addr = mic_map_single(mdev,
- mdev->dp, MIC_DP_SIZE);
- if (mic_map_error(mdev->dp_dma_addr)) {
- kfree(mdev->dp);
- dev_err(&mdev->pdev->dev, "%s %d err %d\n",
- __func__, __LINE__, -ENOMEM);
- return -ENOMEM;
- }
mdev->ops->write_spad(mdev, MIC_DPLO_SPAD, mdev->dp_dma_addr);
mdev->ops->write_spad(mdev, MIC_DPHI_SPAD, mdev->dp_dma_addr >> 32);
return 0;
@@ -68,8 +62,7 @@ static int mic_dp_init(struct mic_device *mdev)
/* Uninitialize the device page */
static void mic_dp_uninit(struct mic_device *mdev)
{
- mic_unmap_single(mdev, mdev->dp_dma_addr, MIC_DP_SIZE);
- kfree(mdev->dp);
+ dma_free_coherent(&mdev->pdev->dev, MIC_DP_SIZE, mdev->dp, mdev->dp_dma_addr);
}
/**
diff --git a/drivers/misc/mic/vop/vop_vringh.c b/drivers/misc/mic/vop/vop_vringh.c
index 7014ffe88632..2cc3c22482b5 100644
--- a/drivers/misc/mic/vop/vop_vringh.c
+++ b/drivers/misc/mic/vop/vop_vringh.c
@@ -298,9 +298,8 @@ static int vop_virtio_add_device(struct vop_vdev *vdev,
mutex_init(&vvr->vr_mutex);
vr_size = PAGE_ALIGN(round_up(vring_size(num, MIC_VIRTIO_RING_ALIGN), 4) +
sizeof(struct _mic_vring_info));
- vr->va = (void *)
- __get_free_pages(GFP_KERNEL | __GFP_ZERO,
- get_order(vr_size));
+ vr->va = dma_alloc_coherent(vop_dev(vdev), vr_size, &vr_addr,
+ GFP_KERNEL);
if (!vr->va) {
ret = -ENOMEM;
dev_err(vop_dev(vdev), "%s %d err %d\n",
@@ -310,15 +309,6 @@ static int vop_virtio_add_device(struct vop_vdev *vdev,
vr->len = vr_size;
vr->info = vr->va + round_up(vring_size(num, MIC_VIRTIO_RING_ALIGN), 4);
vr->info->magic = cpu_to_le32(MIC_MAGIC + vdev->virtio_id + i);
- vr_addr = dma_map_single(&vpdev->dev, vr->va, vr_size,
- DMA_BIDIRECTIONAL);
- if (dma_mapping_error(&vpdev->dev, vr_addr)) {
- free_pages((unsigned long)vr->va, get_order(vr_size));
- ret = -ENOMEM;
- dev_err(vop_dev(vdev), "%s %d err %d\n",
- __func__, __LINE__, ret);
- goto err;
- }
vqconfig[i].address = cpu_to_le64(vr_addr);
vring_init(&vr->vr, num, vr->va, MIC_VIRTIO_RING_ALIGN);
@@ -339,11 +329,8 @@ static int vop_virtio_add_device(struct vop_vdev *vdev,
dev_dbg(&vpdev->dev,
"%s %d index %d va %p info %p vr_size 0x%x\n",
__func__, __LINE__, i, vr->va, vr->info, vr_size);
- vvr->buf = (void *)__get_free_pages(GFP_KERNEL,
- get_order(VOP_INT_DMA_BUF_SIZE));
- vvr->buf_da = dma_map_single(&vpdev->dev,
- vvr->buf, VOP_INT_DMA_BUF_SIZE,
- DMA_BIDIRECTIONAL);
+ vvr->buf = dma_alloc_coherent(vop_dev(vdev), VOP_INT_DMA_BUF_SIZE,
+ &vvr->buf_da, GFP_KERNEL);
}
snprintf(irqname, sizeof(irqname), "vop%dvirtio%d", vpdev->index,
@@ -382,10 +369,8 @@ static int vop_virtio_add_device(struct vop_vdev *vdev,
for (j = 0; j < i; j++) {
struct vop_vringh *vvr = &vdev->vvr[j];
- dma_unmap_single(&vpdev->dev, le64_to_cpu(vqconfig[j].address),
- vvr->vring.len, DMA_BIDIRECTIONAL);
- free_pages((unsigned long)vvr->vring.va,
- get_order(vvr->vring.len));
+ dma_free_coherent(vop_dev(vdev), vvr->vring.len, vvr->vring.va,
+ le64_to_cpu(vqconfig[j].address));
}
return ret;
}
@@ -433,17 +418,12 @@ static void vop_virtio_del_device(struct vop_vdev *vdev)
for (i = 0; i < vdev->dd->num_vq; i++) {
struct vop_vringh *vvr = &vdev->vvr[i];
- dma_unmap_single(&vpdev->dev,
- vvr->buf_da, VOP_INT_DMA_BUF_SIZE,
- DMA_BIDIRECTIONAL);
- free_pages((unsigned long)vvr->buf,
- get_order(VOP_INT_DMA_BUF_SIZE));
+ dma_free_coherent(vop_dev(vdev), VOP_INT_DMA_BUF_SIZE,
+ vvr->buf, vvr->buf_da);
vringh_kiov_cleanup(&vvr->riov);
vringh_kiov_cleanup(&vvr->wiov);
- dma_unmap_single(&vpdev->dev, le64_to_cpu(vqconfig[i].address),
- vvr->vring.len, DMA_BIDIRECTIONAL);
- free_pages((unsigned long)vvr->vring.va,
- get_order(vvr->vring.len));
+ dma_free_coherent(vop_dev(vdev), vvr->vring.len, vvr->vring.va,
+ le64_to_cpu(vqconfig[i].address));
}
/*
* Order the type update with previous stores. This write barrier
@@ -1047,6 +1027,7 @@ vop_query_offset(struct vop_vdev *vdev, unsigned long offset,
unsigned long *size, unsigned long *pa)
{
struct vop_device *vpdev = vdev->vpdev;
+ struct mic_vqconfig *vqconfig = mic_vq_config(vdev->dd);
unsigned long start = MIC_DP_SIZE;
int i;
@@ -1068,7 +1049,7 @@ vop_query_offset(struct vop_vdev *vdev, unsigned long offset,
struct vop_vringh *vvr = &vdev->vvr[i];
if (offset == start) {
- *pa = virt_to_phys(vvr->vring.va);
+ *pa = vqconfig[i].address;
*size = vvr->vring.len;
return 0;
}
--
2.17.1
We don't need to allocate and reassign the used ring here and remove the
used_address_updated flag.Since RC has allocated the entire vring,
including the used ring. Simply add the corresponding offset can get the
used ring address.
If following the orginal way to reassign the used ring, will encounter a
problem. When host finished with descriptor, it will update the used
ring with putused_kern api, if reassign used ring at EP side, used
ring will be io device memory for RC, use memcpy in putused_kern will
cause kernel panic.
Signed-off-by: Sherry Sun <[email protected]>
Signed-off-by: Joakim Zhang <[email protected]>
---
drivers/misc/mic/vop/vop_debugfs.c | 2 --
drivers/misc/mic/vop/vop_main.c | 48 ++++--------------------------
drivers/misc/mic/vop/vop_vringh.c | 31 -------------------
include/uapi/linux/mic_common.h | 5 ++--
4 files changed, 8 insertions(+), 78 deletions(-)
diff --git a/drivers/misc/mic/vop/vop_debugfs.c b/drivers/misc/mic/vop/vop_debugfs.c
index 9d4f175f4dd1..05eca4a98585 100644
--- a/drivers/misc/mic/vop/vop_debugfs.c
+++ b/drivers/misc/mic/vop/vop_debugfs.c
@@ -79,8 +79,6 @@ static int vop_dp_show(struct seq_file *s, void *pos)
seq_printf(s, "Vdev reset %d\n", dc->vdev_reset);
seq_printf(s, "Guest Ack %d ", dc->guest_ack);
seq_printf(s, "Host ack %d\n", dc->host_ack);
- seq_printf(s, "Used address updated %d ",
- dc->used_address_updated);
seq_printf(s, "Vdev 0x%llx\n", dc->vdev);
seq_printf(s, "c2h doorbell %d ", dc->c2h_vdev_db);
seq_printf(s, "h2c doorbell %d\n", dc->h2c_vdev_db);
diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index 714b94f42d38..1ccc94dfd6ac 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -250,10 +250,6 @@ static void vop_del_vq(struct virtqueue *vq, int n)
struct _vop_vdev *vdev = to_vopvdev(vq->vdev);
struct vop_device *vpdev = vdev->vpdev;
- dma_unmap_single(&vpdev->dev, vdev->used[n],
- vdev->used_size[n], DMA_BIDIRECTIONAL);
- free_pages((unsigned long)vdev->used_virt[n],
- get_order(vdev->used_size[n]));
vring_del_virtqueue(vq);
vpdev->hw_ops->unmap(vpdev, vdev->vr[n]);
vdev->vr[n] = NULL;
@@ -340,8 +336,8 @@ static struct virtqueue *vop_find_vq(struct virtio_device *dev,
vdev->used_size[index] = PAGE_ALIGN(sizeof(__u16) * 3 +
sizeof(struct vring_used_elem) *
le16_to_cpu(config.num));
- used = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
- get_order(vdev->used_size[index]));
+ used = va + PAGE_ALIGN(sizeof(struct vring_desc) * le16_to_cpu(config.num) +
+ sizeof(__u16) * (3 + le16_to_cpu(config.num)));
vdev->used_virt[index] = used;
if (!used) {
err = -ENOMEM;
@@ -355,27 +351,15 @@ static struct virtqueue *vop_find_vq(struct virtio_device *dev,
name, used);
if (!vq) {
err = -ENOMEM;
- goto free_used;
+ goto unmap;
}
- vdev->used[index] = dma_map_single(&vpdev->dev, used,
- vdev->used_size[index],
- DMA_BIDIRECTIONAL);
- if (dma_mapping_error(&vpdev->dev, vdev->used[index])) {
- err = -ENOMEM;
- dev_err(_vop_dev(vdev), "%s %d err %d\n",
- __func__, __LINE__, err);
- goto del_vq;
- }
+ vdev->used[index] = config.address + PAGE_ALIGN(sizeof(struct vring_desc) * le16_to_cpu(config.num) +
+ sizeof(__u16) * (3 + le16_to_cpu(config.num)));
writeq(vdev->used[index], &vqconfig->used_address);
vq->priv = vdev;
return vq;
-del_vq:
- vring_del_virtqueue(vq);
-free_used:
- free_pages((unsigned long)used,
- get_order(vdev->used_size[index]));
unmap:
vpdev->hw_ops->unmap(vpdev, vdev->vr[index]);
return ERR_PTR(err);
@@ -388,9 +372,7 @@ static int vop_find_vqs(struct virtio_device *dev, unsigned nvqs,
struct irq_affinity *desc)
{
struct _vop_vdev *vdev = to_vopvdev(dev);
- struct vop_device *vpdev = vdev->vpdev;
- struct mic_device_ctrl __iomem *dc = vdev->dc;
- int i, err, retry, queue_idx = 0;
+ int i, err, queue_idx = 0;
/* We must have this many virtqueues. */
if (nvqs > ioread8(&vdev->desc->num_vq))
@@ -412,24 +394,6 @@ static int vop_find_vqs(struct virtio_device *dev, unsigned nvqs,
}
}
- iowrite8(1, &dc->used_address_updated);
- /*
- * Send an interrupt to the host to inform it that used
- * rings have been re-assigned.
- */
- vpdev->hw_ops->send_intr(vpdev, vdev->c2h_vdev_db);
- for (retry = 100; --retry;) {
- if (!ioread8(&dc->used_address_updated))
- break;
- msleep(100);
- }
-
- dev_dbg(_vop_dev(vdev), "%s: retry: %d\n", __func__, retry);
- if (!retry) {
- err = -ENODEV;
- goto error;
- }
-
return 0;
error:
vop_del_vqs(dev);
diff --git a/drivers/misc/mic/vop/vop_vringh.c b/drivers/misc/mic/vop/vop_vringh.c
index 2cc3c22482b5..cc928d45af5a 100644
--- a/drivers/misc/mic/vop/vop_vringh.c
+++ b/drivers/misc/mic/vop/vop_vringh.c
@@ -53,33 +53,6 @@ static void _vop_notify(struct vringh *vrh)
vpdev->hw_ops->send_intr(vpdev, db);
}
-static void vop_virtio_init_post(struct vop_vdev *vdev)
-{
- struct mic_vqconfig *vqconfig = mic_vq_config(vdev->dd);
- struct vop_device *vpdev = vdev->vpdev;
- int i, used_size;
-
- for (i = 0; i < vdev->dd->num_vq; i++) {
- used_size = PAGE_ALIGN(sizeof(u16) * 3 +
- sizeof(struct vring_used_elem) *
- le16_to_cpu(vqconfig->num));
- if (!le64_to_cpu(vqconfig[i].used_address)) {
- dev_warn(vop_dev(vdev), "used_address zero??\n");
- continue;
- }
- vdev->vvr[i].vrh.vring.used =
- (void __force *)vpdev->hw_ops->remap(
- vpdev,
- le64_to_cpu(vqconfig[i].used_address),
- used_size);
- }
-
- vdev->dc->used_address_updated = 0;
-
- dev_info(vop_dev(vdev), "%s: device type %d LINKUP\n",
- __func__, vdev->virtio_id);
-}
-
static inline void vop_virtio_device_reset(struct vop_vdev *vdev)
{
int i;
@@ -130,9 +103,6 @@ static void vop_bh_handler(struct work_struct *work)
struct vop_vdev *vdev = container_of(work, struct vop_vdev,
virtio_bh_work);
- if (vdev->dc->used_address_updated)
- vop_virtio_init_post(vdev);
-
if (vdev->dc->vdev_reset)
vop_virtio_device_reset(vdev);
@@ -250,7 +220,6 @@ static void vop_init_device_ctrl(struct vop_vdev *vdev,
dc->guest_ack = 0;
dc->vdev_reset = 0;
dc->host_ack = 0;
- dc->used_address_updated = 0;
dc->c2h_vdev_db = -1;
dc->h2c_vdev_db = -1;
vdev->dc = dc;
diff --git a/include/uapi/linux/mic_common.h b/include/uapi/linux/mic_common.h
index 504e523f702c..fe660d486b20 100644
--- a/include/uapi/linux/mic_common.h
+++ b/include/uapi/linux/mic_common.h
@@ -56,8 +56,7 @@ struct mic_device_desc {
* @vdev_reset: Set to 1 by guest to indicate virtio device has been reset.
* @guest_ack: Set to 1 by guest to ack a command.
* @host_ack: Set to 1 by host to ack a command.
- * @used_address_updated: Set to 1 by guest when the used address should be
- * updated.
+ * @must_be_zero: Reserved because this bit is no longer needed.
* @c2h_vdev_db: The doorbell number to be used by guest. Set by host.
* @h2c_vdev_db: The doorbell number to be used by host. Set by guest.
*/
@@ -67,7 +66,7 @@ struct mic_device_ctrl {
__u8 vdev_reset;
__u8 guest_ack;
__u8 host_ack;
- __u8 used_address_updated;
+ __u8 must_be_zero;
__s8 c2h_vdev_db;
__s8 h2c_vdev_db;
} __attribute__ ((aligned(8)));
--
2.17.1
Mapping kernel space memory to user space as noncached, since user space
need check the updates of avail_idx and device page flags timely.
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: Sherry Sun <[email protected]>
---
drivers/misc/mic/vop/vop_vringh.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/mic/vop/vop_vringh.c b/drivers/misc/mic/vop/vop_vringh.c
index b5612183dcb8..b75c2b713a3b 100644
--- a/drivers/misc/mic/vop/vop_vringh.c
+++ b/drivers/misc/mic/vop/vop_vringh.c
@@ -1058,7 +1058,7 @@ static int vop_mmap(struct file *f, struct vm_area_struct *vma)
}
err = remap_pfn_range(vma, vma->vm_start + offset,
pa >> PAGE_SHIFT, size,
- vma->vm_page_prot);
+ pgprot_noncached(vma->vm_page_prot));
if (err)
goto ret;
size_rem -= size;
--
2.17.1
On Thu, Oct 22, 2020 at 01:06:36PM +0800, Sherry Sun wrote:
> We don't need to allocate and reassign the used ring here and remove the
> used_address_updated flag.Since RC has allocated the entire vring,
> including the used ring. Simply add the corresponding offset can get the
> used ring address.
Someone needs to verify this vs the existing intel implementations.
> - used = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> - get_order(vdev->used_size[index]));
> + used = va + PAGE_ALIGN(sizeof(struct vring_desc) * le16_to_cpu(config.num) +
This adds an over 80 char line.
> + vdev->used[index] = config.address + PAGE_ALIGN(sizeof(struct vring_desc) * le16_to_cpu(config.num) +
Again.
> static int mic_dp_init(struct mic_device *mdev)
> {
> - mdev->dp = kzalloc(MIC_DP_SIZE, GFP_KERNEL);
> + mdev->dp = dma_alloc_coherent(&mdev->pdev->dev, MIC_DP_SIZE,
> + &mdev->dp_dma_addr, GFP_KERNEL);
> if (!mdev->dp)
> return -ENOMEM;
>
> - mdev->dp_dma_addr = mic_map_single(mdev,
> - mdev->dp, MIC_DP_SIZE);
> - if (mic_map_error(mdev->dp_dma_addr)) {
> - kfree(mdev->dp);
> - dev_err(&mdev->pdev->dev, "%s %d err %d\n",
> - __func__, __LINE__, -ENOMEM);
> - return -ENOMEM;
> - }
> mdev->ops->write_spad(mdev, MIC_DPLO_SPAD, mdev->dp_dma_addr);
> mdev->ops->write_spad(mdev, MIC_DPHI_SPAD, mdev->dp_dma_addr >> 32);
> return 0;
> @@ -68,8 +62,7 @@ static int mic_dp_init(struct mic_device *mdev)
> /* Uninitialize the device page */
> static void mic_dp_uninit(struct mic_device *mdev)
> {
> - mic_unmap_single(mdev, mdev->dp_dma_addr, MIC_DP_SIZE);
> - kfree(mdev->dp);
> + dma_free_coherent(&mdev->pdev->dev, MIC_DP_SIZE, mdev->dp, mdev->dp_dma_addr);
> }
This adds an over 80 char line. Also please just kill mic_dp_init and
mic_dp_uninit and inline those into the callers.
> + vvr->buf = dma_alloc_coherent(vop_dev(vdev), VOP_INT_DMA_BUF_SIZE,
> + &vvr->buf_da, GFP_KERNEL);
Another overly long line.
> @@ -1068,7 +1049,7 @@ vop_query_offset(struct vop_vdev *vdev, unsigned long offset,
> struct vop_vringh *vvr = &vdev->vvr[i];
>
> if (offset == start) {
> - *pa = virt_to_phys(vvr->vring.va);
> + *pa = vqconfig[i].address;
vqconfig[i].address is a __le64, so this needs an endian swap.
But more importantly the caller of vop_query_offset, vop_mmap, uses
remap_pfn_range and pa. You cannot mix remap_pfn_range with DMA
coherent allocations, it can only be mmaped using dma_mmap_coherent.
On Thu, Oct 22, 2020 at 01:06:38PM +0800, Sherry Sun wrote:
> Mapping kernel space memory to user space as noncached, since user space
> need check the updates of avail_idx and device page flags timely.
>
> Signed-off-by: Joakim Zhang <[email protected]>
> Signed-off-by: Sherry Sun <[email protected]>
> ---
> drivers/misc/mic/vop/vop_vringh.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/misc/mic/vop/vop_vringh.c b/drivers/misc/mic/vop/vop_vringh.c
> index b5612183dcb8..b75c2b713a3b 100644
> --- a/drivers/misc/mic/vop/vop_vringh.c
> +++ b/drivers/misc/mic/vop/vop_vringh.c
> @@ -1058,7 +1058,7 @@ static int vop_mmap(struct file *f, struct vm_area_struct *vma)
> }
> err = remap_pfn_range(vma, vma->vm_start + offset,
> pa >> PAGE_SHIFT, size,
> - vma->vm_page_prot);
> + pgprot_noncached(vma->vm_page_prot));
Again, memory allocated using dma_alloc_coherent can only be mapped
using dma_mmap_coherent, which will use the right attributes for the
mapping, which often are cached.
Hi Christoph,
>
> > static int mic_dp_init(struct mic_device *mdev) {
> > - mdev->dp = kzalloc(MIC_DP_SIZE, GFP_KERNEL);
> > + mdev->dp = dma_alloc_coherent(&mdev->pdev->dev, MIC_DP_SIZE,
> > + &mdev->dp_dma_addr, GFP_KERNEL);
> > if (!mdev->dp)
> > return -ENOMEM;
> >
> > - mdev->dp_dma_addr = mic_map_single(mdev,
> > - mdev->dp, MIC_DP_SIZE);
> > - if (mic_map_error(mdev->dp_dma_addr)) {
> > - kfree(mdev->dp);
> > - dev_err(&mdev->pdev->dev, "%s %d err %d\n",
> > - __func__, __LINE__, -ENOMEM);
> > - return -ENOMEM;
> > - }
> > mdev->ops->write_spad(mdev, MIC_DPLO_SPAD, mdev-
> >dp_dma_addr);
> > mdev->ops->write_spad(mdev, MIC_DPHI_SPAD, mdev-
> >dp_dma_addr >> 32);
> > return 0;
> > @@ -68,8 +62,7 @@ static int mic_dp_init(struct mic_device *mdev)
> > /* Uninitialize the device page */
> > static void mic_dp_uninit(struct mic_device *mdev) {
> > - mic_unmap_single(mdev, mdev->dp_dma_addr, MIC_DP_SIZE);
> > - kfree(mdev->dp);
> > + dma_free_coherent(&mdev->pdev->dev, MIC_DP_SIZE, mdev->dp,
> > +mdev->dp_dma_addr);
> > }
>
> This adds an over 80 char line. Also please just kill mic_dp_init and
> mic_dp_uninit and inline those into the callers.
Okay, will fix them.
>
> > + vvr->buf = dma_alloc_coherent(vop_dev(vdev),
> VOP_INT_DMA_BUF_SIZE,
> > + &vvr->buf_da, GFP_KERNEL);
>
> Another overly long line.
>
> > @@ -1068,7 +1049,7 @@ vop_query_offset(struct vop_vdev *vdev,
> unsigned long offset,
> > struct vop_vringh *vvr = &vdev->vvr[i];
> >
> > if (offset == start) {
> > - *pa = virt_to_phys(vvr->vring.va);
> > + *pa = vqconfig[i].address;
>
> vqconfig[i].address is a __le64, so this needs an endian swap.
>
> But more importantly the caller of vop_query_offset, vop_mmap, uses
> remap_pfn_range and pa. You cannot mix remap_pfn_range with DMA
> coherent allocations, it can only be mmaped using dma_mmap_coherent.
Will change to use dma_mmap_coherent in V4.
Best regards
Sherry
Hi Greg & Christoph,
> Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used
> ring
>
> On Thu, Oct 22, 2020 at 01:06:36PM +0800, Sherry Sun wrote:
> > We don't need to allocate and reassign the used ring here and remove
> > the used_address_updated flag.Since RC has allocated the entire vring,
> > including the used ring. Simply add the corresponding offset can get
> > the used ring address.
>
> Someone needs to verify this vs the existing intel implementations.
Hi Greg, @[email protected], sorry I don't have the Intel MIC devices so cannot test it, do you know anyone who can help test this patch changes on the Intel MIC platform? Thanks.
>
> > - used = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> > - get_order(vdev->used_size[index]));
> > + used = va + PAGE_ALIGN(sizeof(struct vring_desc) *
> > +le16_to_cpu(config.num) +
>
> This adds an over 80 char line.
Hi Christoph, will fix it in V4, thanks.
>
> > + vdev->used[index] = config.address + PAGE_ALIGN(sizeof(struct
> > +vring_desc) * le16_to_cpu(config.num) +
>
> Again.
Best regards
Sherry
Hi Sudeep & Ashutosh & Vincent,
Can you help test the patch about removing the codes of reassign used ring, and comment on the impact for Intel MIC platform?
Thanks for any help.
Best regards
Sherry
> Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used
> ring
>
> On Mon, Oct 26, 2020 at 03:04:45AM +0000, Sherry Sun wrote:
> > Hi Greg & Christoph,
> >
> > > Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign
> > > the used ring
> > >
> > > On Thu, Oct 22, 2020 at 01:06:36PM +0800, Sherry Sun wrote:
> > > > We don't need to allocate and reassign the used ring here and
> > > > remove the used_address_updated flag.Since RC has allocated the
> > > > entire vring, including the used ring. Simply add the
> > > > corresponding offset can get the used ring address.
> > >
> > > Someone needs to verify this vs the existing intel implementations.
> >
> > Hi Greg, @[email protected], sorry I don't have the Intel MIC
> devices so cannot test it, do you know anyone who can help test this patch
> changes on the Intel MIC platform? Thanks.
>
> Why not ask the authors/maintainers of that code?
>
> greg k-h
On Mon, Oct 26, 2020 at 03:04:45AM +0000, Sherry Sun wrote:
> Hi Greg & Christoph,
>
> > Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used
> > ring
> >
> > On Thu, Oct 22, 2020 at 01:06:36PM +0800, Sherry Sun wrote:
> > > We don't need to allocate and reassign the used ring here and remove
> > > the used_address_updated flag.Since RC has allocated the entire vring,
> > > including the used ring. Simply add the corresponding offset can get
> > > the used ring address.
> >
> > Someone needs to verify this vs the existing intel implementations.
>
> Hi Greg, @[email protected], sorry I don't have the Intel MIC devices so cannot test it, do you know anyone who can help test this patch changes on the Intel MIC platform? Thanks.
Why not ask the authors/maintainers of that code?
greg k-h
On Tue, Oct 27, 2020 at 08:05:43AM +0100, Sherry Sun wrote:
> Can you help test the patch about removing the codes of reassign used
> ring, and comment on the impact for Intel MIC platform? Thanks for
> any help.
I don't have access to MIC hardware myself, either.
But this patch is quite certainly going to break it since guests using a
kernel without the patch will not work with hosts with a kernel with the
patch.
On Wed, Oct 28, 2020 at 02:47:49AM +0100, Sherry Sun wrote:
> > Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used
> > ring
> >
> > On Tue, Oct 27, 2020 at 08:05:43AM +0100, Sherry Sun wrote:
> > > Can you help test the patch about removing the codes of reassign used
> > > ring, and comment on the impact for Intel MIC platform? Thanks for
> > > any help.
> >
> > I don't have access to MIC hardware myself, either.
> >
> > But this patch is quite certainly going to break it since guests using a kernel
> > without the patch will not work with hosts with a kernel with the patch.
>
> Thanks for your reply.
> This patch can be used by both guests and hosts.
> I have tested it on imx8qm platform(both guest and host are ARM64 architecture), and it works well.
> So I guess Intel MIC won't meet big problems when both guest and host
> apply this patch. But it is best if it can be tested.
The guest and host are different systems and it should be possible to
upgrade the host kernel without being forced to upgrade the guest
kernel, and vice-versa. This patch breaks that.
If MIC hardware has no users and the drivers are being deleted then of
course backward compatibility doesn't matter.
Hi Sudeep,
> Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used
> ring
>
> On Wed, 2020-10-28 at 01:47 +0000, Sherry Sun wrote:
> > Hi Vincent,
> >
> > > Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign
> > > the used ring
> > >
> > > On Tue, Oct 27, 2020 at 08:05:43AM +0100, Sherry Sun wrote:
> > > > Can you help test the patch about removing the codes of reassign
> > > > used ring, and comment on the impact for Intel MIC platform?
> > > > Thanks for any help.
> > >
> > > I don't have access to MIC hardware myself, either.
> > >
> > > But this patch is quite certainly going to break it since guests
> > > using a kernel without the patch will not work with hosts with a
> > > kernel with the patch.
> >
> > Thanks for your reply.
> > This patch can be used by both guests and hosts.
> > I have tested it on imx8qm platform(both guest and host are ARM64
> > architecture), and it works well.
> > So I guess Intel MIC won't meet big problems when both guest and
>
> Hi Greg/Sherry,
>
> Both Ashutosh and I have moved on to other projects. The MIC devices have
> been discontinued. I have just sent across a patch to remove the MIC drivers
> from the kernel tree.
>
> We are very glad to see that Sherry is able to reuse some of the VOP logic
> and it is working well. It is best if the MIC drivers are removed so Sherry can
> add the specific VOP logic required for imx8qm subsequently without having
> to worry about other driver dependencies.
> Hoping this results in a cleaner imx8qm driver moving forward.
I'm ok with your patch.
Since you have deprecated the MIC related code, may I ask do you have a better solution instead of vop/scif?
Best regards
Sherry
>
> Thanks,
> Sudeep Dutt
Hi Vincent,
> Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used
> ring
>
> On Tue, Oct 27, 2020 at 08:05:43AM +0100, Sherry Sun wrote:
> > Can you help test the patch about removing the codes of reassign used
> > ring, and comment on the impact for Intel MIC platform? Thanks for
> > any help.
>
> I don't have access to MIC hardware myself, either.
>
> But this patch is quite certainly going to break it since guests using a kernel
> without the patch will not work with hosts with a kernel with the patch.
Thanks for your reply.
This patch can be used by both guests and hosts.
I have tested it on imx8qm platform(both guest and host are ARM64 architecture), and it works well.
So I guess Intel MIC won't meet big problems when both guest and host apply this patch. But it is best if it can be tested.
Best regards
Sherry
(resending from the kernel.org address after getting bounces again)
On Wed, Oct 28, 2020 at 7:29 AM Sherry Sun <[email protected]> wrote:
> > Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used
> >
> > Both Ashutosh and I have moved on to other projects. The MIC devices have
> > been discontinued. I have just sent across a patch to remove the MIC drivers
> > from the kernel tree.
> >
> > We are very glad to see that Sherry is able to reuse some of the VOP logic
> > and it is working well. It is best if the MIC drivers are removed so Sherry can
> > add the specific VOP logic required for imx8qm subsequently without having
> > to worry about other driver dependencies.
> > Hoping this results in a cleaner imx8qm driver moving forward.
>
> I'm ok with your patch.
> Since you have deprecated the MIC related code, may I ask do you have
> a better solution instead of vop/scif?
I think we should try to do something on top of the PCIe endpoint subsystem
to make it work across arbitrary combinations of host and device
implementations,
and provide a superset of what the MIC driver, (out-of-tree) Bluefield endpoint
driver, and the NTB subsystem as well as a couple of others used to do,
each of them tunneling block/network/serial/... over a PCIe link of some
sort, usually with virtio.
At the moment, there is only one driver for the endpoint framework in the
kernel, in drivers/pci/endpoint/functions/pci-epf-test.c, but I think this can
serve as a starting point.
The PCI endpoint subsystem already uses configfs for configuring the
available devices, and this seems like a good fit for making it work
in general. However, there are a number of use cases that have
somewhat conflicting requirements, so the first step would be to
figure out what everyone actually needs for virtio communication.
These are some of the main differences that I have noticed in the
past:
- The simple case would be to use one PCIe endpoint device
for each virtio device, but I think this needs to be multiplexed
so that hardware that only supports a single PCIe endpoint
can still have multiple virtio devices tunneled through it.
- While sometimes the configuration is hardcoded in the driver, ideally
the type of virtio device(s) that is tunneled over the PCIe link should
be configurable. The configuration of the endpoint device itself is
done on the machine running on the endpoint side, but for the
virtio devices, this might be either on the host or the endpoint.
Not sure if one of the two ways is common enough, or we have to
allow both.
- When the link is configured, you still need one side to provide a
virtio device host implementation, while the other side would
run the normal virtio device driver. Again, these could be done
either way, and it is independent of which side has configured
the link, and we might want to only allow one of the two options,
or do both, or tie it to who configures it (e.g. the side that creates
the device must be the virtio device host, while the other side
just sees the device pop up and uses a virtio driver).
Arnd
Hi Arnd,
> Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used
> ring
>
> (resending from the kernel.org address after getting bounces again)
>
> On Wed, Oct 28, 2020 at 7:29 AM Sherry Sun <[email protected]> wrote:
> > > Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign
> > > the used
> > >
> > > Both Ashutosh and I have moved on to other projects. The MIC devices
> > > have been discontinued. I have just sent across a patch to remove
> > > the MIC drivers from the kernel tree.
> > >
> > > We are very glad to see that Sherry is able to reuse some of the VOP
> > > logic and it is working well. It is best if the MIC drivers are
> > > removed so Sherry can add the specific VOP logic required for imx8qm
> > > subsequently without having to worry about other driver dependencies.
> > > Hoping this results in a cleaner imx8qm driver moving forward.
> >
> > I'm ok with your patch.
> > Since you have deprecated the MIC related code, may I ask do you have
> > a better solution instead of vop/scif?
>
> I think we should try to do something on top of the PCIe endpoint subsystem
> to make it work across arbitrary combinations of host and device
> implementations, and provide a superset of what the MIC driver, (out-of-
> tree) Bluefield endpoint driver, and the NTB subsystem as well as a couple of
> others used to do, each of them tunneling block/network/serial/... over a
> PCIe link of some sort, usually with virtio.
>
> At the moment, there is only one driver for the endpoint framework in the
> kernel, in drivers/pci/endpoint/functions/pci-epf-test.c, but I think this can
> serve as a starting point.
>
Thanks for your detailed reply.
Yes, the PCIe endpoint subsystem is the base code, actually we have implemented a set of pci endpoint code similar to pci-epf-test.c, combine with vop (Virtio Over PCIe).
But now the vop code is going to be removed, we planned to change to NTB framework, I saw Kishon has done some jobs based on NTB and PCIe endpoint subsystem, will get a deep look. Maybe it is a good solution.
Best regards
Sherry
> The PCI endpoint subsystem already uses configfs for configuring the
> available devices, and this seems like a good fit for making it work in general.
> However, there are a number of use cases that have somewhat conflicting
> requirements, so the first step would be to figure out what everyone actually
> needs for virtio communication.
>
> These are some of the main differences that I have noticed in the
> past:
>
> - The simple case would be to use one PCIe endpoint device
> for each virtio device, but I think this needs to be multiplexed
> so that hardware that only supports a single PCIe endpoint
> can still have multiple virtio devices tunneled through it.
>
> - While sometimes the configuration is hardcoded in the driver, ideally
> the type of virtio device(s) that is tunneled over the PCIe link should
> be configurable. The configuration of the endpoint device itself is
> done on the machine running on the endpoint side, but for the
> virtio devices, this might be either on the host or the endpoint.
> Not sure if one of the two ways is common enough, or we have to
> allow both.
>
> - When the link is configured, you still need one side to provide a
> virtio device host implementation, while the other side would
> run the normal virtio device driver. Again, these could be done
> either way, and it is independent of which side has configured
> the link, and we might want to only allow one of the two options,
> or do both, or tie it to who configures it (e.g. the side that creates
> the device must be the virtio device host, while the other side
> just sees the device pop up and uses a virtio driver).
>
> Arnd
On Thu, Oct 29, 2020 at 3:42 AM Sherry Sun <[email protected]> wrote:
> > Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used ring
>
> Thanks for your detailed reply.
> Yes, the PCIe endpoint subsystem is the base code, actually we have implemented a set
> of pci endpoint code similar to pci-epf-test.c, combine with vop (Virtio Over PCIe).
>
> But now the vop code is going to be removed, we planned to change to NTB framework,
> I saw Kishon has done some jobs based on NTB and PCIe endpoint subsystem, will get
> a deep look. Maybe it is a good solution.
Ok, great. Regarding the VoP code, I think nothing stops you from reusing
anything you find useful in there and build on top of it, but we should
consider it as a new submission then, which means that you are free to
change the interfaces without worrying about backwards compatibility.
On the flip side, this also means we need to carefully review the interface
to make sure we can cover the requirements of as many users as possible
without adding too much complexity for cases that we do not care about.
Arnd
On Wed, Oct 28, 2020 at 04:50:36PM +0100, Arnd Bergmann wrote:
> I think we should try to do something on top of the PCIe endpoint subsystem
> to make it work across arbitrary combinations of host and device
> implementations,
> and provide a superset of what the MIC driver, (out-of-tree) Bluefield endpoint
> driver, and the NTB subsystem as well as a couple of others used to do,
> each of them tunneling block/network/serial/... over a PCIe link of some
> sort, usually with virtio.
VOP is not PCIe-specific (as demonstrated by the vop-loopback patches I
posted a while ago [1]), and it would be a shame for a replacement to be
tied to the PCIe endpoint subsystem. There are many SOCs out there
which have multiple Linux-capable processors without cache-coherency
between them. VOP is (or should I say was since I guess it's being
deleted) the closest we have in mainline to easily get generic virtio
(and not just rpmsg) running between these kind of Linux instances. If
a new replacement framework were to be PCIe-exclusive then we'd have to
invent one more framework for non-PCIe links to do pretty much the same
thing.
[1] https://lore.kernel.org/lkml/[email protected]/
On Thu, Oct 29, 2020 at 11:07:27AM +0100, Vincent Whitchurch wrote:
> On Wed, Oct 28, 2020 at 04:50:36PM +0100, Arnd Bergmann wrote:
> > I think we should try to do something on top of the PCIe endpoint subsystem
> > to make it work across arbitrary combinations of host and device
> > implementations,
> > and provide a superset of what the MIC driver, (out-of-tree) Bluefield endpoint
> > driver, and the NTB subsystem as well as a couple of others used to do,
> > each of them tunneling block/network/serial/... over a PCIe link of some
> > sort, usually with virtio.
>
> VOP is not PCIe-specific (as demonstrated by the vop-loopback patches I
> posted a while ago [1]), and it would be a shame for a replacement to be
> tied to the PCIe endpoint subsystem. There are many SOCs out there
> which have multiple Linux-capable processors without cache-coherency
> between them. VOP is (or should I say was since I guess it's being
> deleted) the closest we have in mainline to easily get generic virtio
> (and not just rpmsg) running between these kind of Linux instances. If
> a new replacement framework were to be PCIe-exclusive then we'd have to
> invent one more framework for non-PCIe links to do pretty much the same
> thing.
>
> [1] https://lore.kernel.org/lkml/[email protected]/
If this works well, please restore the existing code and move it to a
new directory and we can take it from there, right?
thanks,
greg k-h
On Thu, Oct 29, 2020 at 11:07 AM Vincent Whitchurch
<[email protected]> wrote:
>
> On Wed, Oct 28, 2020 at 04:50:36PM +0100, Arnd Bergmann wrote:
> > I think we should try to do something on top of the PCIe endpoint subsystem
> > to make it work across arbitrary combinations of host and device
> > implementations,
> > and provide a superset of what the MIC driver, (out-of-tree) Bluefield endpoint
> > driver, and the NTB subsystem as well as a couple of others used to do,
> > each of them tunneling block/network/serial/... over a PCIe link of some
> > sort, usually with virtio.
>
> VOP is not PCIe-specific (as demonstrated by the vop-loopback patches I
> posted a while ago [1]), and it would be a shame for a replacement to be
> tied to the PCIe endpoint subsystem. There are many SOCs out there
> which have multiple Linux-capable processors without cache-coherency
> between them. VOP is (or should I say was since I guess it's being
> deleted) the closest we have in mainline to easily get generic virtio
> (and not just rpmsg) running between these kind of Linux instances. If
> a new replacement framework were to be PCIe-exclusive then we'd have to
> invent one more framework for non-PCIe links to do pretty much the same
> thing.
>
> [1] https://lore.kernel.org/lkml/[email protected]/
Right, sorry I forgot about that. I think this means we should keep having
an abstraction between VOP (under whichever name) and the lower levels,
and be aware that it might run on any number of these:
- PCIe endpoint, with the endpoint controlling the virtio configuration
- PCIe endpoint, with the host (the side that has the pci_driver) controlling
the virtio configuration
- NTB connections
- your loopback mode
- Virtio tunnels between VM guests (see https://www.linaro.org/projects/#STR)
- Intel MIC (to be removed, but it would be wrong to make assumptions that
cannot be made on that type of hardware)
- ...
The existing VOP codebase does look like a reasonable start, though
I think we need to discuss whether the ioctl interface should be
replaced with a configfs interface, and what other changes would
be needed to make it support the generalized hardware case.
Arnd
On Thu, 29 Oct 2020 03:07:27 -0700, Vincent Whitchurch wrote:
>
> On Wed, Oct 28, 2020 at 04:50:36PM +0100, Arnd Bergmann wrote:
> > I think we should try to do something on top of the PCIe endpoint subsystem
> > to make it work across arbitrary combinations of host and device
> > implementations,
> > and provide a superset of what the MIC driver, (out-of-tree) Bluefield endpoint
> > driver, and the NTB subsystem as well as a couple of others used to do,
> > each of them tunneling block/network/serial/... over a PCIe link of some
> > sort, usually with virtio.
>
> VOP is not PCIe-specific (as demonstrated by the vop-loopback patches I
> posted a while ago [1]), and it would be a shame for a replacement to be
> tied to the PCIe endpoint subsystem. There are many SOCs out there
> which have multiple Linux-capable processors without cache-coherency
> between them. VOP is (or should I say was since I guess it's being
> deleted) the closest we have in mainline to easily get generic virtio
> (and not just rpmsg) running between these kind of Linux instances. If
> a new replacement framework were to be PCIe-exclusive then we'd have to
> invent one more framework for non-PCIe links to do pretty much the same
> thing.
>
> [1] https://lore.kernel.org/lkml/[email protected]/
It may be possible to use VOP in other instances but it was optimized for
PCIe. The rings were specifically placed to avoid doing reads across PCIe
and only do writes across PCIe.
On Thu, 29 Oct 2020 04:53:09 -0700, Arnd Bergmann wrote:
>
> On Thu, Oct 29, 2020 at 11:07 AM Vincent Whitchurch
> <[email protected]> wrote:
> >
> > On Wed, Oct 28, 2020 at 04:50:36PM +0100, Arnd Bergmann wrote:
> > > I think we should try to do something on top of the PCIe endpoint subsystem
> > > to make it work across arbitrary combinations of host and device
> > > implementations,
> > > and provide a superset of what the MIC driver, (out-of-tree) Bluefield endpoint
> > > driver, and the NTB subsystem as well as a couple of others used to do,
> > > each of them tunneling block/network/serial/... over a PCIe link of some
> > > sort, usually with virtio.
> >
> > VOP is not PCIe-specific (as demonstrated by the vop-loopback patches I
> > posted a while ago [1]), and it would be a shame for a replacement to be
> > tied to the PCIe endpoint subsystem. There are many SOCs out there
> > which have multiple Linux-capable processors without cache-coherency
> > between them. VOP is (or should I say was since I guess it's being
> > deleted) the closest we have in mainline to easily get generic virtio
> > (and not just rpmsg) running between these kind of Linux instances. If
> > a new replacement framework were to be PCIe-exclusive then we'd have to
> > invent one more framework for non-PCIe links to do pretty much the same
> > thing.
> >
> > [1] https://lore.kernel.org/lkml/[email protected]/
>
> Right, sorry I forgot about that. I think this means we should keep having
> an abstraction between VOP (under whichever name) and the lower levels,
> and be aware that it might run on any number of these:
>
> - PCIe endpoint, with the endpoint controlling the virtio configuration
> - PCIe endpoint, with the host (the side that has the pci_driver) controlling
> the virtio configuration
> - NTB connections
> - your loopback mode
> - Virtio tunnels between VM guests (see https://www.linaro.org/projects/#STR)
> - Intel MIC (to be removed, but it would be wrong to make assumptions that
> cannot be made on that type of hardware)
A virtio interface being one between host and guest is inherently
asymmetric. The whole innovation of the VOP design was to treat Linux on a
PCIe device as a guest, there was even talk at some point of the "guest"
being managed via libvirt. So here host and guest retain their specific
role/personality. The host "inserts" devices which appear in the guest
e.g. So I am not sure how this asymmetry plays in the scenarios mentioned
above.
On Thu, Oct 29, 2020 at 2:35 PM Dixit, Ashutosh
<[email protected]> wrote:
> On Thu, 29 Oct 2020 04:53:09 -0700, Arnd Bergmann wrote:
> >
> > - PCIe endpoint, with the endpoint controlling the virtio configuration
> > - PCIe endpoint, with the host (the side that has the pci_driver) controlling
> > the virtio configuration
> > - NTB connections
> > - your loopback mode
> > - Virtio tunnels between VM guests (see https://www.linaro.org/projects/#STR)
> > - Intel MIC (to be removed, but it would be wrong to make assumptions that
> > cannot be made on that type of hardware)
>
> A virtio interface being one between host and guest is inherently
> asymmetric. The whole innovation of the VOP design was to treat Linux on a
> PCIe device as a guest, there was even talk at some point of the "guest"
> being managed via libvirt. So here host and guest retain their specific
> role/personality. The host "inserts" devices which appear in the guest
> e.g. So I am not sure how this asymmetry plays in the scenarios mentioned
> above.
This is the reason I listed the PCIe endpoint mode twice. I expect that
some use cases require the same setup as MIC, with Linux on some
kind of PCIe add-on card using devices that are implemented on the host.
In other cases we may need the opposite: you may have a PCIe
add-on card that provides arbitrary services to the host in the same
way that most PCIe endpoint devices work. An example might be
a smart NIC running a standalone Linux, but implementing virtio-net
to avoid the need for custom drivers on the host.
In the endpoint framework, it is always the endpoint that decides what
PCI devices it wants to implement, but in case of VOP that device
could be either the side that configures and implements the virtio
devices, or the side that probes and uses them.
Arnd