2020-07-02 08:22:43

by Kishon Vijay Abraham I

Subject: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

This series enhances Linux VHOST support to enable SoC-to-SoC
communication over MMIO. It enables rpmsg communication between two SoCs
using both PCIe RC<->EP and HOST1-NTB-HOST2 topologies:

1) Modify vhost to use standard Linux driver model
2) Add support in vring to access virtqueue over MMIO
3) Add vhost client driver for rpmsg
4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
rpmsg communication between two SoCs connected to each other
5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
between two SoCs connected via NTB
6) Add configfs to configure the components

UseCase 1:

 VHOST RPMSG                    VIRTIO RPMSG
      +                               +
      |                               |
      |                               |
      |                               |
      |                               |
+-----v------+                 +------v-------+
|    Linux   |                 |     Linux    |
|  Endpoint  |                 | Root Complex |
|            <----------------->              |
|            |                 |              |
|    SOC1    |                 |     SOC2     |
+------------+                 +--------------+

UseCase 2:

     VHOST RPMSG                                      VIRTIO RPMSG
          +                                                 +
          |                                                 |
          |                                                 |
          |                                                 |
          |                                                 |
   +------v------+                                   +------v------+
   |             |                                   |             |
   |    HOST1    |                                   |    HOST2    |
   |             |                                   |             |
   +------^------+                                   +------^------+
          |                                                 |
          |                                                 |
+---------------------------------------------------------------------+
|  +------v------+                                   +------v------+  |
|  |             |                                   |             |  |
|  |     EP      |                                   |     EP      |  |
|  | CONTROLLER1 |                                   | CONTROLLER2 |  |
|  |             <----------------------------------->             |  |
|  |             |                                   |             |  |
|  |             |                                   |             |  |
|  |             |  SoC With Multiple EP Instances   |             |  |
|  |             |  (Configured using NTB Function)  |             |  |
|  +-------------+                                   +-------------+  |
+---------------------------------------------------------------------+

Software Layering:

The high-level SW layering should look something like below. This series
adds support only for RPMSG VHOST; however, something similar should be
done for net and scsi. With that, any vhost device (PCI, NTB, platform
device, user) can use any of the vhost client drivers (a minimal
registration sketch follows the layering diagram below).


    +----------------+  +-----------+  +------------+  +----------+
    |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
    +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
            |                 |              |              |
            |                 |              |              |
            |                 |              |              |
+-----------v-----------------v--------------v--------------v----------+
|                              VHOST CORE                              |
+--------^---------------^--------------------^------------------^-----+
         |               |                    |                  |
         |               |                    |                  |
         |               |                    |                  |
+--------v-------+  +----v------+  +----------v----------+  +----v-----+
| PCI EPF VHOST  |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
+----------------+  +-----------+  +---------------------+  +----------+
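
For illustration only (not part of this series), here is a rough sketch of
how a vhost client driver plugs into VHOST CORE using the driver model
introduced in patch 02; the driver name is made up and the rpmsg virtio ID
is used purely as an example:

#include <linux/device.h>
#include <linux/mod_devicetable.h>
#include <linux/module.h>
#include <linux/vhost.h>
#include <linux/virtio_ids.h>

static struct vhost_device_id sample_vhost_id_table[] = {
        { VIRTIO_ID_RPMSG, VIRTIO_DEV_ANY_ID },
        { },
};

static int sample_vhost_probe(struct vhost_dev *vdev)
{
        /* negotiate features, create virtqueues, start the service */
        return 0;
}

static int sample_vhost_remove(struct vhost_dev *vdev)
{
        /* tear down virtqueues and free resources */
        return 0;
}

static struct vhost_driver sample_vhost_driver = {
        .driver = {
                .name = "sample_vhost",
        },
        .id_table = sample_vhost_id_table,
        .probe = sample_vhost_probe,
        .remove = sample_vhost_remove,
};
module_driver(sample_vhost_driver, vhost_register_driver,
              vhost_unregister_driver);

MODULE_LICENSE("GPL v2");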

This was initially proposed here [1].

[1] -> https://lore.kernel.org/r/[email protected]


Kishon Vijay Abraham I (22):
vhost: Make _feature_ bits a property of vhost device
vhost: Introduce standard Linux driver model in VHOST
vhost: Add ops for the VHOST driver to configure VHOST device
vringh: Add helpers to access vring in MMIO
vhost: Add MMIO helpers for operations on vhost virtqueue
vhost: Introduce configfs entry for configuring VHOST
virtio_pci: Use request_threaded_irq() instead of request_irq()
rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
reading messages
rpmsg: Introduce configfs entry for configuring rpmsg
rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
rpmsg_internal.h
virtio: Add ops to allocate and free buffer
rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
virtio_free_buffer()
rpmsg: Add VHOST based remote processor messaging bus
samples/rpmsg: Setup delayed work to send message
samples/rpmsg: Wait for address to be bound to rpdev for sending
message
rpmsg.txt: Add Documentation to configure rpmsg using configfs
virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint
device
PCI: endpoint: Add EP function driver to provide VHOST interface
NTB: Add a new NTB client driver to implement VIRTIO functionality
NTB: Add a new NTB client driver to implement VHOST functionality
NTB: Describe the ntb_virtio and ntb_vhost client in the documentation

Documentation/driver-api/ntb.rst | 11 +
Documentation/rpmsg.txt | 56 +
drivers/ntb/Kconfig | 18 +
drivers/ntb/Makefile | 2 +
drivers/ntb/ntb_vhost.c | 776 +++++++++++
drivers/ntb/ntb_virtio.c | 853 ++++++++++++
drivers/ntb/ntb_virtio.h | 56 +
drivers/pci/endpoint/functions/Kconfig | 11 +
drivers/pci/endpoint/functions/Makefile | 1 +
.../pci/endpoint/functions/pci-epf-vhost.c | 1144 ++++++++++++++++
drivers/rpmsg/Kconfig | 10 +
drivers/rpmsg/Makefile | 3 +-
drivers/rpmsg/rpmsg_cfs.c | 394 ++++++
drivers/rpmsg/rpmsg_core.c | 7 +
drivers/rpmsg/rpmsg_internal.h | 136 ++
drivers/rpmsg/vhost_rpmsg_bus.c | 1151 +++++++++++++++++
drivers/rpmsg/virtio_rpmsg_bus.c | 184 ++-
drivers/vhost/Kconfig | 1 +
drivers/vhost/Makefile | 2 +-
drivers/vhost/net.c | 10 +-
drivers/vhost/scsi.c | 24 +-
drivers/vhost/test.c | 17 +-
drivers/vhost/vdpa.c | 2 +-
drivers/vhost/vhost.c | 730 ++++++++++-
drivers/vhost/vhost_cfs.c | 341 +++++
drivers/vhost/vringh.c | 332 +++++
drivers/vhost/vsock.c | 20 +-
drivers/virtio/Kconfig | 9 +
drivers/virtio/Makefile | 1 +
drivers/virtio/virtio_pci_common.c | 25 +-
drivers/virtio/virtio_pci_epf.c | 670 ++++++++++
include/linux/mod_devicetable.h | 6 +
include/linux/rpmsg.h | 6 +
{drivers/vhost => include/linux}/vhost.h | 132 +-
include/linux/virtio.h | 3 +
include/linux/virtio_config.h | 42 +
include/linux/vringh.h | 46 +
samples/rpmsg/rpmsg_client_sample.c | 32 +-
tools/virtio/virtio_test.c | 2 +-
39 files changed, 7083 insertions(+), 183 deletions(-)
create mode 100644 drivers/ntb/ntb_vhost.c
create mode 100644 drivers/ntb/ntb_virtio.c
create mode 100644 drivers/ntb/ntb_virtio.h
create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
create mode 100644 drivers/rpmsg/rpmsg_cfs.c
create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
create mode 100644 drivers/vhost/vhost_cfs.c
create mode 100644 drivers/virtio/virtio_pci_epf.c
rename {drivers/vhost => include/linux}/vhost.h (66%)

--
2.17.1


2020-07-02 08:22:51

by Kishon Vijay Abraham I

Subject: [RFC PATCH 01/22] vhost: Make _feature_ bits a property of vhost device

No functional change intended. The feature bits defined in the virtio
specification are associated with the virtio device, not with individual
virtqueues. In order to correctly reflect this in the vhost backend,
remove "acked_features" from struct vhost_virtqueue and add "features"
to struct vhost_dev. This also makes the layout symmetrical to virtio
in the guest.
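
For callers, the only visible change is the first argument of
vhost_has_feature(); a condensed view of the new helper and one converted
call site, taken from the hunks below:

/* drivers/vhost/vhost.h: feature check now takes the vhost device */
static inline bool vhost_has_feature(struct vhost_dev *vdev, int bit)
{
        return vdev->features & (1ULL << bit);
}

/* e.g. drivers/vhost/net.c handle_rx(): pass &net->dev instead of vq */
mergeable = vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF);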

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/vhost/net.c | 7 ++++---
drivers/vhost/scsi.c | 22 ++++++++--------------
drivers/vhost/test.c | 14 +++++---------
drivers/vhost/vhost.c | 33 +++++++++++++++++++++------------
drivers/vhost/vhost.h | 6 +++---
drivers/vhost/vsock.c | 18 ++++++------------
6 files changed, 47 insertions(+), 53 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 516519dcc8ff..437126219116 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1137,9 +1137,9 @@ static void handle_rx(struct vhost_net *net)
vhost_hlen = nvq->vhost_hlen;
sock_hlen = nvq->sock_hlen;

- vq_log = unlikely(vhost_has_feature(vq, VHOST_F_LOG_ALL)) ?
+ vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
vq->log : NULL;
- mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
+ mergeable = vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF);

do {
sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
@@ -1633,6 +1633,7 @@ static int vhost_net_set_backend_features(struct vhost_net *n, u64 features)
static int vhost_net_set_features(struct vhost_net *n, u64 features)
{
size_t vhost_hlen, sock_hlen, hdr_len;
+ struct vhost_dev *vdev = &n->dev;
int i;

hdr_len = (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
@@ -1658,9 +1659,9 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features)
goto out_unlock;
}

+ vdev->features = features;
for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
mutex_lock(&n->vqs[i].vq.mutex);
- n->vqs[i].vq.acked_features = features;
n->vqs[i].vhost_hlen = vhost_hlen;
n->vqs[i].sock_hlen = sock_hlen;
mutex_unlock(&n->vqs[i].vq.mutex);
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 8b104f76f324..f5138379659e 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -921,7 +921,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
int ret, prot_bytes, c = 0;
u16 lun;
u8 task_attr;
- bool t10_pi = vhost_has_feature(vq, VIRTIO_SCSI_F_T10_PI);
+ bool t10_pi = vhost_has_feature(&vs->dev, VIRTIO_SCSI_F_T10_PI);
void *cdb;

mutex_lock(&vq->mutex);
@@ -1573,26 +1573,20 @@ vhost_scsi_clear_endpoint(struct vhost_scsi *vs,

static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
{
- struct vhost_virtqueue *vq;
- int i;
+ struct vhost_dev *vdev = &vs->dev;

if (features & ~VHOST_SCSI_FEATURES)
return -EOPNOTSUPP;

- mutex_lock(&vs->dev.mutex);
+ mutex_lock(&vdev->mutex);
if ((features & (1 << VHOST_F_LOG_ALL)) &&
- !vhost_log_access_ok(&vs->dev)) {
- mutex_unlock(&vs->dev.mutex);
+ !vhost_log_access_ok(vdev)) {
+ mutex_unlock(&vdev->mutex);
return -EFAULT;
}

- for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
- vq = &vs->vqs[i].vq;
- mutex_lock(&vq->mutex);
- vq->acked_features = features;
- mutex_unlock(&vq->mutex);
- }
- mutex_unlock(&vs->dev.mutex);
+ vdev->features = features;
+ mutex_unlock(&vdev->mutex);
return 0;
}

@@ -1789,7 +1783,7 @@ vhost_scsi_do_plug(struct vhost_scsi_tpg *tpg,

vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
mutex_lock(&vq->mutex);
- if (vhost_has_feature(vq, VIRTIO_SCSI_F_HOTPLUG))
+ if (vhost_has_feature(&vs->dev, VIRTIO_SCSI_F_HOTPLUG))
vhost_scsi_send_evt(vs, tpg, lun,
VIRTIO_SCSI_T_TRANSPORT_RESET, reason);
mutex_unlock(&vq->mutex);
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 9a3a09005e03..6518b48c0633 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -247,19 +247,15 @@ static long vhost_test_reset_owner(struct vhost_test *n)

static int vhost_test_set_features(struct vhost_test *n, u64 features)
{
- struct vhost_virtqueue *vq;
+ struct vhost_dev *vdev = &n->dev;

- mutex_lock(&n->dev.mutex);
+ mutex_lock(&vdev->mutex);
if ((features & (1 << VHOST_F_LOG_ALL)) &&
- !vhost_log_access_ok(&n->dev)) {
- mutex_unlock(&n->dev.mutex);
+ !vhost_log_access_ok(vdev)) {
+ mutex_unlock(&vdev->mutex);
return -EFAULT;
}
- vq = &n->vqs[VHOST_TEST_VQ];
- mutex_lock(&vq->mutex);
- vq->acked_features = features;
- mutex_unlock(&vq->mutex);
- mutex_unlock(&n->dev.mutex);
+ mutex_unlock(&vdev->mutex);
return 0;
}

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 21a59b598ed8..3c2633fb519d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -104,12 +104,14 @@ static long vhost_get_vring_endian(struct vhost_virtqueue *vq, u32 idx,

static void vhost_init_is_le(struct vhost_virtqueue *vq)
{
+ struct vhost_dev *vdev = vq->dev;
+
/* Note for legacy virtio: user_be is initialized at reset time
* according to the host endianness. If userspace does not set an
* explicit endianness, the default behavior is native endian, as
* expected by legacy virtio.
*/
- vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be;
+ vq->is_le = vhost_has_feature(vdev, VIRTIO_F_VERSION_1) || !vq->user_be;
}
#else
static void vhost_disable_cross_endian(struct vhost_virtqueue *vq)
@@ -129,7 +131,9 @@ static long vhost_get_vring_endian(struct vhost_virtqueue *vq, u32 idx,

static void vhost_init_is_le(struct vhost_virtqueue *vq)
{
- vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1)
+ struct vhost_dev *vdev = vq->dev;
+
+ vq->is_le = vhost_has_feature(vdev, VIRTIO_F_VERSION_1)
|| virtio_legacy_is_little_endian();
}
#endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */
@@ -310,7 +314,6 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->log_used = false;
vq->log_addr = -1ull;
vq->private_data = NULL;
- vq->acked_features = 0;
vq->acked_backend_features = 0;
vq->log_base = NULL;
vq->error_ctx = NULL;
@@ -428,8 +431,9 @@ EXPORT_SYMBOL_GPL(vhost_exceeds_weight);
static size_t vhost_get_avail_size(struct vhost_virtqueue *vq,
unsigned int num)
{
+ struct vhost_dev *vdev = vq->dev;
size_t event __maybe_unused =
- vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
+ vhost_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;

return sizeof(*vq->avail) +
sizeof(*vq->avail->ring) * num + event;
@@ -438,8 +442,9 @@ static size_t vhost_get_avail_size(struct vhost_virtqueue *vq,
static size_t vhost_get_used_size(struct vhost_virtqueue *vq,
unsigned int num)
{
+ struct vhost_dev *vdev = vq->dev;
size_t event __maybe_unused =
- vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
+ vhost_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;

return sizeof(*vq->used) +
sizeof(*vq->used->ring) * num + event;
@@ -468,6 +473,7 @@ void vhost_dev_init(struct vhost_dev *dev,
dev->iotlb = NULL;
dev->mm = NULL;
dev->worker = NULL;
+ dev->features = 0;
dev->iov_limit = iov_limit;
dev->weight = weight;
dev->byte_weight = byte_weight;
@@ -738,14 +744,15 @@ static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq,
static bool memory_access_ok(struct vhost_dev *d, struct vhost_iotlb *umem,
int log_all)
{
+ bool log;
int i;

+ log = log_all || vhost_has_feature(d, VHOST_F_LOG_ALL);
+
for (i = 0; i < d->nvqs; ++i) {
bool ok;
- bool log;

mutex_lock(&d->vqs[i]->mutex);
- log = log_all || vhost_has_feature(d->vqs[i], VHOST_F_LOG_ALL);
/* If ring is inactive, will check when it's enabled. */
if (d->vqs[i]->private_data)
ok = vq_memory_access_ok(d->vqs[i]->log_base,
@@ -1329,8 +1336,10 @@ EXPORT_SYMBOL_GPL(vhost_log_access_ok);
static bool vq_log_access_ok(struct vhost_virtqueue *vq,
void __user *log_base)
{
+ struct vhost_dev *vdev = vq->dev;
+
return vq_memory_access_ok(log_base, vq->umem,
- vhost_has_feature(vq, VHOST_F_LOG_ALL)) &&
+ vhost_has_feature(vdev, VHOST_F_LOG_ALL)) &&
(!vq->log_used || log_access_ok(log_base, vq->log_addr,
vhost_get_used_size(vq, vq->num)));
}
@@ -2376,11 +2385,11 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
* interrupts. */
smp_mb();

- if (vhost_has_feature(vq, VIRTIO_F_NOTIFY_ON_EMPTY) &&
+ if (vhost_has_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY) &&
unlikely(vq->avail_idx == vq->last_avail_idx))
return true;

- if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
+ if (!vhost_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
__virtio16 flags;
if (vhost_get_avail_flags(vq, &flags)) {
vq_err(vq, "Failed to get flags");
@@ -2459,7 +2468,7 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
return false;
vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
- if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
+ if (!vhost_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
r = vhost_update_used_flags(vq);
if (r) {
vq_err(vq, "Failed to enable notification at %p: %d\n",
@@ -2496,7 +2505,7 @@ void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
if (vq->used_flags & VRING_USED_F_NO_NOTIFY)
return;
vq->used_flags |= VRING_USED_F_NO_NOTIFY;
- if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
+ if (!vhost_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
r = vhost_update_used_flags(vq);
if (r)
vq_err(vq, "Failed to enable notification at %p: %d\n",
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index f8403bd46b85..5d1d00363e79 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -111,7 +111,6 @@ struct vhost_virtqueue {
struct vhost_iotlb *umem;
struct vhost_iotlb *iotlb;
void *private_data;
- u64 acked_features;
u64 acked_backend_features;
/* Log write descriptors */
void __user *log_base;
@@ -140,6 +139,7 @@ struct vhost_dev {
struct mm_struct *mm;
struct mutex mutex;
struct vhost_virtqueue **vqs;
+ u64 features;
int nvqs;
struct eventfd_ctx *log_ctx;
struct llist_head work_list;
@@ -258,9 +258,9 @@ static inline void *vhost_vq_get_backend(struct vhost_virtqueue *vq)
return vq->private_data;
}

-static inline bool vhost_has_feature(struct vhost_virtqueue *vq, int bit)
+static inline bool vhost_has_feature(struct vhost_dev *vdev, int bit)
{
- return vq->acked_features & (1ULL << bit);
+ return vdev->features & (1ULL << bit);
}

static inline bool vhost_backend_has_feature(struct vhost_virtqueue *vq, int bit)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index fb4e944c4d0d..8317ad026e3d 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -757,26 +757,20 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)

static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
{
- struct vhost_virtqueue *vq;
- int i;
+ struct vhost_dev *vdev = &vsock->dev;

if (features & ~VHOST_VSOCK_FEATURES)
return -EOPNOTSUPP;

- mutex_lock(&vsock->dev.mutex);
+ mutex_lock(&vdev->mutex);
if ((features & (1 << VHOST_F_LOG_ALL)) &&
- !vhost_log_access_ok(&vsock->dev)) {
- mutex_unlock(&vsock->dev.mutex);
+ !vhost_log_access_ok(vdev)) {
+ mutex_unlock(&vdev->mutex);
return -EFAULT;
}

- for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
- vq = &vsock->vqs[i];
- mutex_lock(&vq->mutex);
- vq->acked_features = features;
- mutex_unlock(&vq->mutex);
- }
- mutex_unlock(&vsock->dev.mutex);
+ vdev->features = features;
+ mutex_unlock(&vdev->mutex);
return 0;
}

--
2.17.1

2020-07-02 08:22:55

by Kishon Vijay Abraham I

Subject: [RFC PATCH 02/22] vhost: Introduce standard Linux driver model in VHOST

Introduce the standard Linux driver model in VHOST. This will facilitate
using multiple VHOST drivers (like net, scsi, etc.) over different VHOST
devices that use MMIO (like PCIe or NTB), kernel pointers (like platform
devices), or userspace pointers.
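
For illustration only (not part of this patch), a rough sketch of how a
backend (for example a PCIe endpoint function or NTB driver added later
in this series) could register a vhost device on the new bus; the
sample_backend structure and the vendor ID are made up:

#include <linux/device.h>
#include <linux/vhost.h>
#include <linux/virtio_ids.h>

#define SAMPLE_VENDOR_ID        0       /* made-up vendor ID */

struct sample_backend {
        struct device *dev;             /* underlying transport device */
        struct vhost_dev vdev;          /* device registered on the vhost bus */
};

static int sample_backend_setup(struct sample_backend *backend)
{
        struct vhost_dev *vdev = &backend->vdev;
        int ret;

        vdev->dev.parent = backend->dev;
        vdev->id.device = VIRTIO_ID_RPMSG;      /* type seen by client drivers */
        vdev->id.vendor = SAMPLE_VENDOR_ID;

        ret = vhost_register_device(vdev);
        if (ret)
                dev_err(backend->dev, "failed to register vhost device: %d\n",
                        ret);

        return ret;
}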

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/vhost/net.c | 3 +-
drivers/vhost/scsi.c | 2 +-
drivers/vhost/test.c | 3 +-
drivers/vhost/vdpa.c | 2 +-
drivers/vhost/vhost.c | 157 ++++++++++++++++++++++-
drivers/vhost/vsock.c | 2 +-
include/linux/mod_devicetable.h | 6 +
{drivers/vhost => include/linux}/vhost.h | 22 +++-
tools/virtio/virtio_test.c | 2 +-
9 files changed, 190 insertions(+), 9 deletions(-)
rename {drivers/vhost => include/linux}/vhost.h (93%)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 437126219116..3c57c345cbfd 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -18,6 +18,7 @@
#include <linux/slab.h>
#include <linux/sched/clock.h>
#include <linux/sched/signal.h>
+#include <linux/vhost.h>
#include <linux/vmalloc.h>

#include <linux/net.h>
@@ -33,7 +34,7 @@
#include <net/sock.h>
#include <net/xdp.h>

-#include "vhost.h"
+#include <uapi/linux/vhost.h>

static int experimental_zcopytx = 0;
module_param(experimental_zcopytx, int, 0444);
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index f5138379659e..06898b7ce7dd 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -47,7 +47,7 @@
#include <linux/llist.h>
#include <linux/bitmap.h>

-#include "vhost.h"
+#include <uapi/linux/vhost.h>

#define VHOST_SCSI_VERSION "v0.1"
#define VHOST_SCSI_NAMELEN 256
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 6518b48c0633..07508526182f 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -14,9 +14,10 @@
#include <linux/workqueue.h>
#include <linux/file.h>
#include <linux/slab.h>
+#include <linux/vhost.h>
+#include <uapi/linux/vhost.h>

#include "test.h"
-#include "vhost.h"

/* Max number of bytes transferred before requeueing the job.
* Using this limit prevents one virtqueue from starving others. */
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 0968361e3b77..61d90100db89 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -22,7 +22,7 @@
#include <linux/vhost.h>
#include <linux/virtio_net.h>

-#include "vhost.h"
+#include <uapi/linux/vhost.h>

enum {
VHOST_VDPA_FEATURES =
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 3c2633fb519d..fa2bc6e68be2 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -32,8 +32,6 @@
#include <linux/nospec.h>
#include <linux/kcov.h>

-#include "vhost.h"
-
static ushort max_mem_regions = 64;
module_param(max_mem_regions, ushort, 0444);
MODULE_PARM_DESC(max_mem_regions,
@@ -43,6 +41,9 @@ module_param(max_iotlb_entries, int, 0444);
MODULE_PARM_DESC(max_iotlb_entries,
"Maximum number of iotlb entries. (default: 2048)");

+static DEFINE_IDA(vhost_index_ida);
+static DEFINE_MUTEX(vhost_index_mutex);
+
enum {
VHOST_MEMORY_F_LOG = 0x1,
};
@@ -2557,14 +2558,166 @@ struct vhost_msg_node *vhost_dequeue_msg(struct vhost_dev *dev,
}
EXPORT_SYMBOL_GPL(vhost_dequeue_msg);

+static inline int vhost_id_match(const struct vhost_dev *vdev,
+ const struct vhost_device_id *id)
+{
+ if (id->device != vdev->id.device && id->device != VIRTIO_DEV_ANY_ID)
+ return 0;
+
+ return id->vendor == VIRTIO_DEV_ANY_ID || id->vendor == vdev->id.vendor;
+}
+
+static int vhost_dev_match(struct device *dev, struct device_driver *drv)
+{
+ struct vhost_driver *driver = to_vhost_driver(drv);
+ struct vhost_dev *vdev = to_vhost_dev(dev);
+ const struct vhost_device_id *ids;
+ int i;
+
+ ids = driver->id_table;
+ for (i = 0; ids[i].device; i++)
+ if (vhost_id_match(vdev, &ids[i]))
+ return 1;
+
+ return 0;
+}
+
+static int vhost_dev_probe(struct device *dev)
+{
+ struct vhost_driver *driver = to_vhost_driver(dev->driver);
+ struct vhost_dev *vdev = to_vhost_dev(dev);
+
+ if (!driver->probe)
+ return -ENODEV;
+
+ vdev->driver = driver;
+
+ return driver->probe(vdev);
+}
+
+static int vhost_dev_remove(struct device *dev)
+{
+ struct vhost_driver *driver = to_vhost_driver(dev->driver);
+ struct vhost_dev *vdev = to_vhost_dev(dev);
+ int ret = 0;
+
+ if (driver->remove)
+ ret = driver->remove(vdev);
+ vdev->driver = NULL;
+
+ return ret;
+}
+
+static struct bus_type vhost_bus_type = {
+ .name = "vhost",
+ .match = vhost_dev_match,
+ .probe = vhost_dev_probe,
+ .remove = vhost_dev_remove,
+};
+
+/**
+ * vhost_register_driver() - Register a vhost driver
+ * @driver: Vhost driver that has to be registered
+ *
+ * Register a vhost driver.
+ */
+int vhost_register_driver(struct vhost_driver *driver)
+{
+ int ret;
+
+ driver->driver.bus = &vhost_bus_type;
+
+ ret = driver_register(&driver->driver);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vhost_register_driver);
+
+/**
+ * vhost_unregister_driver() - Unregister a vhost driver
+ * @driver: Vhost driver that has to be un-registered
+ *
+ * Unregister a vhost driver.
+ */
+void vhost_unregister_driver(struct vhost_driver *driver)
+{
+ driver_unregister(&driver->driver);
+}
+EXPORT_SYMBOL_GPL(vhost_unregister_driver);
+
+/**
+ * vhost_register_device() - Register vhost device
+ * @vdev: Vhost device that has to be registered
+ *
+ * Allocate a ID and register vhost device.
+ */
+int vhost_register_device(struct vhost_dev *vdev)
+{
+ struct device *dev = &vdev->dev;
+ int ret;
+
+ mutex_lock(&vhost_index_mutex);
+ ret = ida_simple_get(&vhost_index_ida, 0, 0, GFP_KERNEL);
+ mutex_unlock(&vhost_index_mutex);
+ if (ret < 0)
+ return ret;
+
+ vdev->index = ret;
+ dev->bus = &vhost_bus_type;
+ device_initialize(dev);
+
+ dev_set_name(dev, "vhost%u", ret);
+
+ ret = device_add(dev);
+ if (ret) {
+ put_device(dev);
+ goto err;
+ }
+
+ return 0;
+
+err:
+ mutex_lock(&vhost_index_mutex);
+ ida_simple_remove(&vhost_index_ida, vdev->index);
+ mutex_unlock(&vhost_index_mutex);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(vhost_register_device);
+
+/**
+ * vhost_unregister_device() - Un-register vhost device
+ * @vdev: Vhost device that has to be un-registered
+ *
+ * Un-register vhost device and free the allocated ID.
+ */
+void vhost_unregister_device(struct vhost_dev *vdev)
+{
+ device_unregister(&vdev->dev);
+ mutex_lock(&vhost_index_mutex);
+ ida_simple_remove(&vhost_index_ida, vdev->index);
+ mutex_unlock(&vhost_index_mutex);
+}
+EXPORT_SYMBOL_GPL(vhost_unregister_device);

static int __init vhost_init(void)
{
+ int ret;
+
+ ret = bus_register(&vhost_bus_type);
+ if (ret) {
+ pr_err("failed to register vhost bus --> %d\n", ret);
+ return ret;
+ }
+
return 0;
}

static void __exit vhost_exit(void)
{
+ bus_unregister(&vhost_bus_type);
}

module_init(vhost_init);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 8317ad026e3d..5753048b7405 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -17,7 +17,7 @@
#include <linux/hashtable.h>

#include <net/af_vsock.h>
-#include "vhost.h"
+#include <uapi/linux/vhost.h>

#define VHOST_VSOCK_DEFAULT_HOST_CID 2
/* Max number of bytes transferred before requeueing the job.
diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
index 8d764aab29de..c7df018989e3 100644
--- a/include/linux/mod_devicetable.h
+++ b/include/linux/mod_devicetable.h
@@ -430,6 +430,12 @@ struct virtio_device_id {
};
#define VIRTIO_DEV_ANY_ID 0xffffffff

+/* VHOST */
+struct vhost_device_id {
+ __u32 device;
+ __u32 vendor;
+};
+
/*
* For Hyper-V devices we use the device guid as the id.
*/
diff --git a/drivers/vhost/vhost.h b/include/linux/vhost.h
similarity index 93%
rename from drivers/vhost/vhost.h
rename to include/linux/vhost.h
index 5d1d00363e79..16c374a8fa12 100644
--- a/drivers/vhost/vhost.h
+++ b/include/linux/vhost.h
@@ -3,7 +3,6 @@
#define _VHOST_H

#include <linux/eventfd.h>
-#include <linux/vhost.h>
#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/poll.h>
@@ -13,6 +12,7 @@
#include <linux/virtio_ring.h>
#include <linux/atomic.h>
#include <linux/vhost_iotlb.h>
+#include <uapi/linux/vhost.h>

struct vhost_work;
typedef void (*vhost_work_fn_t)(struct vhost_work *work);
@@ -135,7 +135,20 @@ struct vhost_msg_node {
struct list_head node;
};

+struct vhost_driver {
+ struct device_driver driver;
+ struct vhost_device_id *id_table;
+ int (*probe)(struct vhost_dev *dev);
+ int (*remove)(struct vhost_dev *dev);
+};
+
+#define to_vhost_driver(drv) (container_of((drv), struct vhost_driver, driver))
+
struct vhost_dev {
+ struct device dev;
+ struct vhost_driver *driver;
+ struct vhost_device_id id;
+ int index;
struct mm_struct *mm;
struct mutex mutex;
struct vhost_virtqueue **vqs;
@@ -158,6 +171,13 @@ struct vhost_dev {
struct vhost_iotlb_msg *msg);
};

+#define to_vhost_dev(d) container_of((d), struct vhost_dev, dev)
+
+int vhost_register_driver(struct vhost_driver *driver);
+void vhost_unregister_driver(struct vhost_driver *driver);
+int vhost_register_device(struct vhost_dev *vdev);
+void vhost_unregister_device(struct vhost_dev *vdev);
+
bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs,
int nvqs, int iov_limit, int weight, int byte_weight,
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index b427def67e7e..b13434d6c976 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -13,9 +13,9 @@
#include <fcntl.h>
#include <stdbool.h>
#include <linux/virtio_types.h>
-#include <linux/vhost.h>
#include <linux/virtio.h>
#include <linux/virtio_ring.h>
+#include <uapi/linux/vhost.h>
#include "../../drivers/vhost/test.h"

/* Unused */
--
2.17.1

2020-07-02 08:23:08

by Kishon Vijay Abraham I

Subject: [RFC PATCH 03/22] vhost: Add ops for the VHOST driver to configure VHOST device

Add "vhost_config_ops" in *struct vhost_driver* for the VHOST driver to
configure VHOST device and add facility for VHOST device to notify
VHOST driver (whenever VIRTIO sets a new status or finalize features).
This is in preparation to use the same vhost_driver across different
VHOST devices (like PCIe or NTB or platform device).
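
For illustration only (not part of this patch), a rough sketch of how a
vhost client driver's ->probe() could use the new wrappers; the queue
names, callback, and ring size below are made up:

#include <linux/notifier.h>
#include <linux/vhost.h>

#define SAMPLE_NUM_VQS          2
#define SAMPLE_NUM_BUFS         128     /* made-up ring size */

static void sample_vq_callback(struct vhost_virtqueue *vq)
{
        /* invoked when the remote virtio driver kicks this virtqueue */
}

static int sample_notifier(struct notifier_block *nb, unsigned long event,
                           void *data)
{
        /* event: NOTIFY_SET_STATUS, NOTIFY_FINALIZE_FEATURES or NOTIFY_RESET */
        return NOTIFY_OK;
}

static struct notifier_block sample_nb = {
        .notifier_call = sample_notifier,
};

static int sample_client_probe(struct vhost_dev *vdev)
{
        /* a real driver would keep vqs[] (filled in by create_vqs) in its
         * driver data
         */
        struct vhost_virtqueue *vqs[SAMPLE_NUM_VQS];
        vhost_vq_callback_t *callbacks[SAMPLE_NUM_VQS] = {
                sample_vq_callback, sample_vq_callback,
        };
        static const char * const names[SAMPLE_NUM_VQS] = {
                "sample.rx", "sample.tx",
        };
        int ret;

        ret = vhost_register_notifier(vdev, &sample_nb);
        if (ret)
                return ret;

        return vhost_create_vqs(vdev, SAMPLE_NUM_VQS, SAMPLE_NUM_BUFS,
                                vqs, callbacks, names);
}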

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/vhost/vhost.c | 185 ++++++++++++++++++++++++++++++++++++++++++
include/linux/vhost.h | 57 +++++++++++++
2 files changed, 242 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index fa2bc6e68be2..f959abb0b1bb 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2558,6 +2558,190 @@ struct vhost_msg_node *vhost_dequeue_msg(struct vhost_dev *dev,
}
EXPORT_SYMBOL_GPL(vhost_dequeue_msg);

+/**
+ * vhost_create_vqs() - Invoke vhost_config_ops to create virtqueue
+ * @vdev: Vhost device that provides create_vqs() callback to create virtqueue
+ * @nvqs: Number of vhost virtqueues to be created
+ * @num_bufs: The number of buffers that should be supported by the vhost
+ * virtqueue (number of descriptors in the vhost virtqueue)
+ * @vqs: Pointers to all the created vhost virtqueues
+ * @callbacks: Callback functions associated with each virtqueue
+ * @names: Names associated with each virtqueue
+ *
+ * Wrapper that invokes vhost_config_ops to create virtqueue.
+ */
+int vhost_create_vqs(struct vhost_dev *vdev, unsigned int nvqs,
+ unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+ vhost_vq_callback_t *callbacks[],
+ const char * const names[])
+{
+ int ret;
+
+ if (IS_ERR_OR_NULL(vdev))
+ return -EINVAL;
+
+ if (!vdev->ops || !vdev->ops->create_vqs)
+ return -EINVAL;
+
+ mutex_lock(&vdev->mutex);
+ ret = vdev->ops->create_vqs(vdev, nvqs, num_bufs, vqs, callbacks,
+ names);
+ mutex_unlock(&vdev->mutex);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(vhost_create_vqs);
+
+/* vhost_del_vqs - Invoke vhost_config_ops to delete the created virtqueues
+ * @vdev: Vhost device that provides del_vqs() callback to delete virtqueue
+ *
+ * Wrapper that invokes vhost_config_ops to delete all the virtqueues
+ * associated with the vhost device.
+ */
+void vhost_del_vqs(struct vhost_dev *vdev)
+{
+ if (IS_ERR_OR_NULL(vdev))
+ return;
+
+ if (!vdev->ops || !vdev->ops->del_vqs)
+ return;
+
+ mutex_lock(&vdev->mutex);
+ vdev->ops->del_vqs(vdev);
+ mutex_unlock(&vdev->mutex);
+}
+EXPORT_SYMBOL_GPL(vhost_del_vqs);
+
+/* vhost_write - Invoke vhost_config_ops to write data to buffer provided
+ * by remote virtio driver
+ * @vdev: Vhost device that provides write() callback to write data
+ * @dst: Buffer address in the remote device provided by the remote virtio
+ * driver
+ * @src: Buffer address in the local device provided by the vhost client driver
+ * @len: Length of the data to be copied from @src to @dst
+ *
+ * Wrapper that invokes vhost_config_ops to write data to buffer provided by
+ * remote virtio driver from buffer provided by vhost client driver.
+ */
+int vhost_write(struct vhost_dev *vdev, u64 vhost_dst, void *src, int len)
+{
+ if (IS_ERR_OR_NULL(vdev))
+ return -EINVAL;
+
+ if (!vdev->ops || !vdev->ops->write)
+ return -EINVAL;
+
+ return vdev->ops->write(vdev, vhost_dst, src, len);
+}
+EXPORT_SYMBOL_GPL(vhost_write);
+
+/* vhost_read - Invoke vhost_config_ops to read data from buffers provided by
+ * remote virtio driver
+ * @vdev: Vhost device that provides read() callback to read data
+ * @dst: Buffer address in the local device provided by the vhost client driver
+ * @src: Buffer address in the remote device provided by the remote virtio
+ * driver
+ * @len: Length of the data to be copied from @src to @dst
+ *
+ * Wrapper that invokes vhost_config_ops to read data from buffers provided by
+ * remote virtio driver to the address provided by vhost client driver.
+ */
+int vhost_read(struct vhost_dev *vdev, void *dst, u64 vhost_src, int len)
+{
+ if (IS_ERR_OR_NULL(vdev))
+ return -EINVAL;
+
+ if (!vdev->ops || !vdev->ops->read)
+ return -EINVAL;
+
+ return vdev->ops->read(vdev, dst, vhost_src, len);
+}
+EXPORT_SYMBOL_GPL(vhost_read);
+
+/* vhost_set_status - Invoke vhost_config_ops to set vhost device status
+ * @vdev: Vhost device that provides set_status() callback to set device status
+ * @status: Vhost device status configured by vhost client driver
+ *
+ * Wrapper that invokes vhost_config_ops to set vhost device status.
+ */
+int vhost_set_status(struct vhost_dev *vdev, u8 status)
+{
+ int ret;
+
+ if (IS_ERR_OR_NULL(vdev))
+ return -EINVAL;
+
+ if (!vdev->ops || !vdev->ops->set_status)
+ return -EINVAL;
+
+ mutex_lock(&vdev->mutex);
+ ret = vdev->ops->set_status(vdev, status);
+ mutex_unlock(&vdev->mutex);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(vhost_set_status);
+
+/* vhost_get_status - Invoke vhost_config_ops to get vhost device status
+ * @vdev: Vhost device that provides get_status() callback to get device status
+ *
+ * Wrapper that invokes vhost_config_ops to get vhost device status.
+ */
+u8 vhost_get_status(struct vhost_dev *vdev)
+{
+ u8 status;
+
+ if (IS_ERR_OR_NULL(vdev))
+ return -EINVAL;
+
+ if (!vdev->ops || !vdev->ops->get_status)
+ return -EINVAL;
+
+ mutex_lock(&vdev->mutex);
+ status = vdev->ops->get_status(vdev);
+ mutex_unlock(&vdev->mutex);
+
+ return status;
+}
+EXPORT_SYMBOL_GPL(vhost_get_status);
+
+/* vhost_set_features - Invoke vhost_config_ops to set vhost device features
+ * @vdev: Vhost device that provides set_features() callback to set device
+ * features
+ *
+ * Wrapper that invokes vhost_config_ops to set device features.
+ */
+int vhost_set_features(struct vhost_dev *vdev, u64 device_features)
+{
+ int ret;
+
+ if (IS_ERR_OR_NULL(vdev))
+ return -EINVAL;
+
+ if (!vdev->ops || !vdev->ops->set_features)
+ return -EINVAL;
+
+ mutex_lock(&vdev->mutex);
+ ret = vdev->ops->set_features(vdev, device_features);
+ mutex_unlock(&vdev->mutex);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(vhost_set_features);
+
+/* vhost_register_notifier - Register notifier to receive notification from
+ * vhost device
+ * @vdev: Vhost device from which notification has to be received.
+ * @nb: Notifier block holding the callback function
+ *
+ * Invoked by vhost client to receive notification from vhost device.
+ */
+int vhost_register_notifier(struct vhost_dev *vdev, struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&vdev->notifier, nb);
+}
+EXPORT_SYMBOL_GPL(vhost_register_notifier);
+
static inline int vhost_id_match(const struct vhost_dev *vdev,
const struct vhost_device_id *id)
{
@@ -2669,6 +2853,7 @@ int vhost_register_device(struct vhost_dev *vdev)
device_initialize(dev);

dev_set_name(dev, "vhost%u", ret);
+ BLOCKING_INIT_NOTIFIER_HEAD(&vdev->notifier);

ret = device_add(dev);
if (ret) {
diff --git a/include/linux/vhost.h b/include/linux/vhost.h
index 16c374a8fa12..b22a19c66109 100644
--- a/include/linux/vhost.h
+++ b/include/linux/vhost.h
@@ -135,6 +135,37 @@ struct vhost_msg_node {
struct list_head node;
};

+enum vhost_notify_event {
+ NOTIFY_SET_STATUS,
+ NOTIFY_FINALIZE_FEATURES,
+ NOTIFY_RESET,
+};
+
+typedef void vhost_vq_callback_t(struct vhost_virtqueue *);
+/**
+ * struct vhost_config_ops - set of function pointers for performing vhost
+ * device specific operation
+ * @create_vqs: ops to create vhost virtqueue
+ * @del_vqs: ops to delete vhost virtqueue
+ * @write: ops to write data to buffer provided by remote virtio driver
+ * @read: ops to read data from buffer provided by remote virtio driver
+ * @set_features: ops to set vhost device features
+ * @set_status: ops to set vhost device status
+ * @get_status: ops to get vhost device status
+ */
+struct vhost_config_ops {
+ int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+ unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+ vhost_vq_callback_t *callbacks[],
+ const char * const names[]);
+ void (*del_vqs)(struct vhost_dev *vdev);
+ int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src, int len);
+ int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int len);
+ int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+ int (*set_status)(struct vhost_dev *vdev, u8 status);
+ u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct vhost_driver {
struct device_driver driver;
struct vhost_device_id *id_table;
@@ -149,6 +180,8 @@ struct vhost_dev {
struct vhost_driver *driver;
struct vhost_device_id id;
int index;
+ const struct vhost_config_ops *ops;
+ struct blocking_notifier_head notifier;
struct mm_struct *mm;
struct mutex mutex;
struct vhost_virtqueue **vqs;
@@ -173,11 +206,35 @@ struct vhost_dev {

#define to_vhost_dev(d) container_of((d), struct vhost_dev, dev)

+static inline void vhost_set_drvdata(struct vhost_dev *vdev, void *data)
+{
+ dev_set_drvdata(&vdev->dev, data);
+}
+
+static inline void *vhost_get_drvdata(struct vhost_dev *vdev)
+{
+ return dev_get_drvdata(&vdev->dev);
+}
+
int vhost_register_driver(struct vhost_driver *driver);
void vhost_unregister_driver(struct vhost_driver *driver);
int vhost_register_device(struct vhost_dev *vdev);
void vhost_unregister_device(struct vhost_dev *vdev);

+int vhost_create_vqs(struct vhost_dev *vdev, unsigned int nvqs,
+ unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+ vhost_vq_callback_t *callbacks[],
+ const char * const names[]);
+void vhost_del_vqs(struct vhost_dev *vdev);
+int vhost_write(struct vhost_dev *vdev, u64 vhost_dst, void *src, int len);
+int vhost_read(struct vhost_dev *vdev, void *dst, u64 vhost_src, int len);
+int vhost_set_features(struct vhost_dev *vdev, u64 device_features);
+u64 vhost_get_features(struct vhost_dev *vdev);
+int vhost_set_status(struct vhost_dev *vdev, u8 status);
+u8 vhost_get_status(struct vhost_dev *vdev);
+
+int vhost_register_notifier(struct vhost_dev *vdev, struct notifier_block *nb);
+
bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs,
int nvqs, int iov_limit, int weight, int byte_weight,
--
2.17.1

2020-07-02 08:23:33

by Kishon Vijay Abraham I

Subject: [RFC PATCH 05/22] vhost: Add MMIO helpers for operations on vhost virtqueue

Add helpers for VHOST drivers to read descriptor data from a
vhost_virtqueue for IN transfers or write descriptor data to a
vhost_virtqueue for OUT transfers respectively. Also add helpers to
enable callbacks, disable callbacks, and notify the remote virtio of
events on the virtqueue.

This adds helpers only for virtqueues in MMIO (helpers for virtqueues
in kernel space and user space can be added later).
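
For illustration only (not part of this patch), a rough sketch of a
receive path in a vhost client driver built on these helpers; the bounce
buffer and its size are made up and the error handling is minimal:

#include <linux/kernel.h>
#include <linux/vhost.h>

static void sample_rx_callback(struct vhost_virtqueue *vq)
{
        struct vhost_dev *vdev = vq->dev;
        u8 buf[256];                    /* made-up local bounce buffer */
        u64 outbuf;
        u16 head;
        int len;

        vhost_virtqueue_disable_cb(vq);

        /* drain all OUT buffers posted by the remote virtio driver */
        while ((outbuf = vhost_virtqueue_get_outbuf(vq, &head, &len))) {
                len = min_t(int, len, sizeof(buf));
                if (vhost_read(vdev, buf, outbuf, len))
                        break;

                /* process 'len' bytes in buf here ... */

                /* return the buffer and, if needed, notify the remote side */
                vhost_virtqueue_put_buf(vq, head, len);
                vhost_virtqueue_kick(vq);
        }

        vhost_virtqueue_enable_cb(vq);
}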

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/vhost/Kconfig | 1 +
drivers/vhost/vhost.c | 292 ++++++++++++++++++++++++++++++++++++++++++
include/linux/vhost.h | 22 ++++
3 files changed, 315 insertions(+)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index c4f273793595..77e195a38469 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -24,6 +24,7 @@ config VHOST_DPN

config VHOST
tristate
+ select VHOST_RING
select VHOST_IOTLB
help
This option is selected by any driver which needs to access
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f959abb0b1bb..8a3ad4698393 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2558,6 +2558,298 @@ struct vhost_msg_node *vhost_dequeue_msg(struct vhost_dev *dev,
}
EXPORT_SYMBOL_GPL(vhost_dequeue_msg);

+/**
+ * vhost_virtqueue_disable_cb_mmio() - Write to used ring in virtio accessed
+ * using MMIO to stop notification
+ * @vq: vhost_virtqueue for which callbacks have to be disabled
+ *
+ * Write to used ring in virtio accessed using MMIO to stop sending notification
+ * to the vhost virtqueue.
+ */
+static void vhost_virtqueue_disable_cb_mmio(struct vhost_virtqueue *vq)
+{
+ struct vringh *vringh;
+
+ vringh = &vq->vringh;
+ vringh_notify_disable_mmio(vringh);
+}
+
+/**
+ * vhost_virtqueue_disable_cb() - Write to used ring in virtio to stop
+ * notification
+ * @vq: vhost_virtqueue for which callbacks have to be disabled
+ *
+ * Wrapper to write to used ring in virtio to stop sending notification
+ * to the vhost virtqueue.
+ */
+void vhost_virtqueue_disable_cb(struct vhost_virtqueue *vq)
+{
+ enum vhost_type type;
+
+ type = vq->type;
+
+ /* TODO: Add support for other VHOST TYPES */
+ if (type == VHOST_TYPE_MMIO)
+ return vhost_virtqueue_disable_cb_mmio(vq);
+}
+EXPORT_SYMBOL_GPL(vhost_virtqueue_disable_cb);
+
+/**
+ * vhost_virtqueue_enable_cb_mmio() - Write to used ring in virtio accessed
+ * using MMIO to enable notification
+ * @vq: vhost_virtqueue for which callbacks have to be enabled
+ *
+ * Write to used ring in virtio accessed using MMIO to enable notification
+ * to the vhost virtqueue.
+ */
+static bool vhost_virtqueue_enable_cb_mmio(struct vhost_virtqueue *vq)
+{
+ struct vringh *vringh;
+
+ vringh = &vq->vringh;
+ return vringh_notify_enable_mmio(vringh);
+}
+
+/**
+ * vhost_virtqueue_enable_cb() - Write to used ring in virtio to enable
+ * notification
+ * @vq: vhost_virtqueue for which callbacks have to be enabled
+ *
+ * Wrapper to write to used ring in virtio to enable notification to the
+ * vhost virtqueue.
+ */
+bool vhost_virtqueue_enable_cb(struct vhost_virtqueue *vq)
+{
+ enum vhost_type type;
+
+ type = vq->type;
+
+ /* TODO: Add support for other VHOST TYPES */
+ if (type == VHOST_TYPE_MMIO)
+ return vhost_virtqueue_enable_cb_mmio(vq);
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(vhost_virtqueue_enable_cb);
+
+/**
+ * vhost_virtqueue_notify() - Send notification to the remote virtqueue
+ * @vq: vhost_virtqueue that sends the notification
+ *
+ * Invokes ->notify() callback to send notification to the remote virtqueue.
+ */
+void vhost_virtqueue_notify(struct vhost_virtqueue *vq)
+{
+ if (!vq->notify)
+ return;
+
+ vq->notify(vq);
+}
+EXPORT_SYMBOL_GPL(vhost_virtqueue_notify);
+
+/**
+ * vhost_virtqueue_kick_mmio() - Check if the remote virtqueue has enabled
+ * notification (by reading available ring in virtio accessed using MMIO)
+ * before sending notification
+ * @vq: vhost_virtqueue that sends the notification
+ *
+ * Check if the remote virtqueue has enabled notification (by reading available
+ * ring in virtio accessed using MMIO) and then invoke vhost_virtqueue_notify()
+ * to send notification to the remote virtqueue.
+ */
+static void vhost_virtqueue_kick_mmio(struct vhost_virtqueue *vq)
+{
+ if (vringh_need_notify_mmio(&vq->vringh))
+ vhost_virtqueue_notify(vq);
+}
+
+/**
+ * vhost_virtqueue_kick() - Check if the remote virtqueue has enabled
+ * notification before sending notification
+ * @vq: vhost_virtqueue that sends the notification
+ *
+ * Wrapper to send notification to the remote virtqueue using
+ * vhost_virtqueue_kick_mmio() that checks if the remote virtqueue has
+ * enabled notification before sending the notification.
+ */
+void vhost_virtqueue_kick(struct vhost_virtqueue *vq)
+{
+ enum vhost_type type;
+
+ type = vq->type;
+
+ /* TODO: Add support for other VHOST TYPES */
+ if (type == VHOST_TYPE_MMIO)
+ return vhost_virtqueue_kick_mmio(vq);
+}
+EXPORT_SYMBOL_GPL(vhost_virtqueue_kick);
+
+/**
+ * vhost_virtqueue_callback() - Invoke vhost virtqueue callback provided by
+ * vhost client driver
+ * @vq: vhost_virtqueue for which the callback is invoked
+ *
+ * Invoked by the driver that creates vhost device when the remote virtio
+ * driver sends notification to this virtqueue.
+ */
+void vhost_virtqueue_callback(struct vhost_virtqueue *vq)
+{
+ if (!vq->callback)
+ return;
+
+ vq->callback(vq);
+}
+EXPORT_SYMBOL_GPL(vhost_virtqueue_callback);
+
+/**
+ * vhost_virtqueue_get_outbuf_mmio() - Get the output buffer address by reading
+ * virtqueue descriptor accessed using MMIO
+ * @vq: vhost_virtqueue used to access the descriptor
+ * @head: head index for passing to vhost_virtqueue_put_buf()
+ * @len: Length of the buffer
+ *
+ * Get the output buffer address by reading virtqueue descriptor accessed using
+ * MMIO.
+ */
+static u64 vhost_virtqueue_get_outbuf_mmio(struct vhost_virtqueue *vq,
+ u16 *head, int *len)
+{
+ struct vringh_mmiov wiov;
+ struct mmiovec *mmiovec;
+ struct vringh *vringh;
+ int desc;
+
+ vringh = &vq->vringh;
+ vringh_mmiov_init(&wiov, NULL, 0);
+
+ desc = vringh_getdesc_mmio(vringh, NULL, &wiov, head, GFP_KERNEL);
+ if (desc <= 0)
+ return 0;
+ mmiovec = &wiov.iov[0];
+
+ *len = mmiovec->iov_len;
+ return mmiovec->iov_base;
+}
+
+/**
+ * vhost_virtqueue_get_outbuf() - Get the output buffer address by reading
+ * virtqueue descriptor
+ * @vq: vhost_virtqueue used to access the descriptor
+ * @head: head index for passing to vhost_virtqueue_put_buf()
+ * @len: Length of the buffer
+ *
+ * Wrapper to get the output buffer address by reading virtqueue descriptor.
+ */
+u64 vhost_virtqueue_get_outbuf(struct vhost_virtqueue *vq, u16 *head, int *len)
+{
+ enum vhost_type type;
+
+ type = vq->type;
+
+ /* TODO: Add support for other VHOST TYPES */
+ if (type == VHOST_TYPE_MMIO)
+ return vhost_virtqueue_get_outbuf_mmio(vq, head, len);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vhost_virtqueue_get_outbuf);
+
+/**
+ * vhost_virtqueue_get_inbuf_mmio() - Get the input buffer address by reading
+ * virtqueue descriptor accessed using MMIO
+ * @vq: vhost_virtqueue used to access the descriptor
+ * @head: Head index for passing to vhost_virtqueue_put_buf()
+ * @len: Length of the buffer
+ *
+ * Get the input buffer address by reading virtqueue descriptor accessed using
+ * MMIO.
+ */
+static u64 vhost_virtqueue_get_inbuf_mmio(struct vhost_virtqueue *vq,
+ u16 *head, int *len)
+{
+ struct vringh_mmiov riov;
+ struct mmiovec *mmiovec;
+ struct vringh *vringh;
+ int desc;
+
+ vringh = &vq->vringh;
+ vringh_mmiov_init(&riov, NULL, 0);
+
+ desc = vringh_getdesc_mmio(vringh, &riov, NULL, head, GFP_KERNEL);
+ if (desc <= 0)
+ return 0;
+
+ mmiovec = &riov.iov[0];
+
+ *len = mmiovec->iov_len;
+ return mmiovec->iov_base;
+}
+
+/**
+ * vhost_virtqueue_get_inbuf() - Get the input buffer address by reading
+ * virtqueue descriptor
+ * @vq: vhost_virtqueue used to access the descriptor
+ * @head: head index for passing to vhost_virtqueue_put_buf()
+ * @len: Length of the buffer
+ *
+ * Wrapper to get the input buffer address by reading virtqueue descriptor.
+ */
+u64 vhost_virtqueue_get_inbuf(struct vhost_virtqueue *vq, u16 *head, int *len)
+{
+ enum vhost_type type;
+
+ type = vq->type;
+
+ /* TODO: Add support for other VHOST TYPES */
+ if (type == VHOST_TYPE_MMIO)
+ return vhost_virtqueue_get_inbuf_mmio(vq, head, len);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vhost_virtqueue_get_inbuf);
+
+/**
+ * vhost_virtqueue_put_buf_mmio() - Publish to the remote virtio (update
+ * used ring in virtio using MMIO) to indicate the buffer has been processed
+ * @vq: vhost_virtqueue used to update the used ring
+ * @head: Head index receive from vhost_virtqueue_get_*()
+ * @len: Length of the buffer
+ *
+ * Publish to the remote virtio (update used ring in virtio using MMIO) to
+ * indicate the buffer has been processed
+ */
+static void vhost_virtqueue_put_buf_mmio(struct vhost_virtqueue *vq,
+ u16 head, int len)
+{
+ struct vringh *vringh;
+
+ vringh = &vq->vringh;
+
+ vringh_complete_mmio(vringh, head, len);
+}
+
+/**
+ * vhost_virtqueue_put_buf() - Publish to the remote virtio to indicate the
+ * buffer has been processed
+ * @vq: vhost_virtqueue used to update the used ring
+ * @head: Head index receive from vhost_virtqueue_get_*()
+ * @len: Length of the buffer
+ *
+ * Wrapper to publish to the remote virtio to indicate the buffer has been
+ * processed.
+ */
+void vhost_virtqueue_put_buf(struct vhost_virtqueue *vq, u16 head, int len)
+{
+ enum vhost_type type;
+
+ type = vq->type;
+
+ /* TODO: Add support for other VHOST TYPES */
+ if (type == VHOST_TYPE_MMIO)
+ return vhost_virtqueue_put_buf_mmio(vq, head, len);
+}
+EXPORT_SYMBOL_GPL(vhost_virtqueue_put_buf);
+
/**
* vhost_create_vqs() - Invoke vhost_config_ops to create virtqueue
* @vdev: Vhost device that provides create_vqs() callback to create virtqueue
diff --git a/include/linux/vhost.h b/include/linux/vhost.h
index b22a19c66109..8efb9829c1b1 100644
--- a/include/linux/vhost.h
+++ b/include/linux/vhost.h
@@ -10,6 +10,7 @@
#include <linux/uio.h>
#include <linux/virtio_config.h>
#include <linux/virtio_ring.h>
+#include <linux/vringh.h>
#include <linux/atomic.h>
#include <linux/vhost_iotlb.h>
#include <uapi/linux/vhost.h>
@@ -60,9 +61,20 @@ enum vhost_uaddr_type {
VHOST_NUM_ADDRS = 3,
};

+enum vhost_type {
+ VHOST_TYPE_UNKNOWN,
+ VHOST_TYPE_USER,
+ VHOST_TYPE_KERN,
+ VHOST_TYPE_MMIO,
+};
+
/* The virtqueue structure describes a queue attached to a device. */
struct vhost_virtqueue {
struct vhost_dev *dev;
+ enum vhost_type type;
+ struct vringh vringh;
+ void (*callback)(struct vhost_virtqueue *vq);
+ void (*notify)(struct vhost_virtqueue *vq);

/* The actual ring of buffers. */
struct mutex mutex;
@@ -235,6 +247,16 @@ u8 vhost_get_status(struct vhost_dev *vdev);

int vhost_register_notifier(struct vhost_dev *vdev, struct notifier_block *nb);

+u64 vhost_virtqueue_get_outbuf(struct vhost_virtqueue *vq, u16 *head, int *len);
+u64 vhost_virtqueue_get_inbuf(struct vhost_virtqueue *vq, u16 *head, int *len);
+void vhost_virtqueue_put_buf(struct vhost_virtqueue *vq, u16 head, int len);
+
+void vhost_virtqueue_disable_cb(struct vhost_virtqueue *vq);
+bool vhost_virtqueue_enable_cb(struct vhost_virtqueue *vq);
+void vhost_virtqueue_notify(struct vhost_virtqueue *vq);
+void vhost_virtqueue_kick(struct vhost_virtqueue *vq);
+void vhost_virtqueue_callback(struct vhost_virtqueue *vq);
+
bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs,
int nvqs, int iov_limit, int weight, int byte_weight,
--
2.17.1

2020-07-02 08:23:45

by Kishon Vijay Abraham I

Subject: [RFC PATCH 04/22] vringh: Add helpers to access vring in MMIO

Add helpers to access a vring over memory-mapped IO. This requires
using IO accessors to access the vring, and the addresses populated by
the virtio driver (in the descriptors) are handed directly to the vhost
client driver. Even if vhost runs on a 32-bit system, it can access a
64-bit address provided by virtio as long as the vhost device supports
translation.

This is in preparation for letting VHOST devices (PCIe Endpoint or Host
in NTB) access vrings created by VIRTIO devices (PCIe RC or Host in NTB)
over memory-mapped IO.
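
For illustration only (not part of this patch), a rough sketch of
draining one descriptor with these helpers, assuming the vringh was set
up with vringh_init_mmio() over a vring mapped from the remote side; the
data copy and the doorbell mechanism are device specific:

#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/vringh.h>

static int sample_drain_one(struct vringh *vrh)
{
        struct mmiovec stack_iov[4];    /* enough for short descriptor chains */
        struct vringh_mmiov riov;
        u16 head;
        int ret;

        vringh_mmiov_init(&riov, stack_iov, ARRAY_SIZE(stack_iov));

        ret = vringh_getdesc_mmio(vrh, &riov, NULL, &head, GFP_KERNEL);
        if (ret <= 0)                   /* 0: ring empty, < 0: error */
                goto out;

        /*
         * riov.iov[0].iov_base holds the address the remote virtio driver
         * put in the descriptor; copy the data out (memcpy_fromio() on an
         * ioremap()'d window, or a DMA transfer) before completing.
         */
        ret = vringh_complete_mmio(vrh, head, riov.iov[0].iov_len);
        if (ret)
                goto out;

        if (vringh_need_notify_mmio(vrh) > 0) {
                /* ring a doorbell / raise an interrupt to the remote side */
        }

        ret = 1;
out:
        if (riov.max_num & VRINGH_IOV_ALLOCATED)  /* chain outgrew stack_iov */
                kfree(riov.iov);

        return ret;
}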

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/vhost/vringh.c | 332 +++++++++++++++++++++++++++++++++++++++++
include/linux/vringh.h | 46 ++++++
2 files changed, 378 insertions(+)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index ba8e0d6cfd97..b3f1910b99ec 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -5,6 +5,7 @@
* Since these may be in userspace, we use (inline) accessors.
*/
#include <linux/compiler.h>
+#include <linux/io.h>
#include <linux/module.h>
#include <linux/vringh.h>
#include <linux/virtio_ring.h>
@@ -188,6 +189,32 @@ static int move_to_indirect(const struct vringh *vrh,
return 0;
}

+static int resize_mmiovec(struct vringh_mmiov *iov, gfp_t gfp)
+{
+ unsigned int flag, new_num = (iov->max_num & ~VRINGH_IOV_ALLOCATED) * 2;
+ struct mmiovec *new;
+
+ if (new_num < 8)
+ new_num = 8;
+
+ flag = (iov->max_num & VRINGH_IOV_ALLOCATED);
+ if (flag) {
+ new = krealloc(iov->iov, new_num * sizeof(struct mmiovec), gfp);
+ } else {
+ new = kmalloc_array(new_num, sizeof(struct mmiovec), gfp);
+ if (new) {
+ memcpy(new, iov->iov,
+ iov->max_num * sizeof(struct mmiovec));
+ flag = VRINGH_IOV_ALLOCATED;
+ }
+ }
+ if (!new)
+ return -ENOMEM;
+ iov->iov = new;
+ iov->max_num = (new_num | flag);
+ return 0;
+}
+
static int resize_iovec(struct vringh_kiov *iov, gfp_t gfp)
{
struct kvec *new;
@@ -261,6 +288,142 @@ static int slow_copy(struct vringh *vrh, void *dst, const void *src,
return 0;
}

+static inline int
+__vringh_mmiov(struct vringh *vrh, u16 i, struct vringh_mmiov *riov,
+ struct vringh_mmiov *wiov,
+ bool (*rcheck)(struct vringh *vrh, u64 addr, size_t *len,
+ struct vringh_range *range,
+ bool (*getrange)(struct vringh *, u64,
+ struct vringh_range *)),
+ bool (*getrange)(struct vringh *, u64, struct vringh_range *),
+ gfp_t gfp,
+ int (*copy)(const struct vringh *vrh,
+ void *dst, const void *src, size_t len))
+{
+ int err, count = 0, up_next, desc_max;
+ struct vring_desc desc, *descs;
+ struct vringh_range range = { -1ULL, 0 }, slowrange;
+ bool slow = false;
+
+ /* We start traversing vring's descriptor table. */
+ descs = vrh->vring.desc;
+ desc_max = vrh->vring.num;
+ up_next = -1;
+
+ if (riov) {
+ riov->i = 0;
+ riov->used = 0;
+ } else if (wiov) {
+ wiov->i = 0;
+ wiov->used = 0;
+ } else {
+ /* You must want something! */
+ WARN_ON(1);
+ }
+
+ for (;;) {
+ u64 addr;
+ struct vringh_mmiov *iov;
+ size_t len;
+
+ if (unlikely(slow))
+ err = slow_copy(vrh, &desc, &descs[i], rcheck, getrange,
+ &slowrange, copy);
+ else
+ err = copy(vrh, &desc, &descs[i], sizeof(desc));
+ if (unlikely(err))
+ goto fail;
+
+ if (unlikely(desc.flags &
+ cpu_to_vringh16(vrh, VRING_DESC_F_INDIRECT))) {
+ /* VRING_DESC_F_INDIRECT is not supported */
+ err = -EINVAL;
+ goto fail;
+ }
+
+ if (count++ == vrh->vring.num) {
+ vringh_bad("Descriptor loop in %p", descs);
+ err = -ELOOP;
+ goto fail;
+ }
+
+ if (desc.flags & cpu_to_vringh16(vrh, VRING_DESC_F_WRITE)) {
+ iov = wiov;
+ } else {
+ iov = riov;
+ if (unlikely(wiov && wiov->i)) {
+ vringh_bad("Readable desc %p after writable",
+ &descs[i]);
+ err = -EINVAL;
+ goto fail;
+ }
+ }
+
+ if (!iov) {
+ vringh_bad("Unexpected %s desc",
+ !wiov ? "writable" : "readable");
+ err = -EPROTO;
+ goto fail;
+ }
+
+again:
+ /* Make sure it's OK, and get offset. */
+ len = vringh32_to_cpu(vrh, desc.len);
+ if (!rcheck(vrh, vringh64_to_cpu(vrh, desc.addr), &len, &range,
+ getrange)) {
+ err = -EINVAL;
+ goto fail;
+ }
+ addr = vringh64_to_cpu(vrh, desc.addr) + range.offset;
+
+ if (unlikely(iov->used == (iov->max_num & ~VRINGH_IOV_ALLOCATED))) {
+ err = resize_mmiovec(iov, gfp);
+ if (err)
+ goto fail;
+ }
+
+ iov->iov[iov->used].iov_base = addr;
+ iov->iov[iov->used].iov_len = len;
+ iov->used++;
+
+ if (unlikely(len != vringh32_to_cpu(vrh, desc.len))) {
+ desc.len =
+ cpu_to_vringh32(vrh,
+ vringh32_to_cpu(vrh, desc.len)
+ - len);
+ desc.addr =
+ cpu_to_vringh64(vrh,
+ vringh64_to_cpu(vrh, desc.addr)
+ + len);
+ goto again;
+ }
+
+ if (desc.flags & cpu_to_vringh16(vrh, VRING_DESC_F_NEXT)) {
+ i = vringh16_to_cpu(vrh, desc.next);
+ } else {
+ /* Just in case we need to finish traversing above. */
+ if (unlikely(up_next > 0)) {
+ i = return_from_indirect(vrh, &up_next,
+ &descs, &desc_max);
+ slow = false;
+ } else {
+ break;
+ }
+ }
+
+ if (i >= desc_max) {
+ vringh_bad("Chained index %u > %u", i, desc_max);
+ err = -EINVAL;
+ goto fail;
+ }
+ }
+
+ return 0;
+
+fail:
+ return err;
+}
+
static inline int
__vringh_iov(struct vringh *vrh, u16 i,
struct vringh_kiov *riov,
@@ -833,6 +996,175 @@ int vringh_need_notify_user(struct vringh *vrh)
}
EXPORT_SYMBOL(vringh_need_notify_user);

+/* MMIO access helpers */
+static inline int getu16_mmio(const struct vringh *vrh,
+ u16 *val, const __virtio16 *p)
+{
+ *val = vringh16_to_cpu(vrh, readw(p));
+ return 0;
+}
+
+static inline int putu16_mmio(const struct vringh *vrh, __virtio16 *p, u16 val)
+{
+ writew(cpu_to_vringh16(vrh, val), p);
+ return 0;
+}
+
+static inline int copydesc_mmio(const struct vringh *vrh,
+ void *dst, const void *src, size_t len)
+{
+ memcpy_fromio(dst, src, len);
+ return 0;
+}
+
+static inline int putused_mmio(const struct vringh *vrh,
+ struct vring_used_elem *dst,
+ const struct vring_used_elem *src,
+ unsigned int num)
+{
+ memcpy_toio(dst, src, num * sizeof(*dst));
+ return 0;
+}
+
+/**
+ * vringh_init_mmio - initialize a vringh for a MMIO vring.
+ * @vrh: the vringh to initialize.
+ * @features: the feature bits for this ring.
+ * @num: the number of elements.
+ * @weak_barriers: true if we only need memory barriers, not I/O.
+ * @desc: the MMIO descriptor pointer.
+ * @avail: the MMIO avail pointer.
+ * @used: the MMIO used pointer.
+ *
+ * Returns an error if num is invalid.
+ */
+int vringh_init_mmio(struct vringh *vrh, u64 features,
+ unsigned int num, bool weak_barriers,
+ struct vring_desc *desc,
+ struct vring_avail *avail,
+ struct vring_used *used)
+{
+ /* Sane power of 2 please! */
+ if (!num || num > 0xffff || (num & (num - 1))) {
+ vringh_bad("Bad ring size %u", num);
+ return -EINVAL;
+ }
+
+ vrh->little_endian = (features & (1ULL << VIRTIO_F_VERSION_1));
+ vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
+ vrh->weak_barriers = weak_barriers;
+ vrh->completed = 0;
+ vrh->last_avail_idx = 0;
+ vrh->last_used_idx = 0;
+ vrh->vring.num = num;
+ vrh->vring.desc = desc;
+ vrh->vring.avail = avail;
+ vrh->vring.used = used;
+ return 0;
+}
+EXPORT_SYMBOL(vringh_init_mmio);
+
+/**
+ * vringh_getdesc_mmio - get next available descriptor from MMIO ring.
+ * @vrh: the MMIO vring.
+ * @riov: where to put the readable descriptors (or NULL)
+ * @wiov: where to put the writable descriptors (or NULL)
+ * @head: head index we received, for passing to vringh_complete_mmio().
+ * @gfp: flags for allocating larger riov/wiov.
+ *
+ * Returns 0 if there was no descriptor, 1 if there was, or -errno.
+ *
+ * Note that on error return, you can tell the difference between an
+ * invalid ring and a single invalid descriptor: in the former case,
+ * *head will be vrh->vring.num. You may be able to ignore an invalid
+ * descriptor, but there's not much you can do with an invalid ring.
+ *
+ * Note that you may need to clean up riov and wiov, even on error!
+ */
+int vringh_getdesc_mmio(struct vringh *vrh,
+ struct vringh_mmiov *riov,
+ struct vringh_mmiov *wiov,
+ u16 *head,
+ gfp_t gfp)
+{
+ int err;
+
+ err = __vringh_get_head(vrh, getu16_mmio, &vrh->last_avail_idx);
+ if (err < 0)
+ return err;
+
+ /* Empty... */
+ if (err == vrh->vring.num)
+ return 0;
+
+ *head = err;
+ err = __vringh_mmiov(vrh, *head, riov, wiov, no_range_check, NULL,
+ gfp, copydesc_mmio);
+ if (err)
+ return err;
+
+ return 1;
+}
+EXPORT_SYMBOL(vringh_getdesc_mmio);
+
+/**
+ * vringh_complete_mmio - we've finished with descriptor, publish it.
+ * @vrh: the vring.
+ * @head: the head as filled in by vringh_getdesc_mmio.
+ * @len: the length of data we have written.
+ *
+ * You should check vringh_need_notify_mmio() after one or more calls
+ * to this function.
+ */
+int vringh_complete_mmio(struct vringh *vrh, u16 head, u32 len)
+{
+ struct vring_used_elem used;
+
+ used.id = cpu_to_vringh32(vrh, head);
+ used.len = cpu_to_vringh32(vrh, len);
+
+ return __vringh_complete(vrh, &used, 1, putu16_mmio, putused_mmio);
+}
+EXPORT_SYMBOL(vringh_complete_mmio);
+
+/**
+ * vringh_notify_enable_mmio - we want to know if something changes.
+ * @vrh: the vring.
+ *
+ * This always enables notifications, but returns false if there are
+ * now more buffers available in the vring.
+ */
+bool vringh_notify_enable_mmio(struct vringh *vrh)
+{
+ return __vringh_notify_enable(vrh, getu16_mmio, putu16_mmio);
+}
+EXPORT_SYMBOL(vringh_notify_enable_mmio);
+
+/**
+ * vringh_notify_disable_mmio - don't tell us if something changes.
+ * @vrh: the vring.
+ *
+ * This is our normal running state: we disable and then only enable when
+ * we're going to sleep.
+ */
+void vringh_notify_disable_mmio(struct vringh *vrh)
+{
+ __vringh_notify_disable(vrh, putu16_mmio);
+}
+EXPORT_SYMBOL(vringh_notify_disable_mmio);
+
+/**
+ * vringh_need_notify_mmio - must we tell the other side about used buffers?
+ * @vrh: the vring we've called vringh_complete_mmio() on.
+ *
+ * Returns -errno or 0 if we don't need to tell the other side, 1 if we do.
+ */
+int vringh_need_notify_mmio(struct vringh *vrh)
+{
+ return __vringh_need_notify(vrh, getu16_mmio);
+}
+EXPORT_SYMBOL(vringh_need_notify_mmio);
+
/* Kernelspace access helpers. */
static inline int getu16_kern(const struct vringh *vrh,
u16 *val, const __virtio16 *p)
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 9e2763d7c159..0ba63a72b124 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -99,6 +99,23 @@ struct vringh_kiov {
unsigned i, used, max_num;
};

+struct mmiovec {
+ u64 iov_base;
+ size_t iov_len;
+};
+
+/**
+ * struct vringh_mmiov - mmiovec mangler.
+ *
+ * Mangles mmiovec in place, and restores it.
+ * Remaining data is iov + i, of used - i elements.
+ */
+struct vringh_mmiov {
+ struct mmiovec *iov;
+ size_t consumed; /* Within iov[i] */
+ unsigned int i, used, max_num;
+};
+
/* Flag on max_num to indicate we're kmalloced. */
#define VRINGH_IOV_ALLOCATED 0x8000000

@@ -213,6 +230,35 @@ void vringh_notify_disable_kern(struct vringh *vrh);

int vringh_need_notify_kern(struct vringh *vrh);

+/* Helpers for MMIO vrings. */
+int vringh_init_mmio(struct vringh *vrh, u64 features,
+ unsigned int num, bool weak_barriers,
+ struct vring_desc *desc,
+ struct vring_avail *avail,
+ struct vring_used *used);
+
+static inline void vringh_mmiov_init(struct vringh_mmiov *mmiov,
+ struct mmiovec *mmiovec, unsigned int num)
+{
+ mmiov->used = 0;
+ mmiov->i = 0;
+ mmiov->consumed = 0;
+ mmiov->max_num = num;
+ mmiov->iov = mmiovec;
+}
+
+int vringh_getdesc_mmio(struct vringh *vrh,
+ struct vringh_mmiov *riov,
+ struct vringh_mmiov *wiov,
+ u16 *head,
+ gfp_t gfp);
+
+int vringh_complete_mmio(struct vringh *vrh, u16 head, u32 len);
+
+bool vringh_notify_enable_mmio(struct vringh *vrh);
+void vringh_notify_disable_mmio(struct vringh *vrh);
+int vringh_need_notify_mmio(struct vringh *vrh);
+
/* Notify the guest about buffers added to the used ring */
static inline void vringh_notify(struct vringh *vrh)
{
--
2.17.1
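
For reference, a vhost-side driver is expected to consume the MMIO vringh
helpers above roughly as in the sketch below. This is illustrative only:
the example_* names and the kick() callback are placeholders, and only the
vringh_* calls come from this patch.

/* Illustrative sketch: drain readable buffers from an MMIO vring. */
static void example_drain_mmio_vring(struct vringh *vrh,
				     void (*kick)(void *priv), void *priv)
{
	struct mmiovec iov[8];
	struct vringh_mmiov riov;
	u16 head;
	int ret;

	for (;;) {
		vringh_mmiov_init(&riov, iov, ARRAY_SIZE(iov));

		/* 0 means the ring is empty, < 0 is an error. */
		ret = vringh_getdesc_mmio(vrh, &riov, NULL, &head, GFP_KERNEL);
		if (ret <= 0)
			break;

		/* ... memcpy_fromio() each riov.iov[] element here ... */

		/* Publish the descriptor back to the used ring. */
		vringh_complete_mmio(vrh, head, 0);
	}

	/* Notify the driver side only if it asked for notifications. */
	if (vringh_need_notify_mmio(vrh) > 0)
		kick(priv);
}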

2020-07-02 08:24:02

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 09/22] rpmsg: Introduce configfs entry for configuring rpmsg

Create a configfs entry for each "struct rpmsg_device_id" populated
in an rpmsg client driver, and a configfs entry for each rpmsg
device. These entries are used to bind an rpmsg client driver to an
rpmsg device in order to create a new rpmsg channel.

This is required for creating channels on the VHOST based rpmsg bus
(on the VIRTIO based bus, channels are created during name service
announcement).
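
As an illustration, a bus driver hooks into this interface roughly as in
the sketch below. The foo_* names are placeholders; only struct
rpmsg_virtproc_ops and rpmsg_cfs_add_virtproc_group() come from this
patch.

/* Illustrative sketch: wiring a bus driver into the rpmsg configfs. */
static struct device *foo_create_channel(struct device *dev, const char *name)
{
	dev_info(dev, "configfs requested channel \"%s\"\n", name);
	/* A real driver allocates and registers an rpmsg device here and
	 * returns its struct device.
	 */
	return ERR_PTR(-ENOSYS);
}

static void foo_delete_channel(struct device *dev)
{
	/* A real driver unregisters the rpmsg device created above. */
	device_unregister(dev);
}

static const struct rpmsg_virtproc_ops foo_virtproc_ops = {
	.create_channel	= foo_create_channel,
	.delete_channel	= foo_delete_channel,
	.owner		= THIS_MODULE,
};

static int foo_register_virtproc(struct device *dev)
{
	struct config_group *group;

	group = rpmsg_cfs_add_virtproc_group(dev, &foo_virtproc_ops);

	return PTR_ERR_OR_ZERO(group);
}

The user then creates a sub-directory under the client driver's configfs
entry and symlinks it into the virtproc directory to create the channel.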

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/rpmsg/Makefile | 2 +-
drivers/rpmsg/rpmsg_cfs.c | 394 +++++++++++++++++++++++++++++++++
drivers/rpmsg/rpmsg_core.c | 7 +
drivers/rpmsg/rpmsg_internal.h | 16 ++
include/linux/rpmsg.h | 5 +
5 files changed, 423 insertions(+), 1 deletion(-)
create mode 100644 drivers/rpmsg/rpmsg_cfs.c

diff --git a/drivers/rpmsg/Makefile b/drivers/rpmsg/Makefile
index ae92a7fb08f6..047acfda518a 100644
--- a/drivers/rpmsg/Makefile
+++ b/drivers/rpmsg/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_RPMSG) += rpmsg_core.o
+obj-$(CONFIG_RPMSG) += rpmsg_core.o rpmsg_cfs.o
obj-$(CONFIG_RPMSG_CHAR) += rpmsg_char.o
obj-$(CONFIG_RPMSG_MTK_SCP) += mtk_rpmsg.o
obj-$(CONFIG_RPMSG_QCOM_GLINK_RPM) += qcom_glink_rpm.o
diff --git a/drivers/rpmsg/rpmsg_cfs.c b/drivers/rpmsg/rpmsg_cfs.c
new file mode 100644
index 000000000000..a5c77aba00ee
--- /dev/null
+++ b/drivers/rpmsg/rpmsg_cfs.c
@@ -0,0 +1,394 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * configfs to configure RPMSG
+ *
+ * Copyright (C) 2020 Texas Instruments
+ * Author: Kishon Vijay Abraham I <[email protected]>
+ */
+
+#include <linux/configfs.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/idr.h>
+#include <linux/slab.h>
+
+#include "rpmsg_internal.h"
+
+static struct config_group *channel_group;
+static struct config_group *virtproc_group;
+
+enum rpmsg_channel_status {
+ STATUS_FREE,
+ STATUS_BUSY,
+};
+
+struct rpmsg_channel {
+ struct config_item item;
+ struct device *dev;
+ enum rpmsg_channel_status status;
+};
+
+struct rpmsg_channel_group {
+ struct config_group group;
+};
+
+struct rpmsg_virtproc_group {
+ struct config_group group;
+ struct device *dev;
+ const struct rpmsg_virtproc_ops *ops;
+};
+
+static inline
+struct rpmsg_channel *to_rpmsg_channel(struct config_item *channel_item)
+{
+ return container_of(channel_item, struct rpmsg_channel, item);
+}
+
+static inline struct rpmsg_channel_group
+*to_rpmsg_channel_group(struct config_group *channel_group)
+{
+ return container_of(channel_group, struct rpmsg_channel_group, group);
+}
+
+static inline
+struct rpmsg_virtproc_group *to_rpmsg_virtproc_group(struct config_item *item)
+{
+ return container_of(to_config_group(item), struct rpmsg_virtproc_group,
+ group);
+}
+
+/**
+ * rpmsg_virtproc_channel_link() - Create softlink of rpmsg client device
+ * directory to virtproc configfs directory
+ * @virtproc_item: Config item representing configfs entry of virtual remote
+ * processor
+ * @channel_item: Config item representing configfs entry of rpmsg client
+ * driver
+ *
+ * Bind the rpmsg client device to a virtual remote processor by creating a
+ * softlink from the rpmsg client device directory to the virtproc configfs
+ * directory, which in turn creates a new rpmsg channel.
+ */
+static int rpmsg_virtproc_channel_link(struct config_item *virtproc_item,
+ struct config_item *channel_item)
+{
+ struct rpmsg_virtproc_group *vgroup;
+ struct rpmsg_channel *channel;
+ struct config_group *cgroup;
+ struct device *dev;
+
+ vgroup = to_rpmsg_virtproc_group(virtproc_item);
+ channel = to_rpmsg_channel(channel_item);
+
+ if (channel->status == STATUS_BUSY)
+ return -EBUSY;
+
+ cgroup = channel_item->ci_group;
+
+ if (!vgroup->ops || !vgroup->ops->create_channel)
+ return -EINVAL;
+
+ dev = vgroup->ops->create_channel(vgroup->dev,
+ cgroup->cg_item.ci_name);
+ if (IS_ERR_OR_NULL(dev))
+ return dev ? PTR_ERR(dev) : -EINVAL;
+ channel->dev = dev;
+ channel->status = STATUS_BUSY;
+
+ return 0;
+}
+
+/**
+ * rpmsg_virtproc_channel_unlink() - Remove softlink of rpmsg client device
+ * directory from virtproc configfs directory
+ * @virtproc_item: Config item representing configfs entry of virtual remote
+ * processor
+ * @channel_item: Config item representing configfs entry of rpmsg client
+ * driver
+ *
+ * Unbind the rpmsg client device from the virtual remote processor by removing
+ * the softlink of the rpmsg client device directory from the virtproc configfs
+ * directory, which deletes the rpmsg channel.
+ */
+static void rpmsg_virtproc_channel_unlink(struct config_item *virtproc_item,
+ struct config_item *channel_item)
+{
+ struct rpmsg_virtproc_group *vgroup;
+ struct rpmsg_channel *channel;
+
+ channel = to_rpmsg_channel(channel_item);
+ vgroup = to_rpmsg_virtproc_group(virtproc_item);
+
+ if (vgroup->ops && vgroup->ops->delete_channel)
+ vgroup->ops->delete_channel(channel->dev);
+
+ channel->status = STATUS_FREE;
+}
+
+static struct configfs_item_operations rpmsg_virtproc_item_ops = {
+ .allow_link = rpmsg_virtproc_channel_link,
+ .drop_link = rpmsg_virtproc_channel_unlink,
+};
+
+static const struct config_item_type rpmsg_virtproc_item_type = {
+ .ct_item_ops = &rpmsg_virtproc_item_ops,
+ .ct_owner = THIS_MODULE,
+};
+
+/**
+ * rpmsg_cfs_add_virtproc_group() - Add new configfs directory for virtproc
+ * device
+ * @dev: Device representing the virtual remote processor
+ * @ops: rpmsg_virtproc_ops to create or delete rpmsg channel
+ *
+ * Add new configfs directory for virtproc device. The rpmsg client driver's
+ * configfs entry can be linked with this directory for creating a new
+ * rpmsg channel and the link can be removed for deleting the rpmsg channel.
+ */
+struct config_group *
+rpmsg_cfs_add_virtproc_group(struct device *dev,
+ const struct rpmsg_virtproc_ops *ops)
+{
+ struct rpmsg_virtproc_group *vgroup;
+ struct config_group *group;
+ struct device *vdev;
+ int ret;
+
+ vgroup = kzalloc(sizeof(*vgroup), GFP_KERNEL);
+ if (!vgroup)
+ return ERR_PTR(-ENOMEM);
+
+ group = &vgroup->group;
+ config_group_init_type_name(group, dev_name(dev),
+ &rpmsg_virtproc_item_type);
+ ret = configfs_register_group(virtproc_group, group);
+ if (ret)
+ goto err_register_group;
+
+ if (!try_module_get(ops->owner)) {
+ ret = -EPROBE_DEFER;
+ goto err_module_get;
+ }
+
+ vdev = get_device(dev);
+ vgroup->dev = vdev;
+ vgroup->ops = ops;
+
+ return group;
+
+err_module_get:
+ configfs_unregister_group(group);
+
+err_register_group:
+ kfree(vgroup);
+
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(rpmsg_cfs_add_virtproc_group);
+
+/**
+ * rpmsg_cfs_remove_virtproc_group() - Remove the configfs directory for
+ * virtproc device
+ * @group: config_group of the virtproc device
+ *
+ * Remove the configfs directory for virtproc device.
+ */
+void rpmsg_cfs_remove_virtproc_group(struct config_group *group)
+{
+ struct rpmsg_virtproc_group *vgroup;
+
+ if (!group)
+ return;
+
+ vgroup = container_of(group, struct rpmsg_virtproc_group, group);
+ put_device(vgroup->dev);
+ module_put(vgroup->ops->owner);
+ configfs_unregister_group(&vgroup->group);
+ kfree(vgroup);
+}
+EXPORT_SYMBOL(rpmsg_cfs_remove_virtproc_group);
+
+static const struct config_item_type rpmsg_channel_item_type = {
+ .ct_owner = THIS_MODULE,
+};
+
+/**
+ * rpmsg_channel_make() - Allow user to create sub-directory of rpmsg client
+ * driver
+ * @name: Name of the sub-directory created by the user.
+ *
+ * Invoked when user creates a sub-directory to the configfs directory
+ * representing the rpmsg client driver. This can be linked with the virtproc
+ * directory for creating a new rpmsg channel.
+ */
+static struct config_item *
+rpmsg_channel_make(struct config_group *group, const char *name)
+{
+ struct rpmsg_channel *channel;
+
+ channel = kzalloc(sizeof(*channel), GFP_KERNEL);
+ if (!channel)
+ return ERR_PTR(-ENOMEM);
+
+ channel->status = STATUS_FREE;
+
+ config_item_init_type_name(&channel->item, name, &rpmsg_channel_item_type);
+ return &channel->item;
+}
+
+/**
+ * rpmsg_channel_drop() - Allow user to delete sub-directory of rpmsg client
+ * driver
+ * @item: Config item representing the sub-directory the user created returned
+ * by rpmsg_channel_make()
+ *
+ * Invoked when the user removes a sub-directory from the configfs directory
+ * representing the rpmsg client driver, freeing the associated rpmsg channel
+ * entry.
+ */
+static void rpmsg_channel_drop(struct config_group *group, struct config_item *item)
+{
+ struct rpmsg_channel *channel;
+
+ channel = to_rpmsg_channel(item);
+ kfree(channel);
+}
+
+static struct configfs_group_operations rpmsg_channel_group_ops = {
+ .make_item = &rpmsg_channel_make,
+ .drop_item = &rpmsg_channel_drop,
+};
+
+static const struct config_item_type rpmsg_channel_group_type = {
+ .ct_group_ops = &rpmsg_channel_group_ops,
+ .ct_owner = THIS_MODULE,
+};
+
+/**
+ * rpmsg_cfs_add_channel_group() - Create a configfs directory for each
+ * registered rpmsg client driver
+ * @name: The name of the rpmsg client driver
+ *
+ * Create a configfs directory for each registered rpmsg client driver. The
+ * user can create sub-directory within this directory for creating
+ * rpmsg channels to be used by the rpmsg client driver.
+ */
+struct config_group *rpmsg_cfs_add_channel_group(const char *name)
+{
+ struct rpmsg_channel_group *cgroup;
+ struct config_group *group;
+ int ret;
+
+ cgroup = kzalloc(sizeof(*cgroup), GFP_KERNEL);
+ if (!cgroup)
+ return ERR_PTR(-ENOMEM);
+
+ group = &cgroup->group;
+ config_group_init_type_name(group, name, &rpmsg_channel_group_type);
+ ret = configfs_register_group(channel_group, group);
+ if (ret) {
+ kfree(cgroup);
+ return ERR_PTR(ret);
+ }
+
+ return group;
+}
+EXPORT_SYMBOL(rpmsg_cfs_add_channel_group);
+
+/**
+ * rpmsg_cfs_remove_channel_group() - Remove the configfs directory associated
+ * with the rpmsg client driver
+ * @group: Config group representing the rpmsg client driver
+ *
+ * Remove the configfs directory associated with the rpmsg client driver.
+ */
+void rpmsg_cfs_remove_channel_group(struct config_group *group)
+{
+ struct rpmsg_channel_group *cgroup;
+
+ if (IS_ERR_OR_NULL(group))
+ return;
+
+ cgroup = to_rpmsg_channel_group(group);
+ configfs_unregister_group(group);
+ kfree(cgroup);
+}
+EXPORT_SYMBOL(rpmsg_cfs_remove_channel_group);
+
+static const struct config_item_type rpmsg_channel_type = {
+ .ct_owner = THIS_MODULE,
+};
+
+static const struct config_item_type rpmsg_virtproc_type = {
+ .ct_owner = THIS_MODULE,
+};
+
+static const struct config_item_type rpmsg_type = {
+ .ct_owner = THIS_MODULE,
+};
+
+static struct configfs_subsystem rpmsg_cfs_subsys = {
+ .su_group = {
+ .cg_item = {
+ .ci_namebuf = "rpmsg",
+ .ci_type = &rpmsg_type,
+ },
+ },
+ .su_mutex = __MUTEX_INITIALIZER(rpmsg_cfs_subsys.su_mutex),
+};
+
+static int __init rpmsg_cfs_init(void)
+{
+ int ret;
+ struct config_group *root = &rpmsg_cfs_subsys.su_group;
+
+ config_group_init(root);
+
+ ret = configfs_register_subsystem(&rpmsg_cfs_subsys);
+ if (ret) {
+ pr_err("Error %d while registering subsystem %s\n",
+ ret, root->cg_item.ci_namebuf);
+ goto err;
+ }
+
+ channel_group = configfs_register_default_group(root, "channel",
+ &rpmsg_channel_type);
+ if (IS_ERR(channel_group)) {
+ ret = PTR_ERR(channel_group);
+ pr_err("Error %d while registering channel group\n",
+ ret);
+ goto err_channel_group;
+ }
+
+ virtproc_group =
+ configfs_register_default_group(root, "virtproc",
+ &rpmsg_virtproc_type);
+ if (IS_ERR(virtproc_group)) {
+ ret = PTR_ERR(virtproc_group);
+ pr_err("Error %d while registering virtproc group\n",
+ ret);
+ goto err_virtproc_group;
+ }
+
+ return 0;
+
+err_virtproc_group:
+ configfs_unregister_default_group(channel_group);
+
+err_channel_group:
+ configfs_unregister_subsystem(&rpmsg_cfs_subsys);
+
+err:
+ return ret;
+}
+module_init(rpmsg_cfs_init);
+
+static void __exit rpmsg_cfs_exit(void)
+{
+ configfs_unregister_default_group(virtproc_group);
+ configfs_unregister_default_group(channel_group);
+ configfs_unregister_subsystem(&rpmsg_cfs_subsys);
+}
+module_exit(rpmsg_cfs_exit);
+
+MODULE_DESCRIPTION("RPMSG CONFIGFS");
+MODULE_AUTHOR("Kishon Vijay Abraham I <[email protected]>");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/rpmsg/rpmsg_core.c b/drivers/rpmsg/rpmsg_core.c
index e330ec4dfc33..68569fec03e2 100644
--- a/drivers/rpmsg/rpmsg_core.c
+++ b/drivers/rpmsg/rpmsg_core.c
@@ -563,8 +563,15 @@ EXPORT_SYMBOL(rpmsg_unregister_device);
*/
int __register_rpmsg_driver(struct rpmsg_driver *rpdrv, struct module *owner)
{
+ const struct rpmsg_device_id *ids = rpdrv->id_table;
rpdrv->drv.bus = &rpmsg_bus;
rpdrv->drv.owner = owner;
+
+ while (ids && ids->name[0]) {
+ rpmsg_cfs_add_channel_group(ids->name);
+ ids++;
+ }
+
return driver_register(&rpdrv->drv);
}
EXPORT_SYMBOL(__register_rpmsg_driver);
diff --git a/drivers/rpmsg/rpmsg_internal.h b/drivers/rpmsg/rpmsg_internal.h
index 3fc83cd50e98..39b3a5caf242 100644
--- a/drivers/rpmsg/rpmsg_internal.h
+++ b/drivers/rpmsg/rpmsg_internal.h
@@ -68,6 +68,18 @@ struct rpmsg_endpoint_ops {
poll_table *wait);
};

+/**
+ * struct rpmsg_virtproc_ops - indirection table for rpmsg_virtproc operations
+ * @create_channel: Create a new rpdev channel
+ * @delete_channel: Delete the rpdev channel
+ * @owner: Owner of the module holding the ops
+ */
+struct rpmsg_virtproc_ops {
+ struct device *(*create_channel)(struct device *dev, const char *name);
+ void (*delete_channel)(struct device *dev);
+ struct module *owner;
+};
+
int rpmsg_register_device(struct rpmsg_device *rpdev);
int rpmsg_unregister_device(struct device *parent,
struct rpmsg_channel_info *chinfo);
@@ -75,6 +87,10 @@ int rpmsg_unregister_device(struct device *parent,
struct device *rpmsg_find_device(struct device *parent,
struct rpmsg_channel_info *chinfo);

+struct config_group *
+rpmsg_cfs_add_virtproc_group(struct device *dev,
+ const struct rpmsg_virtproc_ops *ops);
+
/**
* rpmsg_chrdev_register_device() - register chrdev device based on rpdev
* @rpdev: prepared rpdev to be used for creating endpoints
diff --git a/include/linux/rpmsg.h b/include/linux/rpmsg.h
index 9fe156d1c018..b9d9283b46ac 100644
--- a/include/linux/rpmsg.h
+++ b/include/linux/rpmsg.h
@@ -135,6 +135,7 @@ int rpmsg_trysend_offchannel(struct rpmsg_endpoint *ept, u32 src, u32 dst,
__poll_t rpmsg_poll(struct rpmsg_endpoint *ept, struct file *filp,
poll_table *wait);

+struct config_group *rpmsg_cfs_add_channel_group(const char *name);
#else

static inline int register_rpmsg_device(struct rpmsg_device *dev)
@@ -242,6 +243,10 @@ static inline __poll_t rpmsg_poll(struct rpmsg_endpoint *ept,
return 0;
}

+static inline struct config_group *rpmsg_cfs_add_channel_group(const char *name)
+{
+ return NULL;
+}
#endif /* IS_ENABLED(CONFIG_RPMSG) */

/* use a macro to avoid include chaining to get THIS_MODULE */
--
2.17.1

2020-07-02 08:24:11

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 11/22] rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to rpmsg_internal.h

No functional change intended. Move generic rpmsg structures such as
"struct rpmsg_hdr", "struct rpmsg_ns_msg" and "struct rpmsg_as_msg",
along with their associated flags and generic macros, to
rpmsg_internal.h. This is in preparation for adding the VHOST based
vhost_rpmsg_bus.c, which will use the same structures and macros.

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/rpmsg/rpmsg_internal.h | 120 +++++++++++++++++++++++++++++++
drivers/rpmsg/virtio_rpmsg_bus.c | 120 -------------------------------
2 files changed, 120 insertions(+), 120 deletions(-)

diff --git a/drivers/rpmsg/rpmsg_internal.h b/drivers/rpmsg/rpmsg_internal.h
index 39b3a5caf242..69d9c9579b50 100644
--- a/drivers/rpmsg/rpmsg_internal.h
+++ b/drivers/rpmsg/rpmsg_internal.h
@@ -15,6 +15,126 @@
#include <linux/rpmsg.h>
#include <linux/poll.h>

+/* The feature bitmap for virtio rpmsg */
+#define VIRTIO_RPMSG_F_NS 0 /* RP supports name service notifications */
+#define VIRTIO_RPMSG_F_AS 1 /* RP supports address service notifications */
+
+/**
+ * struct rpmsg_hdr - common header for all rpmsg messages
+ * @src: source address
+ * @dst: destination address
+ * @reserved: reserved for future use
+ * @len: length of payload (in bytes)
+ * @flags: message flags
+ * @data: @len bytes of message payload data
+ *
+ * Every message sent(/received) on the rpmsg bus begins with this header.
+ */
+struct rpmsg_hdr {
+ u32 src;
+ u32 dst;
+ u32 reserved;
+ u16 len;
+ u16 flags;
+ u8 data[0];
+} __packed;
+
+/**
+ * struct rpmsg_ns_msg - dynamic name service announcement message
+ * @name: name of remote service that is published
+ * @addr: address of remote service that is published
+ * @flags: indicates whether service is created or destroyed
+ *
+ * This message is sent across to publish a new service, or announce
+ * about its removal. When we receive these messages, an appropriate
+ * rpmsg channel (i.e device) is created/destroyed. In turn, the ->probe()
+ * or ->remove() handler of the appropriate rpmsg driver will be invoked
+ * (if/as-soon-as one is registered).
+ */
+struct rpmsg_ns_msg {
+ char name[RPMSG_NAME_SIZE];
+ u32 addr;
+ u32 flags;
+} __packed;
+
+/**
+ * struct rpmsg_as_msg - dynamic address service announcement message
+ * @name: name of the created channel
+ * @dst: destination address to be used by the backend rpdev
+ * @src: source address of the backend rpdev (the one that sent name service
+ * announcement message)
+ * @flags: indicates whether service is created or destroyed
+ *
+ * This message is sent (by virtio_rpmsg_bus) when a new channel is created
+ * in response to name service announcement message by backend rpdev to create
+ * a new channel. This sends the allocated source address for the channel
+ * (destination address for the backend rpdev) to the backend rpdev.
+ */
+struct rpmsg_as_msg {
+ char name[RPMSG_NAME_SIZE];
+ u32 dst;
+ u32 src;
+ u32 flags;
+} __packed;
+
+/**
+ * enum rpmsg_ns_flags - dynamic name service announcement flags
+ *
+ * @RPMSG_NS_CREATE: a new remote service was just created
+ * @RPMSG_NS_DESTROY: a known remote service was just destroyed
+ */
+enum rpmsg_ns_flags {
+ RPMSG_NS_CREATE = 0,
+ RPMSG_NS_DESTROY = 1,
+ RPMSG_AS_ANNOUNCE = 2,
+};
+
+/**
+ * enum rpmsg_as_flags - dynamic address service announcement flags
+ *
+ * @RPMSG_AS_ASSIGN: address has been assigned to the newly created channel
+ * @RPMSG_AS_FREE: assigned address is freed from the channel and no longer can
+ * be used
+ */
+enum rpmsg_as_flags {
+ RPMSG_AS_ASSIGN = 1,
+ RPMSG_AS_FREE = 2,
+};
+
+/*
+ * We're allocating buffers of 512 bytes each for communications. The
+ * number of buffers will be computed from the number of buffers supported
+ * by the vring, upto a maximum of 512 buffers (256 in each direction).
+ *
+ * Each buffer will have 16 bytes for the msg header and 496 bytes for
+ * the payload.
+ *
+ * This will utilize a maximum total space of 256KB for the buffers.
+ *
+ * We might also want to add support for user-provided buffers in time.
+ * This will allow bigger buffer size flexibility, and can also be used
+ * to achieve zero-copy messaging.
+ *
+ * Note that these numbers are purely a decision of this driver - we
+ * can change this without changing anything in the firmware of the remote
+ * processor.
+ */
+#define MAX_RPMSG_NUM_BUFS (512)
+#define MAX_RPMSG_BUF_SIZE (512)
+
+/*
+ * Local addresses are dynamically allocated on-demand.
+ * We do not dynamically assign addresses from the low 1024 range,
+ * in order to reserve that address range for predefined services.
+ */
+#define RPMSG_RESERVED_ADDRESSES (1024)
+
+/* Address 53 is reserved for advertising remote services */
+#define RPMSG_NS_ADDR (53)
+
+/* Address 54 is reserved for advertising address services */
+#define RPMSG_AS_ADDR (54)
+
#define to_rpmsg_device(d) container_of(d, struct rpmsg_device, dev)
#define to_rpmsg_driver(d) container_of(d, struct rpmsg_driver, drv)

diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
index 19d930c9fc2c..f91143b25af7 100644
--- a/drivers/rpmsg/virtio_rpmsg_bus.c
+++ b/drivers/rpmsg/virtio_rpmsg_bus.c
@@ -69,92 +69,6 @@ struct virtproc_info {
struct rpmsg_endpoint *ns_ept;
};

-/* The feature bitmap for virtio rpmsg */
-#define VIRTIO_RPMSG_F_NS 0 /* RP supports name service notifications */
-#define VIRTIO_RPMSG_F_AS 1 /* RP supports address service notifications */
-
-/**
- * struct rpmsg_hdr - common header for all rpmsg messages
- * @src: source address
- * @dst: destination address
- * @reserved: reserved for future use
- * @len: length of payload (in bytes)
- * @flags: message flags
- * @data: @len bytes of message payload data
- *
- * Every message sent(/received) on the rpmsg bus begins with this header.
- */
-struct rpmsg_hdr {
- u32 src;
- u32 dst;
- u32 reserved;
- u16 len;
- u16 flags;
- u8 data[0];
-} __packed;
-
-/**
- * struct rpmsg_ns_msg - dynamic name service announcement message
- * @name: name of remote service that is published
- * @addr: address of remote service that is published
- * @flags: indicates whether service is created or destroyed
- *
- * This message is sent across to publish a new service, or announce
- * about its removal. When we receive these messages, an appropriate
- * rpmsg channel (i.e device) is created/destroyed. In turn, the ->probe()
- * or ->remove() handler of the appropriate rpmsg driver will be invoked
- * (if/as-soon-as one is registered).
- */
-struct rpmsg_ns_msg {
- char name[RPMSG_NAME_SIZE];
- u32 addr;
- u32 flags;
-} __packed;
-
-/**
- * struct rpmsg_as_msg - dynamic address service announcement message
- * @name: name of the created channel
- * @dst: destination address to be used by the backend rpdev
- * @src: source address of the backend rpdev (the one that sent name service
- * announcement message)
- * @flags: indicates whether service is created or destroyed
- *
- * This message is sent (by virtio_rpmsg_bus) when a new channel is created
- * in response to name service announcement message by backend rpdev to create
- * a new channel. This sends the allocated source address for the channel
- * (destination address for the backend rpdev) to the backend rpdev.
- */
-struct rpmsg_as_msg {
- char name[RPMSG_NAME_SIZE];
- u32 dst;
- u32 src;
- u32 flags;
-} __packed;
-
-/**
- * enum rpmsg_ns_flags - dynamic name service announcement flags
- *
- * @RPMSG_NS_CREATE: a new remote service was just created
- * @RPMSG_NS_DESTROY: a known remote service was just destroyed
- */
-enum rpmsg_ns_flags {
- RPMSG_NS_CREATE = 0,
- RPMSG_NS_DESTROY = 1,
- RPMSG_AS_ANNOUNCE = 2,
-};
-
-/**
- * enum rpmsg_as_flags - dynamic address service announcement flags
- *
- * @RPMSG_AS_ASSIGN: address has been assigned to the newly created channel
- * @RPMSG_AS_FREE: assigned address is freed from the channel and no longer can
- * be used
- */
-enum rpmsg_as_flags {
- RPMSG_AS_ASSIGN = 1,
- RPMSG_AS_FREE = 2,
-};
-
/**
* @vrp: the remote processor this channel belongs to
*/
@@ -167,40 +81,6 @@ struct virtio_rpmsg_channel {
#define to_virtio_rpmsg_channel(_rpdev) \
container_of(_rpdev, struct virtio_rpmsg_channel, rpdev)

-/*
- * We're allocating buffers of 512 bytes each for communications. The
- * number of buffers will be computed from the number of buffers supported
- * by the vring, upto a maximum of 512 buffers (256 in each direction).
- *
- * Each buffer will have 16 bytes for the msg header and 496 bytes for
- * the payload.
- *
- * This will utilize a maximum total space of 256KB for the buffers.
- *
- * We might also want to add support for user-provided buffers in time.
- * This will allow bigger buffer size flexibility, and can also be used
- * to achieve zero-copy messaging.
- *
- * Note that these numbers are purely a decision of this driver - we
- * can change this without changing anything in the firmware of the remote
- * processor.
- */
-#define MAX_RPMSG_NUM_BUFS (512)
-#define MAX_RPMSG_BUF_SIZE (512)
-
-/*
- * Local addresses are dynamically allocated on-demand.
- * We do not dynamically assign addresses from the low 1024 range,
- * in order to reserve that address range for predefined services.
- */
-#define RPMSG_RESERVED_ADDRESSES (1024)
-
-/* Address 53 is reserved for advertising remote services */
-#define RPMSG_NS_ADDR (53)
-
-/* Address 54 is reserved for advertising address services */
-#define RPMSG_AS_ADDR (54)
-
static void virtio_rpmsg_destroy_ept(struct rpmsg_endpoint *ept);
static int virtio_rpmsg_send(struct rpmsg_endpoint *ept, void *data, int len);
static int virtio_rpmsg_sendto(struct rpmsg_endpoint *ept, void *data, int len,
--
2.17.1

2020-07-02 08:24:21

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 12/22] virtio: Add ops to allocate and free buffer

Add ops to allocate and free buffers in struct virtio_config_ops.
Certain vhost devices have restrictions on the range of memory
they can access on the virtio side. The virtio drivers attached to
such vhost devices reserve memory that can be accessed by the
vhost device, and these ops allocate buffers from that reserved
region.

For instance, when virtio-vhost is used by two hosts connected via
NTB, the vhost device can only access memory exposed through memory
windows, and the size of a memory window can be limited. Here the
NTB virtio driver can reserve a small region (a few MBs) and hand
out buffer addresses from this pool whenever a virtio client driver
requests one.
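
As an illustration, a bus driver could back these ops with a gen_pool
covering the reserved region, roughly as sketched below. The ntb_virtio_*
names are placeholders; only the two new callbacks come from this patch.

/* Illustrative sketch: backing ->alloc_buffer()/->free_buffer(). */
struct ntb_virtio {
	struct virtio_device vdev;
	struct gen_pool *pool;	/* covers the reserved memory-window region */
};

static void *ntb_virtio_alloc_buffer(struct virtio_device *vdev, size_t size)
{
	struct ntb_virtio *ntb = container_of(vdev, struct ntb_virtio, vdev);
	unsigned long addr;

	addr = gen_pool_alloc(ntb->pool, size);

	return addr ? (void *)addr : NULL;
}

static void ntb_virtio_free_buffer(struct virtio_device *vdev, void *buf,
				   size_t size)
{
	struct ntb_virtio *ntb = container_of(vdev, struct ntb_virtio, vdev);

	gen_pool_free(ntb->pool, (unsigned long)buf, size);
}

static const struct virtio_config_ops ntb_virtio_config_ops = {
	/* ... get/set/find_vqs and the other mandatory callbacks ... */
	.alloc_buffer	= ntb_virtio_alloc_buffer,
	.free_buffer	= ntb_virtio_free_buffer,
};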

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
include/linux/virtio_config.h | 42 +++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index bb4cc4910750..419f733017c2 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -65,6 +65,9 @@ struct irq_affinity;
* the caller can then copy.
* @set_vq_affinity: set the affinity for a virtqueue (optional).
* @get_vq_affinity: get the affinity for a virtqueue (optional).
+ * @alloc_buffer: Allocate and provide buffer addresses that can be
+ * accessed by both virtio and vhost
+ * @free_buffer: Free the allocated buffer address
*/
typedef void vq_callback_t(struct virtqueue *);
struct virtio_config_ops {
@@ -88,6 +91,9 @@ struct virtio_config_ops {
const struct cpumask *cpu_mask);
const struct cpumask *(*get_vq_affinity)(struct virtio_device *vdev,
int index);
+ void * (*alloc_buffer)(struct virtio_device *vdev, size_t size);
+ void (*free_buffer)(struct virtio_device *vdev, void *addr,
+ size_t size);
};

/* If driver didn't advertise the feature, it will never appear. */
@@ -232,6 +238,42 @@ const char *virtio_bus_name(struct virtio_device *vdev)
return vdev->config->bus_name(vdev);
}

+/**
+ * virtio_alloc_buffer - Allocate buffer from the reserved memory
+ * @vdev: Virtio device which manages the reserved memory
+ * @size: Size of the buffer to be allocated
+ *
+ * Certain vhost devices have restrictions on the range of memory they
+ * can access on the virtio side. The virtio drivers attached to such
+ * vhost devices reserve memory that can be accessed by the vhost device.
+ * This function allocates a buffer from that reserved region.
+ */
+static inline void *
+virtio_alloc_buffer(struct virtio_device *vdev, size_t size)
+{
+ if (!vdev->config->alloc_buffer)
+ return NULL;
+
+ return vdev->config->alloc_buffer(vdev, size);
+}
+
+/**
+ * virtio_free_buffer - Free the allocated buffer
+ * @vdev: Virtio device which manages the reserved memory
+ * @addr: Address returned by virtio_alloc_buffer()
+ * @size: Size of the buffer that has to be freed
+ *
+ * Free the allocated buffer address given by virtio_alloc_buffer().
+ */
+static inline void
+virtio_free_buffer(struct virtio_device *vdev, void *addr, size_t size)
+{
+ if (!vdev->config->free_buffer)
+ return;
+
+ vdev->config->free_buffer(vdev, addr, size);
+}
+
/**
* virtqueue_set_affinity - setting affinity for a virtqueue
* @vq: the virtqueue
--
2.17.1

2020-07-02 08:24:36

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 13/22] rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and virtio_free_buffer()

Use virtio_alloc_buffer() and virtio_free_buffer() to allocate and free
the message buffers respectively. Only if buffer allocation using
virtio_alloc_buffer() fails, fall back to dma_alloc_coherent(). This is
required for devices like NTB to use rpmsg for communicating with the
other host.

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/rpmsg/virtio_rpmsg_bus.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
index f91143b25af7..2b25a8ae1539 100644
--- a/drivers/rpmsg/virtio_rpmsg_bus.c
+++ b/drivers/rpmsg/virtio_rpmsg_bus.c
@@ -882,13 +882,16 @@ static int rpmsg_probe(struct virtio_device *vdev)

total_buf_space = vrp->num_bufs * vrp->buf_size;

- /* allocate coherent memory for the buffers */
- bufs_va = dma_alloc_coherent(vdev->dev.parent,
- total_buf_space, &vrp->bufs_dma,
- GFP_KERNEL);
+ bufs_va = virtio_alloc_buffer(vdev, total_buf_space);
if (!bufs_va) {
- err = -ENOMEM;
- goto vqs_del;
+ /* allocate coherent memory for the buffers */
+ bufs_va = dma_alloc_coherent(vdev->dev.parent,
+ total_buf_space, &vrp->bufs_dma,
+ GFP_KERNEL);
+ if (!bufs_va) {
+ err = -ENOMEM;
+ goto vqs_del;
+ }
}

dev_dbg(&vdev->dev, "buffers: va %pK, dma %pad\n",
@@ -951,8 +954,13 @@ static int rpmsg_probe(struct virtio_device *vdev)
return 0;

free_coherent:
- dma_free_coherent(vdev->dev.parent, total_buf_space,
- bufs_va, vrp->bufs_dma);
+ if (!vrp->bufs_dma) {
+ virtio_free_buffer(vdev, bufs_va, total_buf_space);
+ } else {
+ dma_free_coherent(vdev->dev.parent, total_buf_space,
+ bufs_va, vrp->bufs_dma);
+ }
+
vqs_del:
vdev->config->del_vqs(vrp->vdev);
free_vrp:
@@ -986,8 +994,12 @@ static void rpmsg_remove(struct virtio_device *vdev)

vdev->config->del_vqs(vrp->vdev);

- dma_free_coherent(vdev->dev.parent, total_buf_space,
- vrp->rbufs, vrp->bufs_dma);
+ if (!vrp->bufs_dma) {
+ virtio_free_buffer(vdev, vrp->rbufs, total_buf_space);
+ } else {
+ dma_free_coherent(vdev->dev.parent, total_buf_space,
+ vrp->rbufs, vrp->bufs_dma);
+ }

kfree(vrp);
}
--
2.17.1

2020-07-02 08:24:43

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 14/22] rpmsg: Add VHOST based remote processor messaging bus

Add a VHOST-based inter-processor communication bus, which enables
kernel drivers to communicate with the VIRTIO-based messaging bus
running on remote processors, over shared memory, using a simple
messaging protocol.
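
From the point of view of an rpmsg client driver nothing changes: a
minimal client, sketched below, works unmodified on top of either
virtio_rpmsg_bus or this vhost bus. The demo_* names are placeholders;
only the standard rpmsg driver API is used.

/* Illustrative sketch: a minimal rpmsg client over the vhost bus. */
static int demo_rpmsg_cb(struct rpmsg_device *rpdev, void *data, int len,
			 void *priv, u32 src)
{
	dev_info(&rpdev->dev, "received %d bytes from 0x%x\n", len, src);
	return 0;
}

static int demo_rpmsg_probe(struct rpmsg_device *rpdev)
{
	static const char msg[] = "hello";

	/* Routed through vhost_rpmsg_ops on this bus. */
	return rpmsg_send(rpdev->ept, (void *)msg, sizeof(msg));
}

static const struct rpmsg_device_id demo_rpmsg_id_table[] = {
	{ .name = "rpmsg-client-demo" },
	{ },
};

static struct rpmsg_driver demo_rpmsg_driver = {
	.drv.name	= "rpmsg_client_demo",
	.id_table	= demo_rpmsg_id_table,
	.probe		= demo_rpmsg_probe,
	.callback	= demo_rpmsg_cb,
};
module_rpmsg_driver(demo_rpmsg_driver);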

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/rpmsg/Kconfig | 10 +
drivers/rpmsg/Makefile | 1 +
drivers/rpmsg/vhost_rpmsg_bus.c | 1151 +++++++++++++++++++++++++++++++
include/linux/rpmsg.h | 1 +
4 files changed, 1163 insertions(+)
create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c

diff --git a/drivers/rpmsg/Kconfig b/drivers/rpmsg/Kconfig
index a9108ff563dc..881712f424d3 100644
--- a/drivers/rpmsg/Kconfig
+++ b/drivers/rpmsg/Kconfig
@@ -64,4 +64,14 @@ config RPMSG_VIRTIO
select RPMSG
select VIRTIO

+config RPMSG_VHOST
+ tristate "Vhost RPMSG bus driver"
+ depends on HAS_DMA
+ select RPMSG
+ select VHOST
+ help
+ Say y here to enable support for the RPMSG VHOST driver,
+ which provides communication channels to remote processors
+ running the RPMSG VIRTIO driver.
+
endmenu
diff --git a/drivers/rpmsg/Makefile b/drivers/rpmsg/Makefile
index 047acfda518a..44023b0abe9e 100644
--- a/drivers/rpmsg/Makefile
+++ b/drivers/rpmsg/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_RPMSG_QCOM_GLINK_NATIVE) += qcom_glink_native.o
obj-$(CONFIG_RPMSG_QCOM_GLINK_SMEM) += qcom_glink_smem.o
obj-$(CONFIG_RPMSG_QCOM_SMD) += qcom_smd.o
obj-$(CONFIG_RPMSG_VIRTIO) += virtio_rpmsg_bus.o
+obj-$(CONFIG_RPMSG_VHOST) += vhost_rpmsg_bus.o
diff --git a/drivers/rpmsg/vhost_rpmsg_bus.c b/drivers/rpmsg/vhost_rpmsg_bus.c
new file mode 100644
index 000000000000..6bfb5c64c95a
--- /dev/null
+++ b/drivers/rpmsg/vhost_rpmsg_bus.c
@@ -0,0 +1,1151 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Vhost-based remote processor messaging bus
+ *
+ * Based on virtio_rpmsg_bus.c
+ *
+ * Copyright (C) 2020 Texas Instruments
+ * Author: Kishon Vijay Abraham I <[email protected]>
+ *
+ */
+
+#define pr_fmt(fmt) "%s: " fmt, __func__
+
+#include <linux/configfs.h>
+#include <linux/idr.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/rpmsg.h>
+#include <linux/vhost.h>
+#include <linux/virtio_ids.h>
+
+#include "rpmsg_internal.h"
+
+/**
+ * struct virtproc_info - virtual remote processor state
+ * @vdev: the vhost device
+ * @rvq: rx vhost_virtqueue
+ * @svq: tx vhost_virtqueue
+ * @buf_size: size of one rx or tx buffer
+ * @tx_lock: protects svq, sbufs and sleepers, to allow concurrent senders.
+ * sending a message might require waking up a dozing remote
+ * processor, which involves sleeping, hence the mutex.
+ * @endpoints: idr of local endpoints, allows fast retrieval
+ * @endpoints_lock: lock of the endpoints set
+ * @sendq: wait queue of sending contexts waiting for a tx buffer
+ * @sleepers: number of senders that are waiting for a tx buffer
+ * @as_ept: the bus's address service endpoint
+ * @nb: notifier block for receiving notifications from vhost device
+ * driver
+ * @list: list of rpmsg devices created before the vhost device is ready,
+ * pending registration with the rpmsg core
+ * @list_lock: mutex to protect updating the list
+ *
+ * This structure stores the rpmsg state of a given vhost remote processor
+ * device (there might be several vhost proc devices for each physical
+ * remote processor).
+ */
+struct virtproc_info {
+ struct vhost_dev *vdev;
+ struct vhost_virtqueue *rvq, *svq;
+ unsigned int buf_size;
+ /* mutex to protect sending messages */
+ struct mutex tx_lock;
+ /* mutex to protect receiving messages */
+ struct mutex rx_lock;
+ struct idr endpoints;
+ /* mutex to protect receiving accessing idr */
+ struct mutex endpoints_lock;
+ wait_queue_head_t sendq;
+ atomic_t sleepers;
+ struct rpmsg_endpoint *as_ept;
+ struct notifier_block nb;
+ struct list_head list;
+ /* mutex to protect updating pending rpdev in vrp */
+ struct mutex list_lock;
+};
+
+/**
+ * @vrp: the remote processor this channel belongs to
+ */
+struct vhost_rpmsg_channel {
+ struct rpmsg_device rpdev;
+
+ struct virtproc_info *vrp;
+};
+
+#define to_vhost_rpmsg_channel(_rpdev) \
+ container_of(_rpdev, struct vhost_rpmsg_channel, rpdev)
+
+static void vhost_rpmsg_destroy_ept(struct rpmsg_endpoint *ept);
+static int vhost_rpmsg_send(struct rpmsg_endpoint *ept, void *data, int len);
+static int vhost_rpmsg_sendto(struct rpmsg_endpoint *ept, void *data, int len,
+ u32 dst);
+static int vhost_rpmsg_send_offchannel(struct rpmsg_endpoint *ept, u32 src,
+ u32 dst, void *data, int len);
+static int vhost_rpmsg_trysend(struct rpmsg_endpoint *ept, void *data, int len);
+static int vhost_rpmsg_trysendto(struct rpmsg_endpoint *ept, void *data,
+ int len, u32 dst);
+static int vhost_rpmsg_trysend_offchannel(struct rpmsg_endpoint *ept, u32 src,
+ u32 dst, void *data, int len);
+
+static const struct rpmsg_endpoint_ops vhost_endpoint_ops = {
+ .destroy_ept = vhost_rpmsg_destroy_ept,
+ .send = vhost_rpmsg_send,
+ .sendto = vhost_rpmsg_sendto,
+ .send_offchannel = vhost_rpmsg_send_offchannel,
+ .trysend = vhost_rpmsg_trysend,
+ .trysendto = vhost_rpmsg_trysendto,
+ .trysend_offchannel = vhost_rpmsg_trysend_offchannel,
+};
+
+/**
+ * __ept_release() - deallocate an rpmsg endpoint
+ * @kref: the ept's reference count
+ *
+ * This function deallocates an ept, and is invoked when its @kref refcount
+ * drops to zero.
+ *
+ * Never invoke this function directly!
+ */
+static void __ept_release(struct kref *kref)
+{
+ struct rpmsg_endpoint *ept = container_of(kref, struct rpmsg_endpoint,
+ refcount);
+ /*
+ * At this point no one holds a reference to ept anymore,
+ * so we can directly free it
+ */
+ kfree(ept);
+}
+
+/**
+ * __rpmsg_create_ept() - Create rpmsg endpoint
+ * @vrp: virtual remote processor of the vhost device where endpoint has to be
+ * created
+ * @rpdev: rpmsg device on which endpoint has to be created
+ * @cb: callback associated with the endpoint
+ * @priv: private data for the driver's use
+ * @addr: channel_info with the local rpmsg address to bind with @cb
+ *
+ * Allows drivers to create an endpoint, and bind a callback with some
+ * private data, to an rpmsg address.
+ */
+static struct rpmsg_endpoint *__rpmsg_create_ept(struct virtproc_info *vrp,
+ struct rpmsg_device *rpdev,
+ rpmsg_rx_cb_t cb,
+ void *priv, u32 addr)
+{
+ int id_min, id_max, id;
+ struct rpmsg_endpoint *ept;
+ struct device *dev = rpdev ? &rpdev->dev : &vrp->vdev->dev;
+
+ ept = kzalloc(sizeof(*ept), GFP_KERNEL);
+ if (!ept)
+ return NULL;
+
+ kref_init(&ept->refcount);
+ mutex_init(&ept->cb_lock);
+
+ ept->rpdev = rpdev;
+ ept->cb = cb;
+ ept->priv = priv;
+ ept->ops = &vhost_endpoint_ops;
+
+ /* do we need to allocate a local address ? */
+ if (addr == RPMSG_ADDR_ANY) {
+ id_min = RPMSG_RESERVED_ADDRESSES;
+ id_max = 0;
+ } else {
+ id_min = addr;
+ id_max = addr + 1;
+ }
+
+ mutex_lock(&vrp->endpoints_lock);
+
+ /* bind the endpoint to an rpmsg address (and allocate one if needed) */
+ id = idr_alloc(&vrp->endpoints, ept, id_min, id_max, GFP_KERNEL);
+ if (id < 0) {
+ dev_err(dev, "idr_alloc failed: %d\n", id);
+ goto free_ept;
+ }
+ ept->addr = id;
+
+ mutex_unlock(&vrp->endpoints_lock);
+
+ return ept;
+
+free_ept:
+ mutex_unlock(&vrp->endpoints_lock);
+ kref_put(&ept->refcount, __ept_release);
+ return NULL;
+}
+
+/**
+ * vhost_rpmsg_create_ept() - Create rpmsg endpoint
+ * @rpdev: rpmsg device on which endpoint has to be created
+ * @cb: callback associated with the endpoint
+ * @priv: private data for the driver's use
+ * @chinfo: channel_info with the local rpmsg address to bind with @cb
+ *
+ * Wrapper to __rpmsg_create_ept() to create rpmsg endpoint
+ */
+static struct rpmsg_endpoint
+*vhost_rpmsg_create_ept(struct rpmsg_device *rpdev, rpmsg_rx_cb_t cb, void *priv,
+ struct rpmsg_channel_info chinfo)
+{
+ struct vhost_rpmsg_channel *vch = to_vhost_rpmsg_channel(rpdev);
+
+ return __rpmsg_create_ept(vch->vrp, rpdev, cb, priv, chinfo.src);
+}
+
+/**
+ * __rpmsg_destroy_ept() - destroy an existing rpmsg endpoint
+ * @vrp: virtproc which owns this ept
+ * @ept: endpoint to destroy
+ *
+ * An internal function which destroys an ept without assuming it is
+ * bound to an rpmsg channel. This is needed for handling the internal
+ * name service endpoint, which isn't bound to an rpmsg channel.
+ * See also __rpmsg_create_ept().
+ */
+static void
+__rpmsg_destroy_ept(struct virtproc_info *vrp, struct rpmsg_endpoint *ept)
+{
+ /* make sure new inbound messages can't find this ept anymore */
+ mutex_lock(&vrp->endpoints_lock);
+ idr_remove(&vrp->endpoints, ept->addr);
+ mutex_unlock(&vrp->endpoints_lock);
+
+ /* make sure in-flight inbound messages won't invoke cb anymore */
+ mutex_lock(&ept->cb_lock);
+ ept->cb = NULL;
+ mutex_unlock(&ept->cb_lock);
+
+ kref_put(&ept->refcount, __ept_release);
+}
+
+/**
+ * vhost_rpmsg_destroy_ept() - destroy an existing rpmsg endpoint
+ * @ept: endpoint to destroy
+ *
+ * Wrapper to __rpmsg_destroy_ept() to destroy rpmsg endpoint
+ */
+static void vhost_rpmsg_destroy_ept(struct rpmsg_endpoint *ept)
+{
+ struct vhost_rpmsg_channel *vch = to_vhost_rpmsg_channel(ept->rpdev);
+
+ __rpmsg_destroy_ept(vch->vrp, ept);
+}
+
+/**
+ * vhost_rpmsg_announce_create() - Announce creation of new channel
+ * @rpdev: rpmsg device on which new endpoint channel is created
+ *
+ * Send a message to the remote processor's name service about the
+ * creation of this channel.
+ */
+static int vhost_rpmsg_announce_create(struct rpmsg_device *rpdev)
+{
+ struct vhost_rpmsg_channel *vch = to_vhost_rpmsg_channel(rpdev);
+ struct virtproc_info *vrp = vch->vrp;
+ struct device *dev = &rpdev->dev;
+ int err = 0;
+
+ /* need to tell remote processor's name service about this channel ? */
+ if (rpdev->ept && vhost_has_feature(vrp->vdev, VIRTIO_RPMSG_F_NS)) {
+ struct rpmsg_ns_msg nsm;
+
+ strncpy(nsm.name, rpdev->id.name, RPMSG_NAME_SIZE);
+ nsm.addr = rpdev->ept->addr;
+ nsm.flags = RPMSG_NS_CREATE | RPMSG_AS_ANNOUNCE;
+
+ err = rpmsg_sendto(rpdev->ept, &nsm, sizeof(nsm), RPMSG_NS_ADDR);
+ if (err)
+ dev_err(dev, "failed to announce service %d\n", err);
+ }
+
+ return err;
+}
+
+/**
+ * vhost_rpmsg_announce_destroy() - Announce deletion of channel
+ * @rpdev: rpmsg device on which this endpoint channel is created
+ *
+ * Send a message to the remote processor's name service about the
+ * deletion of this channel.
+ */
+static int vhost_rpmsg_announce_destroy(struct rpmsg_device *rpdev)
+{
+ struct vhost_rpmsg_channel *vch = to_vhost_rpmsg_channel(rpdev);
+ struct virtproc_info *vrp = vch->vrp;
+ struct device *dev = &rpdev->dev;
+ int err = 0;
+
+ /* tell remote processor's name service we're removing this channel */
+ if (rpdev->announce && rpdev->ept &&
+ vhost_has_feature(vrp->vdev, VIRTIO_RPMSG_F_NS)) {
+ struct rpmsg_ns_msg nsm;
+
+ strncpy(nsm.name, rpdev->id.name, RPMSG_NAME_SIZE);
+ nsm.addr = rpdev->ept->addr;
+ nsm.flags = RPMSG_NS_DESTROY;
+
+ err = rpmsg_sendto(rpdev->ept, &nsm, sizeof(nsm), RPMSG_NS_ADDR);
+ if (err)
+ dev_err(dev, "failed to announce service %d\n", err);
+ }
+
+ return err;
+}
+
+static const struct rpmsg_device_ops vhost_rpmsg_ops = {
+ .create_ept = vhost_rpmsg_create_ept,
+ .announce_create = vhost_rpmsg_announce_create,
+ .announce_destroy = vhost_rpmsg_announce_destroy,
+};
+
+/**
+ * vhost_rpmsg_release_device() - Callback to free vhost_rpmsg_channel
+ * @dev: struct device of rpmsg_device
+ *
+ * Invoked from device core after all references to "dev" is removed
+ * to free the wrapper vhost_rpmsg_channel.
+ */
+static void vhost_rpmsg_release_device(struct device *dev)
+{
+ struct rpmsg_device *rpdev = to_rpmsg_device(dev);
+ struct vhost_rpmsg_channel *vch = to_vhost_rpmsg_channel(rpdev);
+
+ kfree(vch);
+}
+
+/**
+ * vhost_rpmsg_create_channel - Create an rpmsg channel
+ * @dev: struct device of vhost_dev
+ * @name: name of the rpmsg channel to be created
+ *
+ * Create an rpmsg channel using its name. Invokes rpmsg_register_device()
+ * only if status is VIRTIO_CONFIG_S_DRIVER_OK or else just adds it to
+ * list of pending rpmsg devices. This is because if the rpmsg client
+ * driver is already loaded when rpmsg is being registered, it'll try
+ * to start accessing virtqueue which will be ready only after VIRTIO
+ * sets status as VIRTIO_CONFIG_S_DRIVER_OK.
+ */
+struct device *vhost_rpmsg_create_channel(struct device *dev, const char *name)
+{
+ struct vhost_rpmsg_channel *vch;
+ struct rpmsg_device *rpdev;
+ struct virtproc_info *vrp;
+ struct vhost_dev *vdev;
+ u8 status;
+ int ret;
+
+ vdev = to_vhost_dev(dev);
+ status = vhost_get_status(vdev);
+ vrp = vhost_get_drvdata(vdev);
+
+ vch = kzalloc(sizeof(*vch), GFP_KERNEL);
+ if (!vch)
+ return ERR_PTR(-ENOMEM);
+
+ /* Link the channel to our vrp */
+ vch->vrp = vrp;
+
+ /* Assign public information to the rpmsg_device */
+ rpdev = &vch->rpdev;
+ rpdev->src = RPMSG_ADDR_ANY;
+ rpdev->dst = RPMSG_ADDR_ANY;
+ rpdev->ops = &vhost_rpmsg_ops;
+
+ rpdev->announce = true;
+
+ strncpy(rpdev->id.name, name, RPMSG_NAME_SIZE);
+
+ rpdev->dev.parent = &vrp->vdev->dev;
+ rpdev->dev.release = vhost_rpmsg_release_device;
+ if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+ mutex_lock(&vrp->list_lock);
+ list_add_tail(&rpdev->list, &vrp->list);
+ mutex_unlock(&vrp->list_lock);
+ } else {
+ ret = rpmsg_register_device(rpdev);
+ if (ret)
+ return ERR_PTR(ret);
+ }
+
+ return &rpdev->dev;
+}
+EXPORT_SYMBOL_GPL(vhost_rpmsg_create_channel);
+
+/**
+ * vhost_rpmsg_delete_channel - Delete an rpmsg channel
+ * @dev: struct device of rpmsg_device
+ *
+ * Delete channel created using vhost_rpmsg_create_channel()
+ */
+void vhost_rpmsg_delete_channel(struct device *dev)
+{
+ struct rpmsg_device *rpdev = to_rpmsg_device(dev);
+ struct vhost_rpmsg_channel *vch;
+ struct virtproc_info *vrp;
+ struct vhost_dev *vdev;
+ u8 status;
+
+ vch = to_vhost_rpmsg_channel(rpdev);
+ vrp = vch->vrp;
+ vdev = vrp->vdev;
+ status = vhost_get_status(vdev);
+
+ if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+ mutex_lock(&vrp->list_lock);
+ list_del(&rpdev->list);
+ mutex_unlock(&vrp->list_lock);
+ kfree(vch);
+ } else {
+ device_unregister(dev);
+ }
+}
+EXPORT_SYMBOL_GPL(vhost_rpmsg_delete_channel);
+
+static const struct rpmsg_virtproc_ops vhost_rpmsg_virtproc_ops = {
+ .create_channel = vhost_rpmsg_create_channel,
+ .delete_channel = vhost_rpmsg_delete_channel,
+};
+
+/**
+ * rpmsg_upref_sleepers() - enable "tx-complete" interrupts, if needed
+ * @vrp: virtual remote processor state
+ *
+ * This function is called before a sender is blocked, waiting for
+ * a tx buffer to become available.
+ *
+ * If we already have blocking senders, this function merely increases
+ * the "sleepers" reference count, and exits.
+ *
+ * Otherwise, if this is the first sender to block, we also enable
+ * virtio's tx callbacks, so we'd be immediately notified when a tx
+ * buffer is consumed (we rely on virtio's tx callback in order
+ * to wake up sleeping senders as soon as a tx buffer is used by the
+ * remote processor).
+ */
+static void rpmsg_upref_sleepers(struct virtproc_info *vrp)
+{
+ /* support multiple concurrent senders */
+ mutex_lock(&vrp->tx_lock);
+
+ /* are we the first sleeping context waiting for tx buffers ? */
+ if (atomic_inc_return(&vrp->sleepers) == 1)
+ /* enable "tx-complete" interrupts before dozing off */
+ vhost_virtqueue_enable_cb(vrp->svq);
+
+ mutex_unlock(&vrp->tx_lock);
+}
+
+/**
+ * rpmsg_downref_sleepers() - disable "tx-complete" interrupts, if needed
+ * @vrp: virtual remote processor state
+ *
+ * This function is called after a sender, that waited for a tx buffer
+ * to become available, is unblocked.
+ *
+ * If we still have blocking senders, this function merely decreases
+ * the "sleepers" reference count, and exits.
+ *
+ * Otherwise, if there are no more blocking senders, we also disable
+ * virtio's tx callbacks, to avoid the overhead incurred with handling
+ * those (now redundant) interrupts.
+ */
+static void rpmsg_downref_sleepers(struct virtproc_info *vrp)
+{
+ /* support multiple concurrent senders */
+ mutex_lock(&vrp->tx_lock);
+
+ /* are we the last sleeping context waiting for tx buffers ? */
+ if (atomic_dec_and_test(&vrp->sleepers))
+ /* disable "tx-complete" interrupts */
+ vhost_virtqueue_disable_cb(vrp->svq);
+
+ mutex_unlock(&vrp->tx_lock);
+}
+
+/**
+ * rpmsg_send_offchannel_raw() - send a message across to the remote processor
+ * @rpdev: the rpmsg channel
+ * @src: source address
+ * @dst: destination address
+ * @data: payload of message
+ * @len: length of payload
+ * @wait: indicates whether caller should block in case no TX buffers available
+ *
+ * This function is the base implementation for all of the rpmsg sending API.
+ *
+ * It will send @data of length @len to @dst, and say it's from @src. The
+ * message will be sent to the remote processor which the @rpdev channel
+ * belongs to.
+ *
+ * The message is sent using one of the TX buffers that are available for
+ * communication with this remote processor.
+ *
+ * If @wait is true, the caller will be blocked until either a TX buffer is
+ * available, or 15 seconds elapses (we don't want callers to
+ * sleep indefinitely due to misbehaving remote processors), and in that
+ * case -ERESTARTSYS is returned. The number '15' itself was picked
+ * arbitrarily; there's little point in asking drivers to provide a timeout
+ * value themselves.
+ *
+ * Otherwise, if @wait is false, and there are no TX buffers available,
+ * the function will immediately fail, and -ENOMEM will be returned.
+ *
+ * Normally drivers shouldn't use this function directly; instead, drivers
+ * should use the appropriate rpmsg_{try}send{to, _offchannel} API
+ * (see include/linux/rpmsg.h).
+ *
+ * Returns 0 on success and an appropriate error value on failure.
+ */
+static int rpmsg_send_offchannel_raw(struct rpmsg_device *rpdev,
+ u32 src, u32 dst,
+ void *data, int len, bool wait)
+{
+ struct vhost_rpmsg_channel *vch = to_vhost_rpmsg_channel(rpdev);
+ struct virtproc_info *vrp = vch->vrp;
+ struct vhost_virtqueue *svq = vrp->svq;
+ struct vhost_dev *vdev = svq->dev;
+ struct device *dev = &rpdev->dev;
+ struct rpmsg_hdr msg;
+ int length;
+ u16 head;
+ u64 base;
+ int err;
+
+ /*
+ * We currently use fixed-sized buffers, and therefore the payload
+ * length is limited.
+ *
+ * One of the possible improvements here is either to support
+ * user-provided buffers (and then we can also support zero-copy
+ * messaging), or to improve the buffer allocator, to support
+ * variable-length buffer sizes.
+ */
+ if (len > vrp->buf_size - sizeof(struct rpmsg_hdr)) {
+ dev_err(dev, "message is too big (%d)\n", len);
+ return -EMSGSIZE;
+ }
+
+ mutex_lock(&vrp->tx_lock);
+ /* grab a buffer */
+ base = vhost_virtqueue_get_outbuf(svq, &head, &length);
+ if (!base && !wait) {
+ dev_err(dev, "Failed to get buffer for OUT transfers\n");
+ err = -ENOMEM;
+ goto out;
+ }
+
+ /* no free buffer ? wait for one (but bail after 15 seconds) */
+ while (!base) {
+ /* enable "tx-complete" interrupts, if not already enabled */
+ rpmsg_upref_sleepers(vrp);
+
+ /*
+ * sleep until a free buffer is available or 15 secs elapse.
+ * the timeout period is not configurable because there's
+ * little point in asking drivers to specify that.
+ * if later this happens to be required, it'd be easy to add.
+ */
+ err = wait_event_interruptible_timeout
+ (vrp->sendq, (base =
+ vhost_virtqueue_get_outbuf(svq, &head,
+ &length)),
+ msecs_to_jiffies(15000));
+
+ /* disable "tx-complete" interrupts if we're the last sleeper */
+ rpmsg_downref_sleepers(vrp);
+
+ /* timeout ? */
+ if (!err) {
+ dev_err(dev, "timeout waiting for a tx buffer\n");
+ err = -ERESTARTSYS;
+ goto out;
+ }
+ }
+
+ msg.len = len;
+ msg.flags = 0;
+ msg.src = src;
+ msg.dst = dst;
+ msg.reserved = 0;
+ /*
+ * Perform two writes, one for rpmsg header and other for actual buffer
+ * data, instead of squashing the data into one buffer and then send
+ * them to the vhost layer.
+ */
+ err = vhost_write(vdev, base, &msg, sizeof(struct rpmsg_hdr));
+ if (err) {
+ dev_err(dev, "Failed to write rpmsg header to remote buffer\n");
+ goto out;
+ }
+
+ err = vhost_write(vdev, base + sizeof(struct rpmsg_hdr), data, len);
+ if (err) {
+ dev_err(dev, "Failed to write buffer data to remote buffer\n");
+ goto out;
+ }
+
+ dev_dbg(dev, "TX From 0x%x, To 0x%x, Len %d, Flags %d, Reserved %d\n",
+ msg.src, msg.dst, msg.len, msg.flags, msg.reserved);
+#if defined(CONFIG_DYNAMIC_DEBUG)
+ dynamic_hex_dump("rpmsg_virtio TX: ", DUMP_PREFIX_NONE, 16, 1,
+ &msg, sizeof(msg) + msg.len, true);
+#endif
+
+ vhost_virtqueue_put_buf(svq, head, len + sizeof(struct rpmsg_hdr));
+
+ /* tell the remote processor it has a pending message to read */
+ vhost_virtqueue_kick(vrp->svq);
+
+out:
+ mutex_unlock(&vrp->tx_lock);
+
+ return err;
+}
+
+static int vhost_rpmsg_send(struct rpmsg_endpoint *ept, void *data, int len)
+{
+ struct rpmsg_device *rpdev = ept->rpdev;
+ u32 src = ept->addr, dst = rpdev->dst;
+
+ return rpmsg_send_offchannel_raw(rpdev, src, dst, data, len, true);
+}
+
+static int vhost_rpmsg_sendto(struct rpmsg_endpoint *ept, void *data, int len,
+ u32 dst)
+{
+ struct rpmsg_device *rpdev = ept->rpdev;
+ u32 src = ept->addr;
+
+ return rpmsg_send_offchannel_raw(rpdev, src, dst, data, len, true);
+}
+
+static int vhost_rpmsg_send_offchannel(struct rpmsg_endpoint *ept, u32 src,
+ u32 dst, void *data, int len)
+{
+ struct rpmsg_device *rpdev = ept->rpdev;
+
+ return rpmsg_send_offchannel_raw(rpdev, src, dst, data, len, true);
+}
+
+static int vhost_rpmsg_trysend(struct rpmsg_endpoint *ept, void *data, int len)
+{
+ struct rpmsg_device *rpdev = ept->rpdev;
+ u32 src = ept->addr, dst = rpdev->dst;
+
+ return rpmsg_send_offchannel_raw(rpdev, src, dst, data, len, false);
+}
+
+static int vhost_rpmsg_trysendto(struct rpmsg_endpoint *ept, void *data,
+ int len, u32 dst)
+{
+ struct rpmsg_device *rpdev = ept->rpdev;
+ u32 src = ept->addr;
+
+ return rpmsg_send_offchannel_raw(rpdev, src, dst, data, len, false);
+}
+
+static int vhost_rpmsg_trysend_offchannel(struct rpmsg_endpoint *ept, u32 src,
+ u32 dst, void *data, int len)
+{
+ struct rpmsg_device *rpdev = ept->rpdev;
+
+ return rpmsg_send_offchannel_raw(rpdev, src, dst, data, len, false);
+}
+
+/**
+ * rpmsg_recv_single - Invoked when a buffer is received from remote VIRTIO dev
+ * @vrp: virtual remote processor of the vhost device which has received a msg
+ * @dev: struct device of vhost_dev
+ * @msg: pointer to the rpmsg_hdr
+ * @len: length of the received buffer
+ *
+ * Invoked when a buffer is received from remote VIRTIO device. It gets the
+ * destination address from rpmsg_hdr and invokes the callback of the endpoint
+ * corresponding to the address
+ */
+static int rpmsg_recv_single(struct virtproc_info *vrp, struct device *dev,
+ struct rpmsg_hdr *msg, unsigned int len)
+{
+ struct rpmsg_endpoint *ept;
+
+ dev_dbg(dev, "From: 0x%x, To: 0x%x, Len: %d, Flags: %d, Reserved: %d\n",
+ msg->src, msg->dst, msg->len, msg->flags, msg->reserved);
+#if defined(CONFIG_DYNAMIC_DEBUG)
+ dynamic_hex_dump("rpmsg_virtio RX: ", DUMP_PREFIX_NONE, 16, 1,
+ msg, sizeof(*msg) + msg->len, true);
+#endif
+
+ /*
+ * We currently use fixed-sized buffers, so trivially sanitize
+ * the reported payload length.
+ */
+ if (len > vrp->buf_size ||
+ msg->len > (len - sizeof(struct rpmsg_hdr))) {
+ dev_warn(dev, "inbound msg too big: (%d, %d)\n", len, msg->len);
+ return -EINVAL;
+ }
+
+ /* use the dst addr to fetch the callback of the appropriate user */
+ mutex_lock(&vrp->endpoints_lock);
+
+ ept = idr_find(&vrp->endpoints, msg->dst);
+
+ /* let's make sure no one deallocates ept while we use it */
+ if (ept)
+ kref_get(&ept->refcount);
+
+ mutex_unlock(&vrp->endpoints_lock);
+
+ if (ept) {
+ /* make sure ept->cb doesn't go away while we use it */
+ mutex_lock(&ept->cb_lock);
+
+ if (ept->cb)
+ ept->cb(ept->rpdev, msg->data, msg->len, ept->priv,
+ msg->src);
+
+ mutex_unlock(&ept->cb_lock);
+
+ /* farewell, ept, we don't need you anymore */
+ kref_put(&ept->refcount, __ept_release);
+ } else {
+ dev_warn(dev, "msg received with no recipient\n");
+ }
+
+ return 0;
+}
+
+/**
+ * vhost_rpmsg_recv_done - Callback of the receive virtqueue
+ * @rvq: Receive virtqueue
+ *
+ * Invoked when the remote VIRTIO device sends a notification on the receive
+ * virtqueue. It gets base address of the input buffer and repeatedly calls
+ * rpmsg_recv_single() until no more buffers are left to be read.
+ */
+static void vhost_rpmsg_recv_done(struct vhost_virtqueue *rvq)
+{
+ struct vhost_dev *vdev = rvq->dev;
+ struct virtproc_info *vrp = vhost_get_drvdata(vdev);
+ unsigned int len, msgs_received = 0;
+ struct device *dev = &vdev->dev;
+ struct rpmsg_hdr *msg;
+ u64 base;
+ u16 head;
+ int err;
+
+ base = vhost_virtqueue_get_inbuf(rvq, &head, &len);
+ if (!base) {
+ dev_err(dev, "uhm, incoming signal, but no used buffer ?\n");
+ return;
+ }
+
+ vhost_virtqueue_disable_cb(rvq);
+ while (base) {
+ msg = kzalloc(len, GFP_KERNEL);
+ if (!msg)
+ break;
+
+ vhost_read(rvq->dev, msg, base, len);
+
+ err = rpmsg_recv_single(vrp, dev, msg, len);
+ kfree(msg);
+ if (err)
+ break;
+
+ vhost_virtqueue_put_buf(rvq, head, len);
+ msgs_received++;
+
+ base = vhost_virtqueue_get_inbuf(rvq, &head, &len);
+ }
+ vhost_virtqueue_enable_cb(rvq);
+
+ dev_dbg(dev, "Received %u messages\n", msgs_received);
+
+ /* tell the remote processor we added another available rx buffer */
+ if (msgs_received)
+ vhost_virtqueue_kick(vrp->rvq);
+}
+
+/**
+ * vhost_rpmsg_xmit_done - Callback of the send virtqueue
+ * @svq: Send virtqueue
+ *
+ * This is invoked whenever the remote processor completed processing
+ * a TX msg we just sent it, and the buffer is put back to the used ring.
+ *
+ * Normally, though, we suppress this "tx complete" interrupt in order to
+ * avoid the incurred overhead.
+ */
+static void vhost_rpmsg_xmit_done(struct vhost_virtqueue *svq)
+{
+ struct vhost_dev *vdev = svq->dev;
+ struct virtproc_info *vrp = vhost_get_drvdata(vdev);
+ struct device *dev = &vdev->dev;
+
+ dev_dbg(dev, "%s\n", __func__);
+
+ /* wake up potential senders that are waiting for a tx buffer */
+ wake_up_interruptible(&vrp->sendq);
+}
+
+/**
+ * vhost_rpmsg_as_cb - Callback of address service announcement
+ * @rpdev: always NULL here, the AS endpoint is owned by the rpmsg bus itself
+ * @data: rpmsg_as_msg sent by remote VIRTIO device
+ * @len: length of the received message
+ * @priv: private data for the driver's use
+ * @hdr_src: source address of the remote VIRTIO device that sent the AS
+ * announcement
+ *
+ * Invoked when an address service announcement arrives to assign the
+ * destination address of the rpmsg device.
+ */
+static int vhost_rpmsg_as_cb(struct rpmsg_device *rpdev, void *data, int len,
+ void *priv, u32 hdr_src)
+{
+ struct virtproc_info *vrp = priv;
+ struct device *dev = &vrp->vdev->dev;
+ struct rpmsg_channel_info chinfo;
+ struct rpmsg_as_msg *msg = data;
+ struct rpmsg_device *rpmsg_dev;
+ struct device *rdev;
+ int ret = 0;
+ u32 flags;
+ u32 src;
+ u32 dst;
+
+#if defined(CONFIG_DYNAMIC_DEBUG)
+ dynamic_hex_dump("AS announcement: ", DUMP_PREFIX_NONE, 16, 1,
+ data, len, true);
+#endif
+
+ if (len == sizeof(*msg)) {
+ src = msg->src;
+ dst = msg->dst;
+ flags = msg->flags;
+ } else {
+ dev_err(dev, "malformed AS msg (%d)\n", len);
+ return -EINVAL;
+ }
+
+ /*
+ * the address service ept does _not_ belong to a real rpmsg channel,
+ * and is handled by the rpmsg bus itself.
+ * for sanity reasons, make sure a valid rpdev has _not_ sneaked
+ * in somehow.
+ */
+ if (rpdev) {
+ dev_err(dev, "anomaly: ns ept has an rpdev handle\n");
+ return -EINVAL;
+ }
+
+ /* don't trust the remote processor for null terminating the name */
+ msg->name[RPMSG_NAME_SIZE - 1] = '\0';
+
+ dev_info(dev, "%sing dst addr 0x%x to channel %s src 0x%x\n",
+ flags & RPMSG_AS_ASSIGN ? "Assign" : "Free",
+ dst, msg->name, src);
+
+ strncpy(chinfo.name, msg->name, sizeof(chinfo.name));
+ chinfo.src = src;
+ chinfo.dst = RPMSG_ADDR_ANY;
+
+ /* Find a similar channel */
+ rdev = rpmsg_find_device(dev, &chinfo);
+ if (!rdev) {
+ ret = -ENODEV;
+ goto err_find_device;
+ }
+
+ rpmsg_dev = to_rpmsg_device(rdev);
+ if (flags & RPMSG_AS_ASSIGN) {
+ if (rpmsg_dev->dst != RPMSG_ADDR_ANY) {
+ dev_err(dev, "Address bound to channel %s src 0x%x\n",
+ msg->name, src);
+ ret = -EBUSY;
+ goto err_find_device;
+ }
+ rpmsg_dev->dst = dst;
+ } else {
+ rpmsg_dev->dst = RPMSG_ADDR_ANY;
+ }
+
+err_find_device:
+ put_device(rdev);
+
+ return ret;
+}
+
+/**
+ * vhost_rpmsg_finalize_feature - Perform initializations for negotiated
+ * features
+ * @vrp: virtual remote processor of the vhost device where the feature has been
+ * negotiated
+ *
+ * Invoked when features negotiation between VHOST and VIRTIO device is
+ * completed.
+ */
+static int vhost_rpmsg_finalize_feature(struct virtproc_info *vrp)
+{
+ struct vhost_dev *vdev = vrp->vdev;
+
+ /* if supported by the remote processor, enable the address service */
+ if (vhost_has_feature(vdev, VIRTIO_RPMSG_F_AS)) {
+ /* a dedicated endpoint handles the address service msgs */
+ vrp->as_ept = __rpmsg_create_ept(vrp, NULL, vhost_rpmsg_as_cb,
+ vrp, RPMSG_AS_ADDR);
+ if (!vrp->as_ept) {
+ dev_err(&vdev->dev, "failed to create the as ept\n");
+ return -ENOMEM;
+ }
+ } else {
+ dev_err(&vdev->dev, "Address Service not supported\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+/**
+ * vhost_rpmsg_set_status - Perform initialization when remote VIRTIO device
+ * updates status
+ * @vrp: virtual remote processor of the vhost device whose status has been
+ * updated
+ *
+ * Invoked when the remote VIRTIO device updates status. If status is set
+ * as VIRTIO_CONFIG_S_DRIVER_OK, invoke rpmsg_register_device() for every
+ * unregistered rpmsg device.
+ */
+static int vhost_rpmsg_set_status(struct virtproc_info *vrp)
+{
+ struct vhost_dev *vdev = vrp->vdev;
+ struct rpmsg_device *rpdev;
+ u8 status;
+ int ret;
+
+ status = vhost_get_status(vdev);
+
+ if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+ mutex_lock(&vrp->list_lock);
+ list_for_each_entry(rpdev, &vrp->list, list) {
+ ret = rpmsg_register_device(rpdev);
+ if (ret) {
+ mutex_unlock(&vrp->list_lock);
+ return -EINVAL;
+ }
+ }
+ list_del(&vrp->list);
+ mutex_unlock(&vrp->list_lock);
+ }
+
+ return 0;
+}
+
+/**
+ * vhost_rpmsg_notifier - Notifier to notify updates from remote VIRTIO device
+ * @nb: notifier block associated with this virtual remote processor
+ * @notify_reason: Indicate the updates (finalize feature or set status) by
+ * remote host
+ * @data: unused here
+ *
+ * Invoked when the remote VIRTIO device updates status or finalize features.
+ */
+static int vhost_rpmsg_notifier(struct notifier_block *nb, unsigned long notify_reason,
+ void *data)
+{
+ struct virtproc_info *vrp = container_of(nb, struct virtproc_info, nb);
+ struct vhost_dev *vdev = vrp->vdev;
+ int ret;
+
+ switch (notify_reason) {
+ case NOTIFY_FINALIZE_FEATURES:
+ ret = vhost_rpmsg_finalize_feature(vrp);
+ if (ret)
+ dev_err(&vdev->dev, "failed to finalize features\n");
+ break;
+ case NOTIFY_SET_STATUS:
+ ret = vhost_rpmsg_set_status(vrp);
+ if (ret)
+ dev_err(&vdev->dev, "failed to set status\n");
+ break;
+ default:
+ dev_err(&vdev->dev, "Unsupported notification\n");
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static unsigned int vhost_rpmsg_features[] = {
+ VIRTIO_RPMSG_F_AS,
+ VIRTIO_RPMSG_F_NS,
+};
+
+/**
+ * vhost_rpmsg_set_features - Sets supported features on the VHOST device
+ * @vdev: VHOST device on which the supported features should be set
+ *
+ * Build supported features from the feature table and invoke
+ * vhost_set_features() to set the supported features on the VHOST device
+ */
+static int vhost_rpmsg_set_features(struct vhost_dev *vdev)
+{
+ unsigned int feature_table_size;
+ unsigned int feature;
+ u64 device_features = 0;
+ int ret, i;
+
+ feature_table_size = ARRAY_SIZE(vhost_rpmsg_features);
+ for (i = 0; i < feature_table_size; i++) {
+ feature = vhost_rpmsg_features[i];
+ if (WARN_ON(feature >= 64))
+ continue;
+ device_features |= (1ULL << feature);
+ }
+
+ ret = vhost_set_features(vdev, device_features);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+/**
+ * vhost_rpmsg_probe - Create virtual remote processor for the VHOST device
+ * @vdev: VHOST device with vendor ID and device ID supported by this driver
+ *
+ * Invoked when VHOST device is registered with vendor ID and device ID
+ * supported by this driver. Creates and initializes the virtual remote
+ * processor for the VHOST device
+ */
+static int vhost_rpmsg_probe(struct vhost_dev *vdev)
+{
+ vhost_vq_callback_t *vq_cbs[] = { vhost_rpmsg_xmit_done, vhost_rpmsg_recv_done };
+ static const char * const names[] = { "output", "input" };
+ struct device *dev = &vdev->dev;
+ struct vhost_virtqueue *vqs[2];
+ struct config_group *group;
+ struct virtproc_info *vrp;
+ int err;
+
+ vrp = devm_kzalloc(dev, sizeof(*vrp), GFP_KERNEL);
+ if (!vrp)
+ return -ENOMEM;
+
+ vrp->vdev = vdev;
+
+ idr_init(&vrp->endpoints);
+ mutex_init(&vrp->endpoints_lock);
+ mutex_init(&vrp->tx_lock);
+ mutex_init(&vrp->rx_lock);
+ mutex_init(&vrp->list_lock);
+ init_waitqueue_head(&vrp->sendq);
+
+ err = vhost_rpmsg_set_features(vdev);
+ if (err) {
+ dev_err(dev, "Failed to set features\n");
+ return err;
+ }
+
+ /* We expect two vhost_virtqueues, tx and rx (and in this order) */
+ err = vhost_create_vqs(vdev, 2, MAX_RPMSG_NUM_BUFS / 2, vqs, vq_cbs,
+ names);
+ if (err) {
+ dev_err(dev, "Failed to create virtqueues\n");
+ return err;
+ }
+
+ vrp->svq = vqs[0];
+ vrp->rvq = vqs[1];
+
+ vrp->buf_size = MAX_RPMSG_BUF_SIZE;
+
+ vhost_set_drvdata(vdev, vrp);
+
+ vrp->nb.notifier_call = vhost_rpmsg_notifier;
+ vhost_register_notifier(vdev, &vrp->nb);
+ INIT_LIST_HEAD(&vrp->list);
+
+ group = rpmsg_cfs_add_virtproc_group(dev,
+ &vhost_rpmsg_virtproc_ops);
+ if (IS_ERR(group)) {
+ err = PTR_ERR(group);
+ goto err;
+ }
+
+ dev_info(&vdev->dev, "vhost rpmsg host is online\n");
+
+ return 0;
+
+err:
+ vhost_del_vqs(vdev);
+
+ return err;
+}
+
+static int vhost_rpmsg_remove_device(struct device *dev, void *data)
+{
+ device_unregister(dev);
+
+ return 0;
+}
+
+static int vhost_rpmsg_remove(struct vhost_dev *vdev)
+{
+ struct virtproc_info *vrp = vhost_get_drvdata(vdev);
+ int ret;
+
+ ret = device_for_each_child(&vdev->dev, NULL, vhost_rpmsg_remove_device);
+ if (ret)
+ dev_warn(&vdev->dev, "can't remove rpmsg device: %d\n", ret);
+
+ if (vrp->as_ept)
+ __rpmsg_destroy_ept(vrp, vrp->as_ept);
+
+ idr_destroy(&vrp->endpoints);
+
+ vhost_del_vqs(vdev);
+
+ return 0;
+}
+
+static struct vhost_device_id vhost_rpmsg_id_table[] = {
+ { VIRTIO_ID_RPMSG, VIRTIO_DEV_ANY_ID },
+ { 0 },
+};
+
+static struct vhost_driver vhost_rpmsg_driver = {
+ .driver.name = KBUILD_MODNAME,
+ .driver.owner = THIS_MODULE,
+ .id_table = vhost_rpmsg_id_table,
+ .probe = vhost_rpmsg_probe,
+ .remove = vhost_rpmsg_remove,
+};
+
+static int __init vhost_rpmsg_init(void)
+{
+ int ret;
+
+ ret = vhost_register_driver(&vhost_rpmsg_driver);
+ if (ret)
+ pr_err("Failed to register vhost rpmsg driver: %d\n", ret);
+
+ return ret;
+}
+module_init(vhost_rpmsg_init);
+
+static void __exit vhost_rpmsg_exit(void)
+{
+ vhost_unregister_driver(&vhost_rpmsg_driver);
+}
+module_exit(vhost_rpmsg_exit);
+
+MODULE_DEVICE_TABLE(vhost, vhost_rpmsg_id_table);
+MODULE_DESCRIPTION("Vhost-based remote processor messaging bus");
+MODULE_LICENSE("GPL v2");
diff --git a/include/linux/rpmsg.h b/include/linux/rpmsg.h
index b9d9283b46ac..5d60d13efaf4 100644
--- a/include/linux/rpmsg.h
+++ b/include/linux/rpmsg.h
@@ -55,6 +55,7 @@ struct rpmsg_device {
u32 dst;
struct rpmsg_endpoint *ept;
bool announce;
+ struct list_head list;

const struct rpmsg_device_ops *ops;
};
--
2.17.1

2020-07-02 08:25:03

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 07/22] virtio_pci: Use request_threaded_irq() instead of request_irq()

Some of the virtio drivers (like virtio_rpmsg_bus.c) use sleeping
functions like mutex_*() in the virtqueue callback. Use
request_threaded_irq() instead of request_irq() in order for
the virtqueue callbacks to be executed in thread context instead
of interrupt context.
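
As a rough sketch (not part of the diff below, with illustrative variable
names), the change amounts to dropping the primary handler so the existing
handler runs as the IRQ thread:

    /* before: vring_interrupt() runs in hard-IRQ context */
    err = request_irq(irq, vring_interrupt, 0, name, vq);

    /* after: no primary handler, vring_interrupt() runs in a kernel
     * thread, so callbacks that may sleep (e.g. take a mutex) are safe
     */
    err = request_threaded_irq(irq, NULL, vring_interrupt, 0, name, vq);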

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/virtio/virtio_pci_common.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 222d630c41fc..60998b4f1f30 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -140,9 +140,9 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
v = vp_dev->msix_used_vectors;
snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
"%s-config", name);
- err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
- vp_config_changed, 0, vp_dev->msix_names[v],
- vp_dev);
+ err = request_threaded_irq(pci_irq_vector(vp_dev->pci_dev, v), 0,
+ vp_config_changed, 0, vp_dev->msix_names[v],
+ vp_dev);
if (err)
goto error;
++vp_dev->msix_used_vectors;
@@ -159,9 +159,9 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
v = vp_dev->msix_used_vectors;
snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
"%s-virtqueues", name);
- err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
- vp_vring_interrupt, 0, vp_dev->msix_names[v],
- vp_dev);
+ err = request_threaded_irq(pci_irq_vector(vp_dev->pci_dev, v),
+ 0, vp_vring_interrupt, 0,
+ vp_dev->msix_names[v], vp_dev);
if (err)
goto error;
++vp_dev->msix_used_vectors;
@@ -336,10 +336,11 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
sizeof *vp_dev->msix_names,
"%s-%s",
dev_name(&vp_dev->vdev.dev), names[i]);
- err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
- vring_interrupt, 0,
- vp_dev->msix_names[msix_vec],
- vqs[i]);
+ err = request_threaded_irq(pci_irq_vector(vp_dev->pci_dev,
+ msix_vec),
+ 0, vring_interrupt, 0,
+ vp_dev->msix_names[msix_vec],
+ vqs[i]);
if (err)
goto error_find;
}
@@ -361,8 +362,8 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned nvqs,
if (!vp_dev->vqs)
return -ENOMEM;

- err = request_irq(vp_dev->pci_dev->irq, vp_interrupt, IRQF_SHARED,
- dev_name(&vdev->dev), vp_dev);
+ err = request_threaded_irq(vp_dev->pci_dev->irq, 0, vp_interrupt,
+ IRQF_SHARED, dev_name(&vdev->dev), vp_dev);
if (err)
goto out_del_vqs;

--
2.17.1

2020-07-02 08:25:20

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 19/22] PCI: endpoint: Add EP function driver to provide VHOST interface

Add a new endpoint function driver to register a VHOST device and
provide an interface for the VHOST layer to access virtqueues
created by the remote host (using VIRTIO).
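
For context, a minimal sketch of how a vhost client driver reaches this
function driver through the vhost core (function names follow this series;
the buffer count and callbacks are only illustrative):

    struct vhost_virtqueue *vqs[2];
    vhost_vq_callback_t *cbs[] = { tx_done_cb, rx_done_cb };
    static const char * const names[] = { "output", "input" };

    /* routed to pci_epf_vhost_create_vqs() via vhost_config_ops */
    err = vhost_create_vqs(vdev, 2, 128, vqs, cbs, names);

    /* routed to pci_epf_vhost_write(), which maps the host buffer
     * with pci_epc_map_addr() and copies with memcpy_toio()
     */
    err = vhost_write(vdev, host_buf_addr, local_buf, len);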

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/pci/endpoint/functions/Kconfig | 11 +
drivers/pci/endpoint/functions/Makefile | 1 +
.../pci/endpoint/functions/pci-epf-vhost.c | 1144 +++++++++++++++++
drivers/vhost/vhost_cfs.c | 13 -
include/linux/vhost.h | 14 +
5 files changed, 1170 insertions(+), 13 deletions(-)
create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c

diff --git a/drivers/pci/endpoint/functions/Kconfig b/drivers/pci/endpoint/functions/Kconfig
index 55ac7bb2d469..21830576e1f4 100644
--- a/drivers/pci/endpoint/functions/Kconfig
+++ b/drivers/pci/endpoint/functions/Kconfig
@@ -24,3 +24,14 @@ config PCI_EPF_NTB
device tree.

If in doubt, say "N" to disable Endpoint NTB driver.
+
+config PCI_EPF_VHOST
+ tristate "PCI Endpoint VHOST driver"
+ depends on PCI_ENDPOINT
+ help
+ Select this configuration option to enable the VHOST driver
+ for PCI Endpoint. EPF VHOST driver implements VIRTIO backend
+ for EPF and uses the VHOST framework to bind any VHOST driver
+ to the VHOST device created by this driver.
+
+ If in doubt, say "N" to disable Endpoint VHOST driver.
diff --git a/drivers/pci/endpoint/functions/Makefile b/drivers/pci/endpoint/functions/Makefile
index 96ab932a537a..39d4f9daf63a 100644
--- a/drivers/pci/endpoint/functions/Makefile
+++ b/drivers/pci/endpoint/functions/Makefile
@@ -5,3 +5,4 @@

obj-$(CONFIG_PCI_EPF_TEST) += pci-epf-test.o
obj-$(CONFIG_PCI_EPF_NTB) += pci-epf-ntb.o
+obj-$(CONFIG_PCI_EPF_VHOST) += pci-epf-vhost.o
diff --git a/drivers/pci/endpoint/functions/pci-epf-vhost.c b/drivers/pci/endpoint/functions/pci-epf-vhost.c
new file mode 100644
index 000000000000..d090e5e88575
--- /dev/null
+++ b/drivers/pci/endpoint/functions/pci-epf-vhost.c
@@ -0,0 +1,1144 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Endpoint Function Driver to implement VHOST functionality
+ *
+ * Copyright (C) 2020 Texas Instruments
+ * Author: Kishon Vijay Abraham I <[email protected]>
+ */
+
+#include <linux/delay.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vhost.h>
+#include <linux/vringh.h>
+
+#include <linux/pci-epc.h>
+#include <linux/pci-epf.h>
+
+#include <uapi/linux/virtio_pci.h>
+
+#define MAX_VQS 8
+
+#define VHOST_QUEUE_STATUS_ENABLE BIT(0)
+
+#define VHOST_DEVICE_CONFIG_SIZE 1024
+#define EPF_VHOST_MAX_INTERRUPTS (MAX_VQS + 1)
+
+static struct workqueue_struct *kpcivhost_workqueue;
+
+struct epf_vhost_queue {
+ struct delayed_work cmd_handler;
+ struct vhost_virtqueue *vq;
+ struct epf_vhost *vhost;
+ phys_addr_t phys_addr;
+ void __iomem *addr;
+ unsigned int size;
+};
+
+struct epf_vhost {
+ const struct pci_epc_features *epc_features;
+ struct epf_vhost_queue vqueue[MAX_VQS];
+ struct delayed_work cmd_handler;
+ struct delayed_work cfs_work;
+ struct epf_vhost_reg *reg;
+ struct config_group group;
+ size_t msix_table_offset;
+ struct vhost_dev vdev;
+ struct pci_epf *epf;
+ struct vring vring;
+ int msix_bar;
+};
+
+static inline struct epf_vhost *to_epf_vhost_from_ci(struct config_item *item)
+{
+ return container_of(to_config_group(item), struct epf_vhost, group);
+}
+
+#define to_epf_vhost(v) container_of((v), struct epf_vhost, vdev)
+
+struct epf_vhost_reg_queue {
+ u8 cmd;
+ u8 cmd_status;
+ u16 status;
+ u16 num_buffers;
+ u16 msix_vector;
+ u64 queue_addr;
+} __packed;
+
+enum queue_cmd {
+ QUEUE_CMD_NONE,
+ QUEUE_CMD_ACTIVATE,
+ QUEUE_CMD_DEACTIVATE,
+ QUEUE_CMD_NOTIFY,
+};
+
+enum queue_cmd_status {
+ QUEUE_CMD_STATUS_NONE,
+ QUEUE_CMD_STATUS_OKAY,
+ QUEUE_CMD_STATUS_ERROR,
+};
+
+struct epf_vhost_reg {
+ u64 host_features;
+ u64 guest_features;
+ u16 msix_config;
+ u16 num_queues;
+ u8 device_status;
+ u8 config_generation;
+ u32 isr;
+ u8 cmd;
+ u8 cmd_status;
+ struct epf_vhost_reg_queue vq[MAX_VQS];
+} __packed;
+
+enum host_cmd {
+ HOST_CMD_NONE,
+ HOST_CMD_SET_STATUS,
+ HOST_CMD_FINALIZE_FEATURES,
+ HOST_CMD_RESET,
+};
+
+enum host_cmd_status {
+ HOST_CMD_STATUS_NONE,
+ HOST_CMD_STATUS_OKAY,
+ HOST_CMD_STATUS_ERROR,
+};
+
+static struct pci_epf_header epf_vhost_header = {
+ .vendorid = PCI_ANY_ID,
+ .deviceid = PCI_ANY_ID,
+ .baseclass_code = PCI_CLASS_OTHERS,
+ .interrupt_pin = PCI_INTERRUPT_INTA,
+};
+
+/* pci_epf_vhost_cmd_handler - Handle commands from remote EPF virtio driver
+ * @work: The work_struct holding the pci_epf_vhost_cmd_handler() function that
+ * is scheduled
+ *
+ * Handle commands from the remote EPF virtio driver and send a notification to
+ * the vhost client driver. The remote EPF virtio driver sends commands when the
+ * virtio driver status is updated, when the feature negotiation is complete, or
+ * when the virtio driver wants to reset the device.
+ */
+static void pci_epf_vhost_cmd_handler(struct work_struct *work)
+{
+ struct epf_vhost_reg *reg;
+ struct epf_vhost *vhost;
+ struct vhost_dev *vdev;
+ struct device *dev;
+ u8 command;
+
+ vhost = container_of(work, struct epf_vhost, cmd_handler.work);
+ vdev = &vhost->vdev;
+ dev = &vhost->epf->dev;
+ reg = vhost->reg;
+
+ command = reg->cmd;
+ if (!command)
+ goto reset_handler;
+
+ reg->cmd = 0;
+
+ switch (command) {
+ case HOST_CMD_SET_STATUS:
+ blocking_notifier_call_chain(&vdev->notifier, NOTIFY_SET_STATUS,
+ NULL);
+ reg->cmd_status = HOST_CMD_STATUS_OKAY;
+ break;
+ case HOST_CMD_FINALIZE_FEATURES:
+ vdev->features = reg->guest_features;
+ blocking_notifier_call_chain(&vdev->notifier,
+ NOTIFY_FINALIZE_FEATURES, 0);
+ reg->cmd_status = HOST_CMD_STATUS_OKAY;
+ break;
+ case HOST_CMD_RESET:
+ blocking_notifier_call_chain(&vdev->notifier, NOTIFY_RESET, 0);
+ reg->cmd_status = HOST_CMD_STATUS_OKAY;
+ break;
+ default:
+ dev_err(dev, "UNKNOWN command: %d\n", command);
+ break;
+ }
+
+reset_handler:
+ queue_delayed_work(kpcivhost_workqueue, &vhost->cmd_handler,
+ msecs_to_jiffies(1));
+}
+
+/* pci_epf_vhost_queue_activate - Map virtqueue local address to remote
+ * virtqueue address provided by EPF virtio
+ * @vqueue: struct epf_vhost_queue holding the local virtqueue address
+ *
+ * In order for the local system to access the remote virtqueue, the address
+ * reserved in local system should be mapped to the remote virtqueue address.
+ * Map local virtqueue address to remote virtqueue address here.
+ */
+static int pci_epf_vhost_queue_activate(struct epf_vhost_queue *vqueue)
+{
+ struct epf_vhost_reg_queue *reg_queue;
+ struct vhost_virtqueue *vq;
+ struct epf_vhost_reg *reg;
+ phys_addr_t vq_phys_addr;
+ struct epf_vhost *vhost;
+ struct pci_epf *epf;
+ struct pci_epc *epc;
+ struct device *dev;
+ u64 vq_remote_addr;
+ size_t vq_size;
+ u8 func_no;
+ int ret;
+
+ vhost = vqueue->vhost;
+ epf = vhost->epf;
+ dev = &epf->dev;
+ epc = epf->epc;
+ func_no = epf->func_no;
+
+ vq = vqueue->vq;
+ reg = vhost->reg;
+ reg_queue = &reg->vq[vq->index];
+ vq_phys_addr = vqueue->phys_addr;
+ vq_remote_addr = reg_queue->queue_addr;
+ vq_size = vqueue->size;
+
+ ret = pci_epc_map_addr(epc, func_no, vq_phys_addr, vq_remote_addr,
+ vq_size);
+ if (ret) {
+ dev_err(dev, "Failed to map outbound address\n");
+ return ret;
+ }
+
+ reg_queue->status |= VHOST_QUEUE_STATUS_ENABLE;
+
+ return 0;
+}
+
+/* pci_epf_vhost_queue_deactivate - Unmap virtqueue local address from remote
+ * virtqueue address
+ * @vqueue: struct epf_vhost_queue holding the local virtqueue address
+ *
+ * Unmap virtqueue local address from remote virtqueue address.
+ */
+static void pci_epf_vhost_queue_deactivate(struct epf_vhost_queue *vqueue)
+{
+ struct epf_vhost_reg_queue *reg_queue;
+ struct vhost_virtqueue *vq;
+ struct epf_vhost_reg *reg;
+ phys_addr_t vq_phys_addr;
+ struct epf_vhost *vhost;
+ struct pci_epf *epf;
+ struct pci_epc *epc;
+ u8 func_no;
+
+ vhost = vqueue->vhost;
+
+ epf = vhost->epf;
+ epc = epf->epc;
+ func_no = epf->func_no;
+ vq_phys_addr = vqueue->phys_addr;
+
+ pci_epc_unmap_addr(epc, func_no, vq_phys_addr);
+
+ reg = vhost->reg;
+ vq = vqueue->vq;
+ reg_queue = &reg->vq[vq->index];
+ reg_queue->status &= ~VHOST_QUEUE_STATUS_ENABLE;
+}
+
+/* pci_epf_vhost_queue_cmd_handler - Handle commands from remote EPF virtio
+ * driver sent for a particular virtqueue
+ * @work: The work_struct holding the pci_epf_vhost_queue_cmd_handler()
+ * function that is scheduled
+ *
+ * Handle commands from the remote EPF virtio driver sent for a particular
+ * virtqueue to activate/de-activate a virtqueue or to send notification to
+ * the vhost client driver.
+ */
+static void pci_epf_vhost_queue_cmd_handler(struct work_struct *work)
+{
+ struct epf_vhost_reg_queue *reg_queue;
+ struct epf_vhost_queue *vqueue;
+ struct vhost_virtqueue *vq;
+ struct epf_vhost_reg *reg;
+ struct epf_vhost *vhost;
+ struct device *dev;
+ u8 command;
+ int ret;
+
+ vqueue = container_of(work, struct epf_vhost_queue, cmd_handler.work);
+ vhost = vqueue->vhost;
+ reg = vhost->reg;
+ vq = vqueue->vq;
+ reg_queue = &reg->vq[vq->index];
+ dev = &vhost->epf->dev;
+
+ command = reg_queue->cmd;
+ if (!command)
+ goto reset_handler;
+
+ reg_queue->cmd = 0;
+ vq = vqueue->vq;
+
+ switch (command) {
+ case QUEUE_CMD_ACTIVATE:
+ ret = pci_epf_vhost_queue_activate(vqueue);
+ if (ret)
+ reg_queue->cmd_status = QUEUE_CMD_STATUS_ERROR;
+ else
+ reg_queue->cmd_status = QUEUE_CMD_STATUS_OKAY;
+ break;
+ case QUEUE_CMD_DEACTIVATE:
+ pci_epf_vhost_queue_deactivate(vqueue);
+ reg_queue->cmd_status = QUEUE_CMD_STATUS_OKAY;
+ break;
+ case QUEUE_CMD_NOTIFY:
+ vhost_virtqueue_callback(vqueue->vq);
+ reg_queue->cmd_status = QUEUE_CMD_STATUS_OKAY;
+ break;
+ default:
+ dev_err(dev, "UNKNOWN QUEUE command: %d\n", command);
+ break;
+}
+
+reset_handler:
+ queue_delayed_work(kpcivhost_workqueue, &vqueue->cmd_handler,
+ msecs_to_jiffies(1));
+}
+
+/* pci_epf_vhost_write - Write data to buffer provided by remote virtio driver
+ * @vdev: Vhost device that communicates with the remote virtio device
+ * @dst: Buffer address present in the memory of the remote system to which
+ * data should be written
+ * @src: Buffer address in the local device provided by the vhost client driver
+ * @len: Length of the data to be copied from @src to @dst
+ *
+ * Write data to buffer provided by remote virtio driver from buffer provided
+ * by vhost client driver.
+ */
+static int pci_epf_vhost_write(struct vhost_dev *vdev, u64 dst, void *src, int len)
+{
+ const struct pci_epc_features *epc_features;
+ struct epf_vhost *vhost;
+ phys_addr_t phys_addr;
+ struct pci_epf *epf;
+ struct pci_epc *epc;
+ void __iomem *addr;
+ struct device *dev;
+ int offset, ret;
+ u64 dst_addr;
+ size_t align;
+ u8 func_no;
+
+ vhost = to_epf_vhost(vdev);
+ epf = vhost->epf;
+ dev = &epf->dev;
+ epc = epf->epc;
+ func_no = epf->func_no;
+ epc_features = vhost->epc_features;
+ align = epc_features->align;
+
+ offset = dst & (align - 1);
+ dst_addr = dst & ~(align - 1);
+
+ addr = pci_epc_mem_alloc_addr(epc, &phys_addr, len);
+ if (!addr) {
+ dev_err(dev, "Failed to allocate outbound address\n");
+ return -ENOMEM;
+ }
+
+ ret = pci_epc_map_addr(epc, func_no, phys_addr, dst_addr, len);
+ if (ret) {
+ dev_err(dev, "Failed to map outbound address\n");
+ goto ret;
+ }
+
+ memcpy_toio(addr + offset, src, len);
+
+ pci_epc_unmap_addr(epc, func_no, phys_addr);
+
+ret:
+ pci_epc_mem_free_addr(epc, phys_addr, addr, len);
+
+ return ret;
+}
+
+/* pci_epf_vhost_read - Read data from buffer provided by remote virtio driver
+ * @vdev: Vhost device that communicates with the remote virtio device
+ * @dst: Buffer address in the local device provided by the vhost client driver
+ * @src: Buffer address in the remote device provided by the remote virtio
+ * driver
+ * @len: Length of the data to be copied from @src to @dst
+ *
+ * Read data from buffer provided by remote virtio driver to address provided
+ * by vhost client driver.
+ */
+static int pci_epf_vhost_read(struct vhost_dev *vdev, void *dst, u64 src, int len)
+{
+ const struct pci_epc_features *epc_features;
+ struct epf_vhost *vhost;
+ phys_addr_t phys_addr;
+ struct pci_epf *epf;
+ struct pci_epc *epc;
+ void __iomem *addr;
+ struct device *dev;
+ int offset, ret;
+ u64 src_addr;
+ size_t align;
+ u8 func_no;
+
+ vhost = to_epf_vhost(vdev);
+ epf = vhost->epf;
+ dev = &epf->dev;
+ epc = epf->epc;
+ func_no = epf->func_no;
+ epc_features = vhost->epc_features;
+ align = epc_features->align;
+
+ offset = src & (align - 1);
+ src_addr = src & ~(align - 1);
+
+ addr = pci_epc_mem_alloc_addr(epc, &phys_addr, len);
+ if (!addr) {
+ dev_err(dev, "Failed to allocate outbound address\n");
+ return -ENOMEM;
+ }
+
+ ret = pci_epc_map_addr(epc, func_no, phys_addr, src_addr, len);
+ if (ret) {
+ dev_err(dev, "Failed to map outbound address\n");
+ goto ret;
+ }
+
+ memcpy_fromio(dst, addr + offset, len);
+
+ pci_epc_unmap_addr(epc, func_no, phys_addr);
+
+ret:
+ pci_epc_mem_free_addr(epc, phys_addr, addr, len);
+
+ return ret;
+}
+
+/* pci_epf_vhost_notify - Send notification to the remote virtqueue
+ * @vq: The local vhost virtqueue corresponding to the remote virtio virtqueue
+ *
+ * Use endpoint core framework to raise MSI-X interrupt to notify the remote
+ * virtqueue.
+ */
+static void pci_epf_vhost_notify(struct vhost_virtqueue *vq)
+{
+ struct epf_vhost_reg_queue *reg_queue;
+ struct epf_vhost_reg *reg;
+ struct epf_vhost *vhost;
+ struct vhost_dev *vdev;
+ struct pci_epf *epf;
+ struct pci_epc *epc;
+ u8 func_no;
+
+ vdev = vq->dev;
+ vhost = to_epf_vhost(vdev);
+ epf = vhost->epf;
+ func_no = epf->func_no;
+ epc = epf->epc;
+ reg = vhost->reg;
+ reg_queue = &reg->vq[vq->index];
+
+ pci_epc_raise_irq(epc, func_no, PCI_EPC_IRQ_MSIX,
+ reg_queue->msix_vector + 1);
+}
+
+/* pci_epf_vhost_del_vqs - Delete all the vqs associated with the vhost device
+ * @vdev: Vhost device that communicates with the remote virtio device
+ *
+ * Delete all the vqs associated with the vhost device and free the memory
+ * address reserved for accessing the remote virtqueue.
+ */
+static void pci_epf_vhost_del_vqs(struct vhost_dev *vdev)
+{
+ struct epf_vhost_queue *vqueue;
+ struct vhost_virtqueue *vq;
+ phys_addr_t vq_phys_addr;
+ struct epf_vhost *vhost;
+ void __iomem *vq_addr;
+ unsigned int vq_size;
+ struct pci_epf *epf;
+ struct pci_epc *epc;
+ int i;
+
+ vhost = to_epf_vhost(vdev);
+ epf = vhost->epf;
+ epc = epf->epc;
+
+ for (i = 0; i < vdev->nvqs; i++) {
+ vq = vdev->vqs[i];
+ if (IS_ERR_OR_NULL(vq))
+ continue;
+
+ vqueue = &vhost->vqueue[i];
+ vq_phys_addr = vqueue->phys_addr;
+ vq_addr = vqueue->addr;
+ vq_size = vqueue->size;
+ pci_epc_mem_free_addr(epc, vq_phys_addr, vq_addr, vq_size);
+ kfree(vq);
+ }
+}
+
+/* pci_epf_vhost_create_vq - Create a new vhost virtqueue
+ * @vdev: Vhost device that communicates with the remote virtio device
+ * @index: Index of the vhost virtqueue
+ * @num_bufs: The number of buffers that should be supported by the vhost
+ * virtqueue (number of descriptors in the vhost virtqueue)
+ * @callback: Callback function associated with the virtqueue
+ *
+ * Create a new vhost virtqueue which can be used by the vhost client driver
+ * to access the remote virtio. This sets up the local address of the vhost
+ * virtqueue but shouldn't be accessed until the virtio sets the status to
+ * VIRTIO_CONFIG_S_DRIVER_OK.
+ */
+static struct vhost_virtqueue *
+pci_epf_vhost_create_vq(struct vhost_dev *vdev, int index,
+ unsigned int num_bufs,
+ void (*callback)(struct vhost_virtqueue *))
+{
+ struct epf_vhost_reg_queue *reg_queue;
+ struct epf_vhost_queue *vqueue;
+ struct epf_vhost_reg *reg;
+ struct vhost_virtqueue *vq;
+ phys_addr_t vq_phys_addr;
+ struct epf_vhost *vhost;
+ struct vringh *vringh;
+ void __iomem *vq_addr;
+ unsigned int vq_size;
+ struct vring *vring;
+ struct pci_epf *epf;
+ struct pci_epc *epc;
+ struct device *dev;
+ int ret;
+
+ vhost = to_epf_vhost(vdev);
+ vqueue = &vhost->vqueue[index];
+ reg = vhost->reg;
+ reg_queue = &reg->vq[index];
+ epf = vhost->epf;
+ epc = epf->epc;
+ dev = &epf->dev;
+
+ vq = kzalloc(sizeof(*vq), GFP_KERNEL);
+ if (!vq)
+ return ERR_PTR(-ENOMEM);
+
+ vq->dev = vdev;
+ vq->callback = callback;
+ vq->num = num_bufs;
+ vq->index = index;
+ vq->notify = pci_epf_vhost_notify;
+ vq->type = VHOST_TYPE_MMIO;
+
+ vqueue->vq = vq;
+ vqueue->vhost = vhost;
+
+ vringh = &vq->vringh;
+ vring = &vringh->vring;
+ reg_queue->num_buffers = num_bufs;
+
+ vq_size = vring_size(num_bufs, VIRTIO_PCI_VRING_ALIGN);
+ vq_addr = pci_epc_mem_alloc_addr(epc, &vq_phys_addr, vq_size);
+ if (!vq_addr) {
+ dev_err(dev, "Failed to allocate virtqueue address\n");
+ ret = -ENOMEM;
+ goto err_mem_alloc_addr;
+ }
+
+ vring_init(vring, num_bufs, vq_addr, VIRTIO_PCI_VRING_ALIGN);
+ ret = vringh_init_mmio(vringh, 0, num_bufs, false, vring->desc,
+ vring->avail, vring->used);
+ if (ret) {
+ dev_err(dev, "Failed to init vringh\n");
+ goto err_init_mmio;
+ }
+
+ vqueue->phys_addr = vq_phys_addr;
+ vqueue->addr = vq_addr;
+ vqueue->size = vq_size;
+
+ INIT_DELAYED_WORK(&vqueue->cmd_handler, pci_epf_vhost_queue_cmd_handler);
+ queue_work(kpcivhost_workqueue, &vqueue->cmd_handler.work);
+
+ return vq;
+
+err_init_mmio:
+ pci_epc_mem_free_addr(epc, vq_phys_addr, vq_addr, vq_size);
+
+err_mem_alloc_addr:
+ kfree(vq);
+
+ return ERR_PTR(ret);
+}
+
+/* pci_epf_vhost_create_vqs - Create vhost virtqueues for vhost device
+ * @vdev: Vhost device that communicates with the remote virtio device
+ * @nvqs: Number of vhost virtqueues to be created
+ * @num_bufs: The number of buffers that should be supported by the vhost
+ * virtqueue (number of descriptors in the vhost virtqueue)
+ * @vqs: Pointers to all the created vhost virtqueues
+ * @callback: Callback function associated with the virtqueue
+ * @names: Names associated with each virtqueue
+ *
+ * Create vhost virtqueues for vhost device. This acts as a wrapper to
+ * pci_epf_vhost_create_vq() which creates individual vhost virtqueue.
+ */
+static int pci_epf_vhost_create_vqs(struct vhost_dev *vdev, unsigned int nvqs,
+ unsigned int num_bufs,
+ struct vhost_virtqueue *vqs[],
+ vhost_vq_callback_t *callbacks[],
+ const char * const names[])
+{
+ struct epf_vhost *vhost;
+ struct pci_epf *epf;
+ struct device *dev;
+ int ret, i;
+
+ vhost = to_epf_vhost(vdev);
+ epf = vhost->epf;
+ dev = &epf->dev;
+
+ for (i = 0; i < nvqs; i++) {
+ vqs[i] = pci_epf_vhost_create_vq(vdev, i, num_bufs,
+ callbacks[i]);
+ if (IS_ERR_OR_NULL(vqs[i])) {
+ ret = PTR_ERR(vqs[i]);
+ dev_err(dev, "Failed to create virtqueue\n");
+ goto err;
+ }
+ }
+
+ vdev->nvqs = nvqs;
+ vdev->vqs = vqs;
+
+ return 0;
+
+err:
+ pci_epf_vhost_del_vqs(vdev);
+ return ret;
+}
+
+/* pci_epf_vhost_set_features - vhost_config_ops to set vhost device features
+ * @vdev: Vhost device that communicates with the remote virtio device
+ * @features: Features supported by the vhost client driver
+ *
+ * vhost_config_ops invoked by the vhost client driver to set vhost device
+ * features.
+ */
+static int pci_epf_vhost_set_features(struct vhost_dev *vdev, u64 features)
+{
+ struct epf_vhost_reg *reg;
+ struct epf_vhost *vhost;
+
+ vhost = to_epf_vhost(vdev);
+ reg = vhost->reg;
+
+ reg->host_features = features;
+
+ return 0;
+}
+
+/* pci_epf_vhost_set_status - vhost_config_ops to set vhost device status
+ * @vdev: Vhost device that communicates with the remote virtio device
+ * @status: Vhost device status configured by vhost client driver
+ *
+ * vhost_config_ops invoked by the vhost client driver to set vhost device
+ * status.
+ */
+static int pci_epf_vhost_set_status(struct vhost_dev *vdev, u8 status)
+{
+ struct epf_vhost_reg *reg;
+ struct epf_vhost *vhost;
+
+ vhost = to_epf_vhost(vdev);
+ reg = vhost->reg;
+
+ reg->device_status = status;
+
+ return 0;
+}
+
+/* pci_epf_vhost_get_status - vhost_config_ops to get vhost device status
+ * @vdev: Vhost device that communicates with the remote virtio device
+ *
+ * vhost_config_ops invoked by the vhost client driver to get vhost device
+ * status set by the remote virtio driver.
+ */
+static u8 pci_epf_vhost_get_status(struct vhost_dev *vdev)
+{
+ struct epf_vhost_reg *reg;
+ struct epf_vhost *vhost;
+
+ vhost = to_epf_vhost(vdev);
+ reg = vhost->reg;
+
+ return reg->device_status;
+}
+
+static const struct vhost_config_ops pci_epf_vhost_ops = {
+ .create_vqs = pci_epf_vhost_create_vqs,
+ .del_vqs = pci_epf_vhost_del_vqs,
+ .write = pci_epf_vhost_write,
+ .read = pci_epf_vhost_read,
+ .set_features = pci_epf_vhost_set_features,
+ .set_status = pci_epf_vhost_set_status,
+ .get_status = pci_epf_vhost_get_status,
+};
+
+/* pci_epf_vhost_write_header - Write to PCIe standard configuration space
+ * header
+ * @vhost: EPF vhost containing the vhost device that communicates with the
+ * remote virtio device
+ *
+ * Invokes endpoint core framework's pci_epc_write_header() to write to the
+ * standard configuration space header.
+ */
+static int pci_epf_vhost_write_header(struct epf_vhost *vhost)
+{
+ struct pci_epf_header *header;
+ struct vhost_dev *vdev;
+ struct pci_epc *epc;
+ struct pci_epf *epf;
+ struct device *dev;
+ u8 func_no;
+ int ret;
+
+ vdev = &vhost->vdev;
+ epf = vhost->epf;
+ dev = &epf->dev;
+ epc = epf->epc;
+ func_no = epf->func_no;
+ header = epf->header;
+
+ ret = pci_epc_write_header(epc, func_no, header);
+ if (ret) {
+ dev_err(dev, "Configuration header write failed\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+/* pci_epf_vhost_release_dev - Callback function to free device
+ * @dev: Device in vhost_dev that has to be freed
+ *
+ * Callback function from device core invoked to free the device after
+ * all references have been removed. This frees the allocated memory for
+ * struct epf_vhost.
+ */
+static void pci_epf_vhost_release_dev(struct device *dev)
+{
+ struct epf_vhost *vhost;
+ struct vhost_dev *vdev;
+
+ vdev = to_vhost_dev(dev);
+ vhost = to_epf_vhost(vdev);
+
+ kfree(vhost);
+}
+
+/* pci_epf_vhost_register - Register a vhost device
+ * @vhost: EPF vhost containing the vhost device that communicates with the
+ * remote virtio device
+ *
+ * Invokes vhost_register_device() to register a vhost device after populating
+ * the deviceID and vendorID of the vhost device.
+ */
+static int pci_epf_vhost_register(struct epf_vhost *vhost)
+{
+ struct vhost_dev *vdev;
+ struct pci_epf *epf;
+ struct device *dev;
+ int ret;
+
+ vdev = &vhost->vdev;
+ epf = vhost->epf;
+ dev = &epf->dev;
+
+ vdev->dev.parent = dev;
+ vdev->dev.release = pci_epf_vhost_release_dev;
+ vdev->id.device = vhost->epf->header->subsys_id;
+ vdev->id.vendor = vhost->epf->header->subsys_vendor_id;
+ vdev->ops = &pci_epf_vhost_ops;
+
+ ret = vhost_register_device(vdev);
+ if (ret) {
+ dev_err(dev, "Failed to register vhost device\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+/* pci_epf_vhost_configure_bar - Configure BAR of EPF device
+ * @vhost: EPF vhost containing the vhost device that communicates with the
+ * remote virtio device
+ *
+ * Allocate memory for the standard virtio configuration space and map it to
+ * the first free BAR.
+ */
+static int pci_epf_vhost_configure_bar(struct epf_vhost *vhost)
+{
+ size_t msix_table_size = 0, pba_size = 0, align, bar_size;
+ const struct pci_epc_features *epc_features;
+ struct pci_epf_bar *epf_bar;
+ struct vhost_dev *vdev;
+ struct pci_epf *epf;
+ struct pci_epc *epc;
+ struct device *dev;
+ bool msix_capable;
+ u32 config_size;
+ int barno, ret;
+ void *base;
+ u64 size;
+
+ vdev = &vhost->vdev;
+ epf = vhost->epf;
+ dev = &epf->dev;
+ epc = epf->epc;
+
+ epc_features = vhost->epc_features;
+ barno = pci_epc_get_first_free_bar(epc_features);
+ if (barno < 0) {
+ dev_err(dev, "Failed to get free BAR\n");
+ return barno;
+ }
+
+ size = epc_features->bar_fixed_size[barno];
+ align = epc_features->align;
+ /* Check if epc_features is populated incorrectly */
+ if ((!IS_ALIGNED(size, align)))
+ return -EINVAL;
+
+ config_size = sizeof(struct epf_vhost_reg) + VHOST_DEVICE_CONFIG_SIZE;
+ config_size = ALIGN(config_size, 8);
+
+ msix_capable = epc_features->msix_capable;
+ if (msix_capable) {
+ msix_table_size = PCI_MSIX_ENTRY_SIZE * epf->msix_interrupts;
+ vhost->msix_table_offset = config_size;
+ vhost->msix_bar = barno;
+ /* Align to QWORD or 8 Bytes */
+ pba_size = ALIGN(DIV_ROUND_UP(epf->msix_interrupts, 8), 8);
+ }
+
+ bar_size = config_size + msix_table_size + pba_size;
+
+ if (!align)
+ bar_size = roundup_pow_of_two(bar_size);
+ else
+ bar_size = ALIGN(bar_size, align);
+
+ if (!size)
+ size = bar_size;
+ else if (size < bar_size)
+ return -EINVAL;
+
+ base = pci_epf_alloc_space(epf, size, barno, align,
+ PRIMARY_INTERFACE);
+ if (!base) {
+ dev_err(dev, "Failed to allocate configuration region\n");
+ return -ENOMEM;
+ }
+
+ epf_bar = &epf->bar[barno];
+ ret = pci_epc_set_bar(epc, epf->func_no, epf_bar);
+ if (ret) {
+ dev_err(dev, "Failed to set BAR: %d\n", barno);
+ goto err_set_bar;
+ }
+
+ vhost->reg = base;
+
+ return 0;
+
+err_set_bar:
+ pci_epf_free_space(epf, base, barno, PRIMARY_INTERFACE);
+
+ return ret;
+}
+
+/* pci_epf_vhost_configure_interrupts - Configure MSI/MSI-X capability of EPF
+ * device
+ * @vhost: EPF vhost containing the vhost device that communicates with the
+ * remote virtio device
+ *
+ * Configure MSI/MSI-X capability of EPF device. This will be used to interrupt
+ * the vhost virtqueue.
+ */
+static int pci_epf_vhost_configure_interrupts(struct epf_vhost *vhost)
+{
+ const struct pci_epc_features *epc_features;
+ struct pci_epf *epf;
+ struct pci_epc *epc;
+ struct device *dev;
+ int ret;
+
+ epc_features = vhost->epc_features;
+ epf = vhost->epf;
+ dev = &epf->dev;
+ epc = epf->epc;
+
+ if (epc_features->msi_capable) {
+ ret = pci_epc_set_msi(epc, epf->func_no,
+ EPF_VHOST_MAX_INTERRUPTS);
+ if (ret) {
+ dev_err(dev, "MSI configuration failed\n");
+ return ret;
+ }
+ }
+
+ if (epc_features->msix_capable) {
+ ret = pci_epc_set_msix(epc, epf->func_no,
+ EPF_VHOST_MAX_INTERRUPTS,
+ vhost->msix_bar,
+ vhost->msix_table_offset);
+ if (ret) {
+ dev_err(dev, "MSI-X configuration failed\n");
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+/* pci_epf_vhost_cfs_link - Link vhost client driver with EPF vhost to get
+ * the deviceID and vendorID to be written to the PCIe config space
+ * @epf_vhost_item: Config item representing the EPF vhost created by this
+ * driver
+ * @driver_item: Config item representing the vhost client driver
+ *
+ * This is invoked when the user creates a softlink from the vhost client
+ * to the EPF vhost. This gets the deviceID and vendorID data from the vhost
+ * client and copies it to the subsys_id and subsys_vendor_id of the EPF
+ * header. This will be used by the remote virtio to bind a virtio client
+ * driver.
+ */
+static int pci_epf_vhost_cfs_link(struct config_item *epf_vhost_item,
+ struct config_item *driver_item)
+{
+ struct vhost_driver_item *vdriver_item;
+ struct epf_vhost *vhost;
+
+ vdriver_item = to_vhost_driver_item(driver_item);
+ vhost = to_epf_vhost_from_ci(epf_vhost_item);
+
+ vhost->epf->header->subsys_id = vdriver_item->device;
+ vhost->epf->header->subsys_vendor_id = vdriver_item->vendor;
+
+ return 0;
+}
+
+static struct configfs_item_operations pci_epf_vhost_cfs_ops = {
+ .allow_link = pci_epf_vhost_cfs_link,
+};
+
+static const struct config_item_type pci_epf_vhost_cfs_type = {
+ .ct_item_ops = &pci_epf_vhost_cfs_ops,
+ .ct_owner = THIS_MODULE,
+};
+
+/* pci_epf_vhost_cfs_work - Delayed work function to create configfs directory
+ * to perform EPF vhost specific initializations
+ * @work: The work_struct holding the pci_epf_vhost_cfs_work() function that
+ * is scheduled
+ *
+ * This is a delayed work function to create configfs directory to perform EPF
+ * vhost specific initializations. This configfs directory will be a
+ * sub-directory to the directory created by the user to create pci_epf device.
+ */
+static void pci_epf_vhost_cfs_work(struct work_struct *work)
+{
+ struct epf_vhost *vhost = container_of(work, struct epf_vhost,
+ cfs_work.work);
+ struct pci_epf *epf = vhost->epf;
+ struct device *dev = &epf->dev;
+ struct config_group *group;
+ struct vhost_dev *vdev;
+ int ret;
+
+ if (!epf->group) {
+ queue_delayed_work(kpcivhost_workqueue, &vhost->cfs_work,
+ msecs_to_jiffies(50));
+ return;
+ }
+
+ vdev = &vhost->vdev;
+ group = &vhost->group;
+ config_group_init_type_name(group, dev_name(dev), &pci_epf_vhost_cfs_type);
+ ret = configfs_register_group(epf->group, group);
+ if (ret) {
+ dev_err(dev, "Failed to register configfs group %s\n", dev_name(dev));
+ return;
+ }
+}
+
+/* pci_epf_vhost_probe - Initialize struct epf_vhost when a new EPF device is
+ * created
+ * @epf: Endpoint function device that is bound to this driver
+ *
+ * Probe function to initialize struct epf_vhost when a new EPF device is
+ * created.
+ */
+static int pci_epf_vhost_probe(struct pci_epf *epf)
+{
+ struct epf_vhost *vhost;
+
+ vhost = kzalloc(sizeof(*vhost), GFP_KERNEL);
+ if (!vhost)
+ return -ENOMEM;
+
+ epf->header = &epf_vhost_header;
+ vhost->epf = epf;
+
+ epf_set_drvdata(epf, vhost);
+ INIT_DELAYED_WORK(&vhost->cmd_handler, pci_epf_vhost_cmd_handler);
+ INIT_DELAYED_WORK(&vhost->cfs_work, pci_epf_vhost_cfs_work);
+ queue_delayed_work(kpcivhost_workqueue, &vhost->cfs_work,
+ msecs_to_jiffies(50));
+
+ return 0;
+}
+
+/* pci_epf_vhost_remove - Free the initializations performed by
+ * pci_epf_vhost_probe()
+ * @epf: Endpoint function device that is bound to this driver
+ *
+ * Free the initializations performed by pci_epf_vhost_probe().
+ */
+static int pci_epf_vhost_remove(struct pci_epf *epf)
+{
+ struct epf_vhost *vhost;
+
+ vhost = epf_get_drvdata(epf);
+ cancel_delayed_work_sync(&vhost->cfs_work);
+
+ return 0;
+}
+
+/* pci_epf_vhost_bind - Bind callback to initialize the PCIe EP controller
+ * @epf: Endpoint function device that is bound to the endpoint controller
+ *
+ * pci_epf_vhost_bind() is invoked when an endpoint controller is bound to
+ * endpoint function. This function initializes the endpoint controller
+ * with vhost endpoint function specific data.
+ */
+static int pci_epf_vhost_bind(struct pci_epf *epf)
+{
+ const struct pci_epc_features *epc_features;
+ struct epf_vhost *vhost;
+ struct pci_epc *epc;
+ struct device *dev;
+ int ret;
+
+ vhost = epf_get_drvdata(epf);
+ dev = &epf->dev;
+ epc = epf->epc;
+
+ epc_features = pci_epc_get_features(epc, epf->func_no);
+ if (!epc_features) {
+ dev_err(dev, "Fail to get EPC features\n");
+ return -EINVAL;
+ }
+ vhost->epc_features = epc_features;
+
+ ret = pci_epf_vhost_write_header(vhost);
+ if (ret) {
+ dev_err(dev, "Failed to bind VHOST config header\n");
+ return ret;
+ }
+
+ ret = pci_epf_vhost_configure_bar(vhost);
+ if (ret) {
+ dev_err(dev, "Failed to configure BAR\n");
+ return ret;
+ }
+
+ ret = pci_epf_vhost_configure_interrupts(vhost);
+ if (ret) {
+ dev_err(dev, "Failed to configure BAR\n");
+ return ret;
+ }
+
+ ret = pci_epf_vhost_register(vhost);
+ if (ret) {
+ dev_err(dev, "Failed to bind VHOST config header\n");
+ return ret;
+ }
+
+ queue_work(kpcivhost_workqueue, &vhost->cmd_handler.work);
+
+ return ret;
+}
+
+/* pci_epf_vhost_unbind - Unbind callback to clean up the PCIe EP controller
+ * @epf: Endpoint function device that is bound to the endpoint controller
+ *
+ * pci_epf_vhost_unbind() is invoked when the binding between endpoint
+ * controller is removed from endpoint function. This will unregister vhost
+ * device and cancel pending cmd_handler work.
+ */
+static void pci_epf_vhost_unbind(struct pci_epf *epf)
+{
+ struct epf_vhost *vhost;
+ struct vhost_dev *vdev;
+
+ vhost = epf_get_drvdata(epf);
+ vdev = &vhost->vdev;
+
+ cancel_delayed_work_sync(&vhost->cmd_handler);
+ if (device_is_registered(&vdev->dev))
+ vhost_unregister_device(vdev);
+}
+
+static struct pci_epf_ops epf_ops = {
+ .bind = pci_epf_vhost_bind,
+ .unbind = pci_epf_vhost_unbind,
+};
+
+static const struct pci_epf_device_id pci_epf_vhost_ids[] = {
+ {
+ .name = "pci-epf-vhost",
+ },
+ { },
+};
+
+static struct pci_epf_driver epf_vhost_driver = {
+ .driver.name = "pci_epf_vhost",
+ .probe = pci_epf_vhost_probe,
+ .remove = pci_epf_vhost_remove,
+ .id_table = pci_epf_vhost_ids,
+ .ops = &epf_ops,
+ .owner = THIS_MODULE,
+};
+
+static int __init pci_epf_vhost_init(void)
+{
+ int ret;
+
+ kpcivhost_workqueue = alloc_workqueue("kpcivhost", WQ_MEM_RECLAIM |
+ WQ_HIGHPRI, 0);
+ if (!kpcivhost_workqueue) {
+ pr_err("Failed to allocate the kpcivhost work queue\n");
+ return -ENOMEM;
+ }
+
+ ret = pci_epf_register_driver(&epf_vhost_driver);
+ if (ret) {
+ pr_err("Failed to register pci epf vhost driver --> %d\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+module_init(pci_epf_vhost_init);
+
+static void __exit pci_epf_vhost_exit(void)
+{
+ pci_epf_unregister_driver(&epf_vhost_driver);
+}
+module_exit(pci_epf_vhost_exit);
+
+MODULE_DESCRIPTION("PCI EPF VHOST DRIVER");
+MODULE_AUTHOR("Kishon Vijay Abraham I <[email protected]>");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/vhost/vhost_cfs.c b/drivers/vhost/vhost_cfs.c
index ae46e71968f1..ab0393289200 100644
--- a/drivers/vhost/vhost_cfs.c
+++ b/drivers/vhost/vhost_cfs.c
@@ -18,12 +18,6 @@ static struct config_group *vhost_driver_group;
/* VHOST device like PCIe EP, NTB etc., */
static struct config_group *vhost_device_group;

-struct vhost_driver_item {
- struct config_group group;
- u32 vendor;
- u32 device;
-};
-
struct vhost_driver_group {
struct config_group group;
};
@@ -33,13 +27,6 @@ struct vhost_device_item {
struct vhost_dev *vdev;
};

-static inline
-struct vhost_driver_item *to_vhost_driver_item(struct config_item *item)
-{
- return container_of(to_config_group(item), struct vhost_driver_item,
- group);
-}
-
static inline
struct vhost_device_item *to_vhost_device_item(struct config_item *item)
{
diff --git a/include/linux/vhost.h b/include/linux/vhost.h
index be9341ffd266..640650311310 100644
--- a/include/linux/vhost.h
+++ b/include/linux/vhost.h
@@ -74,6 +74,7 @@ struct vhost_virtqueue {
struct vhost_dev *dev;
enum vhost_type type;
struct vringh vringh;
+ int index;
void (*callback)(struct vhost_virtqueue *vq);
void (*notify)(struct vhost_virtqueue *vq);

@@ -148,6 +149,12 @@ struct vhost_msg_node {
struct list_head node;
};

+struct vhost_driver_item {
+ struct config_group group;
+ u32 vendor;
+ u32 device;
+};
+
enum vhost_notify_event {
NOTIFY_SET_STATUS,
NOTIFY_FINALIZE_FEATURES,
@@ -230,6 +237,13 @@ static inline void *vhost_get_drvdata(struct vhost_dev *vdev)
return dev_get_drvdata(&vdev->dev);
}

+static inline
+struct vhost_driver_item *to_vhost_driver_item(struct config_item *item)
+{
+ return container_of(to_config_group(item), struct vhost_driver_item,
+ group);
+}
+
int vhost_register_driver(struct vhost_driver *driver);
void vhost_unregister_driver(struct vhost_driver *driver);
int vhost_register_device(struct vhost_dev *vdev);
--
2.17.1

2020-07-02 08:25:50

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 22/22] NTB: Describe ntb_virtio and ntb_vhost client in the documentation

Add a blurb in Documentation/driver-api/ntb.rst to describe the ntb_virtio
and ntb_vhost clients.

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
Documentation/driver-api/ntb.rst | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/Documentation/driver-api/ntb.rst b/Documentation/driver-api/ntb.rst
index 87d1372da879..f84b81625397 100644
--- a/Documentation/driver-api/ntb.rst
+++ b/Documentation/driver-api/ntb.rst
@@ -227,6 +227,17 @@ test client is interacted with through the debugfs filesystem:
specified peer. That peer's interrupt's occurrence file
should be incremented.

+NTB Vhost Client (ntb\_vhost) and NTB Virtio Client (ntb\_virtio)
+------------------------------------------------------------------
+
+When two hosts are connected via NTB, one of the hosts should use the NTB
+Vhost client and the other host should use the NTB Virtio client. The NTB
+Vhost client interfaces with the Linux vhost framework and lets it be used
+with any vhost client driver. The NTB Virtio client interfaces with the Linux
+virtio framework and lets it be used with any virtio client driver. The vhost
+client driver and the virtio client driver create a logical link to exchange
+data with each other.
+
NTB Hardware Drivers
====================

--
2.17.1

2020-07-02 08:25:55

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 21/22] NTB: Add a new NTB client driver to implement VHOST functionality

Add a new NTB client driver to implement VHOST functionality. When two
hosts are connected using NTB, one of the hosts should run the NTB client
driver that implements VIRTIO functionality and the other host should run
the NTB client driver that implements VHOST functionality. This driver
interfaces with the VHOST layer so that any vhost client driver can
exchange data with the remote virtio driver.
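
At a high level (sketch only; scratchpad and doorbell indices such as
VHOST_COMMAND come from ntb_virtio.h introduced earlier in this series),
the vhost side of the link works as follows:

    /* poll a scratchpad for commands written by the remote virtio driver */
    command = ntb_spad_read(ndev, VHOST_COMMAND);
    /* ... notify the vhost client driver, then acknowledge ... */
    ntb_spad_write(ndev, VHOST_COMMAND_STATUS, HOST_CMD_STATUS_OKAY);

    /* notify the remote virtqueue by ringing a per-queue doorbell */
    ntb_peer_db_set(ndev, vq->index);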

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/ntb/Kconfig | 9 +
drivers/ntb/Makefile | 1 +
drivers/ntb/ntb_vhost.c | 776 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 786 insertions(+)
create mode 100644 drivers/ntb/ntb_vhost.c

diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
index e171b3256f68..7d1b9bb56e71 100644
--- a/drivers/ntb/Kconfig
+++ b/drivers/ntb/Kconfig
@@ -46,4 +46,13 @@ config NTB_VIRTIO

If unsure, say N.

+config NTB_VHOST
+ tristate "NTB VHOST"
+ help
+ The NTB vhost driver sits between the NTB HW driver and the vhost
+ client driver and lets the vhost client driver exchange data with
+ the remote virtio driver over the NTB hardware.
+
+ If unsure, say N.
+
endif # NTB
diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index d37ab488bcbc..25c9937c91cf 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -2,6 +2,7 @@
obj-$(CONFIG_NTB) += ntb.o hw/ test/
obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport.o
obj-$(CONFIG_NTB_VIRTIO) += ntb_virtio.o
+obj-$(CONFIG_NTB_VHOST) += ntb_vhost.o

ntb-y := core.o
ntb-$(CONFIG_NTB_MSI) += msi.o
diff --git a/drivers/ntb/ntb_vhost.c b/drivers/ntb/ntb_vhost.c
new file mode 100644
index 000000000000..1d717bb98d85
--- /dev/null
+++ b/drivers/ntb/ntb_vhost.c
@@ -0,0 +1,776 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NTB Client Driver to implement VHOST functionality
+ *
+ * Copyright (C) 2020 Texas Instruments
+ * Author: Kishon Vijay Abraham I <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/ntb.h>
+#include <linux/slab.h>
+#include <linux/vhost.h>
+#include <linux/virtio_pci.h>
+#include <linux/vringh.h>
+
+#include "ntb_virtio.h"
+
+static struct workqueue_struct *kntbvhost_workqueue;
+
+struct ntb_vhost_queue {
+ struct delayed_work db_handler;
+ struct vhost_virtqueue *vq;
+ void __iomem *vq_addr;
+};
+
+struct ntb_vhost {
+ struct ntb_vhost_queue vqueue[MAX_VQS];
+ struct work_struct link_cleanup;
+ struct delayed_work cmd_handler;
+ struct delayed_work link_work;
+ resource_size_t peer_mw_size;
+ struct config_group *group;
+ phys_addr_t peer_mw_addr;
+ struct vhost_dev vdev;
+ struct ntb_dev *ndev;
+ struct vring vring;
+ struct device *dev;
+ u64 virtio_addr;
+ u64 features;
+};
+
+#define to_ntb_vhost(v) container_of((v), struct ntb_vhost, vdev)
+
+/* ntb_vhost_finalize_features - Indicate features are finalized to vhost client
+ * driver
+ * @ntb: NTB vhost device that communicates with the remote virtio device
+ *
+ * Invoked when the remote virtio device sends HOST_CMD_FINALIZE_FEATURES
+ * command once the feature negotiation is complete. This function sends
+ * notification to the vhost client driver.
+ */
+static void ntb_vhost_finalize_features(struct ntb_vhost *ntb)
+{
+ struct vhost_dev *vdev;
+ struct ntb_dev *ndev;
+ u64 features;
+
+ vdev = &ntb->vdev;
+ ndev = ntb->ndev;
+
+ features = ntb_spad_read(ndev, VIRTIO_FEATURES_UPPER);
+ features <<= 32;
+ features |= ntb_spad_read(ndev, VIRTIO_FEATURES_LOWER);
+ vdev->features = features;
+ blocking_notifier_call_chain(&vdev->notifier, NOTIFY_FINALIZE_FEATURES, 0);
+}
+
+/* ntb_vhost_cmd_handler - Handle commands from remote NTB virtio driver
+ * @work: The work_struct holding the ntb_vhost_cmd_handler() function that is
+ * scheduled
+ *
+ * Handle commands from the remote NTB virtio driver and send a notification to
+ * the vhost client driver. The remote virtio driver sends commands when the
+ * virtio driver status is updated, when the feature negotiation is complete,
+ * or when the virtio driver wants to reset the device.
+ */
+static void ntb_vhost_cmd_handler(struct work_struct *work)
+{
+ struct vhost_dev *vdev;
+ struct ntb_vhost *ntb;
+ struct ntb_dev *ndev;
+ struct device *dev;
+ u8 command;
+
+ ntb = container_of(work, struct ntb_vhost, cmd_handler.work);
+ vdev = &ntb->vdev;
+ ndev = ntb->ndev;
+ dev = ntb->dev;
+
+ command = ntb_spad_read(ndev, VHOST_COMMAND);
+ if (!command)
+ goto reset_handler;
+
+ ntb_spad_write(ndev, VHOST_COMMAND, 0);
+
+ switch (command) {
+ case HOST_CMD_SET_STATUS:
+ blocking_notifier_call_chain(&vdev->notifier, NOTIFY_SET_STATUS, 0);
+ ntb_spad_write(ndev, VHOST_COMMAND_STATUS, HOST_CMD_STATUS_OKAY);
+ break;
+ case HOST_CMD_FINALIZE_FEATURES:
+ ntb_vhost_finalize_features(ntb);
+ ntb_spad_write(ndev, VHOST_COMMAND_STATUS, HOST_CMD_STATUS_OKAY);
+ break;
+ case HOST_CMD_RESET:
+ blocking_notifier_call_chain(&vdev->notifier, NOTIFY_RESET, 0);
+ ntb_spad_write(ndev, VHOST_COMMAND_STATUS, HOST_CMD_STATUS_OKAY);
+ break;
+ default:
+ dev_err(dev, "UNKNOWN command: %d\n", command);
+ break;
+ }
+
+reset_handler:
+ queue_delayed_work(kntbvhost_workqueue, &ntb->cmd_handler,
+ msecs_to_jiffies(1));
+}
+
+/* ntb_vhost_del_vqs - Delete all the vqs associated with the vhost device
+ * @vdev: Vhost device that communicates with the remote virtio device
+ *
+ * Delete all the vqs associated with the vhost device.
+ */
+void ntb_vhost_del_vqs(struct vhost_dev *vdev)
+{
+ struct ntb_vhost_queue *vqueue;
+ struct vhost_virtqueue *vq;
+ struct ntb_vhost *ntb;
+ int i;
+
+ ntb = to_ntb_vhost(vdev);
+
+ for (i = 0; i < vdev->nvqs; i++) {
+ vq = vdev->vqs[i];
+ if (IS_ERR_OR_NULL(vq))
+ continue;
+
+ vqueue = &ntb->vqueue[i];
+ cancel_delayed_work_sync(&vqueue->db_handler);
+ iounmap(vqueue->vq_addr);
+ kfree(vq);
+ }
+}
+
+/* ntb_vhost_vq_db_work - Handle doorbell event receive for a virtqueue
+ * @work: The work_struct holding the ntb_vhost_vq_db_work() function for every
+ * created virtqueue
+ *
+ * This function is invoked when the remote virtio driver sends a notification
+ * to the virtqueue. (virtqueue_kick() on the remote virtio driver). This
+ * function invokes the vhost client driver's virtqueue callback.
+ */
+static void ntb_vhost_vq_db_work(struct work_struct *work)
+{
+ struct ntb_vhost_queue *vqueue;
+
+ vqueue = container_of(work, struct ntb_vhost_queue, db_handler.work);
+ vhost_virtqueue_callback(vqueue->vq);
+}
+
+/* ntb_vhost_notify - Send notification to the remote virtqueue
+ * @vq: The local vhost virtqueue corresponding to the remote virtio virtqueue
+ *
+ * Use NTB doorbell to send notification for the remote virtqueue
+ */
+static void ntb_vhost_notify(struct vhost_virtqueue *vq)
+{
+ struct vhost_dev *vdev;
+ struct ntb_vhost *ntb;
+
+ vdev = vq->dev;
+ ntb = to_ntb_vhost(vdev);
+
+ ntb_peer_db_set(ntb->ndev, vq->index);
+}
+
+/* ntb_vhost_create_vq - Create a new vhost virtqueue
+ * @vdev: Vhost device that communicates with the remote virtio device
+ * @index: Index of the vhost virtqueue
+ * @num_bufs: The number of buffers that should be supported by the vhost
+ * virtqueue (number of descriptors in the vhost virtqueue)
+ * @callback: Callback function associated with the virtqueue
+ *
+ * Create a new vhost virtqueue which can be used by the vhost client driver
+ * to access the remote virtio device. This sets up the local address of the
+ * vhost virtqueue, but the virtqueue shouldn't be accessed until the remote
+ * virtio driver sets the status to VIRTIO_CONFIG_S_DRIVER_OK.
+ */
+static struct vhost_virtqueue *
+ntb_vhost_create_vq(struct vhost_dev *vdev, int index, unsigned int num_bufs,
+ void (*callback)(struct vhost_virtqueue *))
+{
+ struct ntb_vhost_queue *vqueue;
+ unsigned int vq_size, offset;
+ struct vhost_virtqueue *vq;
+ phys_addr_t vq_phys_addr;
+ phys_addr_t peer_mw_addr;
+ struct vringh *vringh;
+ struct ntb_vhost *ntb;
+ void __iomem *vq_addr;
+ struct ntb_dev *ndev;
+ struct vring *vring;
+ struct device *dev;
+ int ret;
+
+ ntb = to_ntb_vhost(vdev);
+ vqueue = &ntb->vqueue[index];
+ peer_mw_addr = ntb->peer_mw_addr;
+ ndev = ntb->ndev;
+ dev = ntb->dev;
+
+ vq = kzalloc(sizeof(*vq), GFP_KERNEL);
+ if (!vq)
+ return ERR_PTR(-ENOMEM);
+
+ vq->dev = vdev;
+ vq->callback = callback;
+ vq->num = num_bufs;
+ vq->index = index;
+ vq->notify = ntb_vhost_notify;
+ vq->type = VHOST_TYPE_MMIO;
+
+ vringh = &vq->vringh;
+ vring = &vringh->vring;
+
+ ntb_spad_write(ndev, VHOST_QUEUE_NUM_BUFFERS(index), num_bufs);
+ vq_size = vring_size(num_bufs, VIRTIO_PCI_VRING_ALIGN);
+ offset = index * vq_size;
+ if (offset + vq_size > ntb->peer_mw_size) {
+ dev_err(dev, "Not enough vhost memory for allocating vq\n");
+ ret = -ENOMEM;
+ goto err_out_of_bound;
+ }
+
+ vq_phys_addr = peer_mw_addr + offset;
+ vq_addr = ioremap_wc(vq_phys_addr, vq_size);
+ if (!vq_addr) {
+ dev_err(dev, "Fail to ioremap virtqueue address\n");
+ ret = -ENOMEM;
+ goto err_out_of_bound;
+ }
+
+ vqueue->vq = vq;
+ vqueue->vq_addr = vq_addr;
+
+ vring_init(vring, num_bufs, vq_addr, VIRTIO_PCI_VRING_ALIGN);
+ ret = vringh_init_mmio(vringh, ntb->features, num_bufs, false,
+ vring->desc, vring->avail, vring->used);
+ if (ret) {
+ dev_err(dev, "Failed to init vringh\n");
+ goto err_init_mmio;
+ }
+
+ INIT_DELAYED_WORK(&vqueue->db_handler, ntb_vhost_vq_db_work);
+
+ return vq;
+
+err_init_mmio:
+ iounmap(vq_addr);
+
+err_out_of_bound:
+ kfree(vq);
+
+ return ERR_PTR(ret);
+}
+
+/* ntb_vhost_create_vqs - Create vhost virtqueues for vhost device
+ * @vdev: Vhost device that communicates with the remote virtio device
+ * @nvqs: Number of vhost virtqueues to be created
+ * @num_bufs: The number of buffers that should be supported by the vhost
+ * virtqueue (number of descriptors in the vhost virtqueue)
+ * @vqs: Pointers to all the created vhost virtqueues
+ * @callback: Callback function associated with the virtqueue
+ * @names: Names associated with each virtqueue
+ *
+ * Create vhost virtqueues for the vhost device. This acts as a wrapper around
+ * ntb_vhost_create_vq(), which creates each individual vhost virtqueue.
+ */
+static int ntb_vhost_create_vqs(struct vhost_dev *vdev, unsigned int nvqs,
+ unsigned int num_bufs,
+ struct vhost_virtqueue *vqs[],
+ vhost_vq_callback_t *callbacks[],
+ const char * const names[])
+{
+ struct ntb_vhost *ntb;
+ struct device *dev;
+ int ret, i;
+
+ ntb = to_ntb_vhost(vdev);
+ dev = ntb->dev;
+
+ for (i = 0; i < nvqs; i++) {
+ vqs[i] = ntb_vhost_create_vq(vdev, i, num_bufs, callbacks[i]);
+ if (IS_ERR_OR_NULL(vqs[i])) {
+ ret = PTR_ERR(vqs[i]);
+ dev_err(dev, "Failed to create virtqueue\n");
+ goto err;
+ }
+ }
+
+ vdev->nvqs = nvqs;
+ vdev->vqs = vqs;
+
+ return 0;
+
+err:
+ ntb_vhost_del_vqs(vdev);
+
+ return ret;
+}
+
+/* ntb_vhost_write - Write data to buffer provided by remote virtio driver
+ * @vdev: Vhost device that communicates with the remote virtio device
+ * @dst: Buffer address in the remote device provided by the remote virtio
+ * driver
+ * @src: Buffer address in the local device provided by the vhost client driver
+ * @len: Length of the data to be copied from @src to @dst
+ *
+ * Write data to buffer provided by remote virtio driver from buffer provided
+ * by vhost client driver.
+ */
+static int ntb_vhost_write(struct vhost_dev *vdev, u64 dst, void *src, int len)
+{
+ phys_addr_t peer_mw_addr, phys_addr;
+ struct ntb_vhost *ntb;
+ unsigned int offset;
+ struct device *dev;
+ u64 virtio_addr;
+ void *addr;
+
+ ntb = to_ntb_vhost(vdev);
+ dev = ntb->dev;
+
+ peer_mw_addr = ntb->peer_mw_addr;
+ virtio_addr = ntb->virtio_addr;
+
+ offset = dst - virtio_addr;
+ if (offset + len > ntb->peer_mw_size) {
+ dev_err(dev, "Overflow of vhost memory\n");
+ return -EINVAL;
+ }
+
+ phys_addr = peer_mw_addr + offset;
+ addr = ioremap_wc(phys_addr, len);
+ if (!addr) {
+ dev_err(dev, "Failed to ioremap vhost address\n");
+ return -ENOMEM;
+ }
+
+ memcpy_toio(addr, src, len);
+ iounmap(addr);
+
+ return 0;
+}
+
+/* ntb_vhost_read - Read data from buffers provided by remote virtio driver
+ * @vdev: Vhost device that communicates with the remote virtio device
+ * @dst: Buffer address in the local device provided by the vhost client driver
+ * @src: Buffer address in the remote device provided by the remote virtio
+ * driver
+ * @len: Length of the data to be copied from @src to @dst
+ *
+ * Read data from buffers provided by remote virtio driver to address provided
+ * by vhost client driver.
+ */
+static int ntb_vhost_read(struct vhost_dev *vdev, void *dst, u64 src, int len)
+{
+ phys_addr_t peer_mw_addr, phys_addr;
+ struct ntb_vhost *ntb;
+ unsigned int offset;
+ struct device *dev;
+ u64 virtio_addr;
+ void *addr;
+
+ ntb = to_ntb_vhost(vdev);
+ dev = ntb->dev;
+
+ peer_mw_addr = ntb->peer_mw_addr;
+ virtio_addr = ntb->virtio_addr;
+
+ offset = src - virtio_addr;
+ if (offset + len > ntb->peer_mw_size) {
+ dev_err(dev, "Overflow of vhost memory\n");
+ return -EINVAL;
+ }
+
+ phys_addr = peer_mw_addr + offset;
+ addr = ioremap_wc(phys_addr, len);
+ if (!addr) {
+ dev_err(dev, "Failed to ioremap vhost address\n");
+ return -ENOMEM;
+ }
+
+ memcpy_fromio(dst, addr, len);
+ iounmap(addr);
+
+ return 0;
+}
+
+/* ntb_vhost_release - Callback function to free device
+ * @dev: Device in vhost_dev that has to be freed
+ *
+ * Callback function from device core invoked to free the device after
+ * all references have been removed. This frees the allocated memory for
+ * struct ntb_vhost.
+ */
+static void ntb_vhost_release(struct device *dev)
+{
+ struct vhost_dev *vdev;
+ struct ntb_vhost *ntb;
+
+ vdev = to_vhost_dev(dev);
+ ntb = to_ntb_vhost(vdev);
+
+ kfree(ntb);
+}
+
+/* ntb_vhost_set_features - vhost_config_ops to set vhost device features
+ * @vdev: Vhost device that communicates with the remote virtio device
+ * @features: Features supported by the vhost client driver
+ *
+ * vhost_config_ops invoked by the vhost client driver to set vhost device
+ * features.
+ */
+static int ntb_vhost_set_features(struct vhost_dev *vdev, u64 features)
+{
+ struct ntb_vhost *ntb;
+ struct ntb_dev *ndev;
+
+ ntb = to_ntb_vhost(vdev);
+ ndev = ntb->ndev;
+
+ ntb_spad_write(ndev, VHOST_FEATURES_LOWER, lower_32_bits(features));
+ ntb_spad_write(ndev, VHOST_FEATURES_UPPER, upper_32_bits(features));
+ ntb->features = features;
+
+ return 0;
+}
+
+/* ntb_vhost_set_status - vhost_config_ops to set vhost device status
+ * @vdev: Vhost device that communicates with the remote virtio device
+ * @status: Vhost device status configured by vhost client driver
+ *
+ * vhost_config_ops invoked by the vhost client driver to set vhost device
+ * status.
+ */
+static int ntb_vhost_set_status(struct vhost_dev *vdev, u8 status)
+{
+ struct ntb_vhost *ntb;
+ struct ntb_dev *ndev;
+
+ ntb = to_ntb_vhost(vdev);
+ ndev = ntb->ndev;
+
+ ntb_spad_write(ndev, VHOST_DEVICE_STATUS, status);
+
+ return 0;
+}
+
+/* ntb_vhost_get_status - vhost_config_ops to get vhost device status
+ * @vdev: Vhost device that communicates with the remote virtio device
+ *
+ * vhost_config_ops invoked by the vhost client driver to get vhost device
+ * status set by the remote virtio driver.
+ */
+static u8 ntb_vhost_get_status(struct vhost_dev *vdev)
+{
+ struct ntb_vhost *ntb;
+ struct ntb_dev *ndev;
+
+ ntb = to_ntb_vhost(vdev);
+ ndev = ntb->ndev;
+
+ return ntb_spad_read(ndev, VHOST_DEVICE_STATUS);
+}
+
+static const struct vhost_config_ops ops = {
+ .create_vqs = ntb_vhost_create_vqs,
+ .del_vqs = ntb_vhost_del_vqs,
+ .write = ntb_vhost_write,
+ .read = ntb_vhost_read,
+ .set_features = ntb_vhost_set_features,
+ .set_status = ntb_vhost_set_status,
+ .get_status = ntb_vhost_get_status,
+};
+
+/* ntb_vhost_link_cleanup - Cleanup once link to the remote host is lost
+ * @ntb: NTB vhost device that communicates with the remote virtio device
+ *
+ * Performs the cleanup that has to be done once the link to the remote host
+ * is lost or when the NTB vhost driver is removed.
+ */
+static void ntb_vhost_link_cleanup(struct ntb_vhost *ntb)
+{
+ cancel_delayed_work_sync(&ntb->link_work);
+}
+
+/* ntb_vhost_link_cleanup_work - Cleanup once link to the remote host is lost
+ * @work: The work_struct holding the ntb_vhost_link_cleanup_work() function
+ * that is scheduled
+ *
+ * Performs the cleanup that has to be done once the link to the remote host
+ * is lost. This acts as a wrapper to ntb_vhost_link_cleanup() for the cleanup
+ * operation.
+ */
+static void ntb_vhost_link_cleanup_work(struct work_struct *work)
+{
+ struct ntb_vhost *ntb;
+
+ ntb = container_of(work, struct ntb_vhost, link_cleanup);
+ ntb_vhost_link_cleanup(ntb);
+}
+
+/* ntb_vhost_link_work - Initialization once link to the remote host is
+ * established
+ * @work: The work_struct holding the ntb_vhost_link_work() function that is
+ * scheduled
+ *
+ * Performs the NTB vhost initialization that has to be done once the link to
+ * the remote host is established. Initializes the scratchpad registers with
+ * data required for the remote NTB virtio driver to establish communication
+ * with this vhost driver.
+ */
+static void ntb_vhost_link_work(struct work_struct *work)
+{
+ struct vhost_dev *vdev;
+ struct ntb_vhost *ntb;
+ struct ntb_dev *ndev;
+ struct device *dev;
+ u64 virtio_addr;
+ u32 type;
+
+ ntb = container_of(work, struct ntb_vhost, link_work.work);
+ vdev = &ntb->vdev;
+ ndev = ntb->ndev;
+ dev = ntb->dev;
+
+ /*
+ * Device will be registered when vhost client driver is linked to
+ * vhost transport device driver in configfs.
+ */
+ if (!device_is_registered(&vdev->dev))
+ goto out;
+
+ /*
+ * This is unlikely to happen if "vhost" configfs is used for
+ * registering vhost device.
+ */
+ if (vdev->id.device == 0 && vdev->id.vendor == 0) {
+ dev_err(dev, "vhost device is registered without valid ID\n");
+ goto out;
+ }
+
+ ntb_spad_write(ndev, VHOST_VENDORID, vdev->id.vendor);
+ ntb_spad_write(ndev, VHOST_DEVICEID, vdev->id.device);
+ ntb_peer_spad_write(ndev, NTB_DEF_PEER_IDX, VIRTIO_TYPE, TYPE_VHOST);
+
+ type = ntb_spad_read(ndev, VIRTIO_TYPE);
+ if (type != TYPE_VIRTIO)
+ goto out;
+
+ virtio_addr = ntb_spad_read(ndev, VIRTIO_MW0_UPPER_ADDR);
+ virtio_addr <<= 32;
+ virtio_addr |= ntb_spad_read(ndev, VIRTIO_MW0_LOWER_ADDR);
+ ntb->virtio_addr = virtio_addr;
+
+ INIT_DELAYED_WORK(&ntb->cmd_handler, ntb_vhost_cmd_handler);
+ queue_work(kntbvhost_workqueue, &ntb->cmd_handler.work);
+
+ return;
+
+out:
+ if (ntb_link_is_up(ndev, NULL, NULL) == 1)
+ schedule_delayed_work(&ntb->link_work,
+ msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
+}
+
+/* ntb_vhost_event_callback - Callback to link event interrupt
+ * @data: Private data specific to NTB vhost driver
+ *
+ * Callback function from the NTB HW driver, invoked whenever both hosts in
+ * the NTB setup have invoked ntb_link_enable().
+ */
+static void ntb_vhost_event_callback(void *data)
+{
+ struct ntb_vhost *ntb;
+ struct ntb_dev *ndev;
+
+ ntb = data;
+ ndev = ntb->ndev;
+
+ if (ntb_link_is_up(ntb->ndev, NULL, NULL) == 1)
+ schedule_delayed_work(&ntb->link_work, 0);
+ else
+ schedule_work(&ntb->link_cleanup);
+}
+
+/* ntb_vhost_vq_db_callback - Callback to doorbell interrupt to handle vhost
+ * virtqueue work
+ * @data: Private data specific to NTB vhost driver
+ * @vector: Doorbell vector on which interrupt is received
+ *
+ * Callback function from NTB HW driver whenever remote virtio driver has sent
+ * a notification using doorbell. This schedules work corresponding to the
+ * virtqueue for which notification has been received.
+ */
+static void ntb_vhost_vq_db_callback(void *data, int vector)
+{
+ struct ntb_vhost_queue *vqueue;
+ struct ntb_vhost *ntb;
+
+ ntb = data;
+ vqueue = &ntb->vqueue[vector - 1];
+
+ schedule_delayed_work(&vqueue->db_handler, 0);
+}
+
+static const struct ntb_ctx_ops ntb_vhost_ops = {
+ .link_event = ntb_vhost_event_callback,
+ .db_event = ntb_vhost_vq_db_callback,
+};
+
+/* ntb_vhost_configure_mw - Get memory window address and size
+ * @ntb: NTB vhost device that communicates with the remote virtio device
+ *
+ * Get address and size of memory window 0 and update the size of
+ * memory window 0 to scratchpad register in order for virtio driver
+ * to get the memory window size.
+ *
+ * TODO: Add support for multiple memory windows.
+ */
+static int ntb_vhost_configure_mw(struct ntb_vhost *ntb)
+{
+ struct ntb_dev *ndev;
+ struct device *dev;
+ int ret;
+
+ ndev = ntb->ndev;
+ dev = ntb->dev;
+
+ ret = ntb_peer_mw_get_addr(ndev, 0, &ntb->peer_mw_addr, &ntb->peer_mw_size);
+ if (ret) {
+ dev_err(dev, "Failed to get memory window address\n");
+ return ret;
+ }
+
+ ntb_spad_write(ndev, VHOST_MW0_SIZE_LOWER, lower_32_bits(ntb->peer_mw_size));
+ ntb_spad_write(ndev, VHOST_MW0_SIZE_UPPER, upper_32_bits(ntb->peer_mw_size));
+
+ return 0;
+}
+
+/* ntb_vhost_probe - Initialize struct ntb_vhost when a new NTB device is
+ * created
+ * @client: struct ntb_client * representing the ntb vhost client driver
+ * @ndev: NTB device created by NTB HW driver
+ *
+ * Probe function to initialize struct ntb_vhost when a new NTB device is
+ * created. Also get the supported MW0 address and size and write the MW0
+ * size to the self scratchpad for the remote NTB virtio driver to read.
+ */
+static int ntb_vhost_probe(struct ntb_client *self, struct ntb_dev *ndev)
+{
+ struct device *dev = &ndev->dev;
+ struct config_group *group;
+ struct vhost_dev *vdev;
+ struct ntb_vhost *ntb;
+ int ret;
+
+ ntb = kzalloc(sizeof(*ntb), GFP_KERNEL);
+ if (!ntb)
+ return -ENOMEM;
+
+ ntb->ndev = ndev;
+ ntb->dev = dev;
+
+ ret = ntb_vhost_configure_mw(ntb);
+ if (ret) {
+ dev_err(dev, "Failed to configure memory window\n");
+ goto err;
+ }
+
+ ret = ntb_set_ctx(ndev, ntb, &ntb_vhost_ops);
+ if (ret) {
+ dev_err(dev, "Failed to set NTB vhost context\n");
+ goto err;
+ }
+
+ vdev = &ntb->vdev;
+ vdev->dev.parent = dev;
+ vdev->dev.release = ntb_vhost_release;
+ vdev->ops = &ops;
+
+ group = vhost_cfs_add_device_item(vdev);
+ if (IS_ERR(group)) {
+ dev_err(dev, "Failed to add configfs entry for vhost device\n");
+ goto err;
+ }
+
+ ntb->group = group;
+
+ INIT_DELAYED_WORK(&ntb->link_work, ntb_vhost_link_work);
+ INIT_WORK(&ntb->link_cleanup, ntb_vhost_link_cleanup_work);
+
+ ntb_link_enable(ndev, NTB_SPEED_AUTO, NTB_WIDTH_AUTO);
+
+ return 0;
+
+err:
+ kfree(ntb);
+
+ return ret;
+}
+
+/* ntb_vhost_free - Free the initializations performed by ntb_vhost_probe()
+ * @client: struct ntb_client * representing the ntb vhost client driver
+ * @ndev: NTB device created by NTB HW driver
+ *
+ * Free the initializations performed by ntb_vhost_probe().
+ */
+void ntb_vhost_free(struct ntb_client *client, struct ntb_dev *ndev)
+{
+ struct config_group *group;
+ struct vhost_dev *vdev;
+ struct ntb_vhost *ntb;
+
+ ntb = ndev->ctx;
+ vdev = &ntb->vdev;
+ group = ntb->group;
+
+ ntb_vhost_link_cleanup(ntb);
+ ntb_link_disable(ndev);
+ ntb_vhost_del_vqs(vdev);
+ vhost_cfs_remove_device_item(group);
+ if (device_is_registered(&vdev->dev))
+ vhost_unregister_device(vdev);
+}
+
+static struct ntb_client ntb_vhost_client = {
+ .ops = {
+ .probe = ntb_vhost_probe,
+ .remove = ntb_vhost_free,
+ },
+};
+
+static int __init ntb_vhost_init(void)
+{
+ int ret;
+
+ kntbvhost_workqueue = alloc_workqueue("kntbvhost", WQ_MEM_RECLAIM |
+ WQ_HIGHPRI, 0);
+ if (!kntbvhost_workqueue) {
+ pr_err("Failed to allocate kntbvhost_workqueue\n");
+ return -ENOMEM;
+ }
+
+ ret = ntb_register_client(&ntb_vhost_client);
+ if (ret) {
+ pr_err("Failed to register ntb vhost driver --> %d\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+module_init(ntb_vhost_init);
+
+static void __exit ntb_vhost_exit(void)
+{
+ ntb_unregister_client(&ntb_vhost_client);
+ destroy_workqueue(kntbvhost_workqueue);
+}
+module_exit(ntb_vhost_exit);
+
+MODULE_DESCRIPTION("NTB VHOST Driver");
+MODULE_AUTHOR("Kishon Vijay Abraham I <[email protected]>");
+MODULE_LICENSE("GPL v2");
--
2.17.1

2020-07-02 08:25:59

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 16/22] samples/rpmsg: Wait for address to be bound to rpdev for sending message

rpmsg_client_sample can use either virtio_rpmsg_bus or vhost_rpmsg_bus
to send messages. In the case of vhost_rpmsg_bus, the destination
address of the rpdev will be assigned only after receiving an address service
notification from the remote virtio_rpmsg_bus. Wait for the address to be
bound to the rpdev (for rpmsg_client_sample running on the vhost system)
before sending messages to the remote virtio_rpmsg_bus.

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
samples/rpmsg/rpmsg_client_sample.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/samples/rpmsg/rpmsg_client_sample.c b/samples/rpmsg/rpmsg_client_sample.c
index 514a51945d69..07149d7fbd0c 100644
--- a/samples/rpmsg/rpmsg_client_sample.c
+++ b/samples/rpmsg/rpmsg_client_sample.c
@@ -57,10 +57,14 @@ static void rpmsg_sample_send_msg_work(struct work_struct *work)
struct rpmsg_device *rpdev = idata->rpdev;
int ret;

- /* send a message to our remote processor */
- ret = rpmsg_send(rpdev->ept, MSG, strlen(MSG));
- if (ret)
- dev_err(&rpdev->dev, "rpmsg_send failed: %d\n", ret);
+ if (rpdev->dst != RPMSG_ADDR_ANY) {
+ /* send a message to our remote processor */
+ ret = rpmsg_send(rpdev->ept, MSG, strlen(MSG));
+ if (ret)
+ dev_err(&rpdev->dev, "rpmsg_send failed: %d\n", ret);
+ } else {
+ schedule_delayed_work(&idata->send_msg_work, msecs_to_jiffies(50));
+ }
}

static int rpmsg_sample_probe(struct rpmsg_device *rpdev)
--
2.17.1

2020-07-02 08:26:27

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 10/22] rpmsg: virtio_rpmsg_bus: Add Address Service Notification support

Add support to send an address service notification message to the backend
rpmsg device. This informs the backend rpmsg device about the address
allocated to the channel that is created in response to the name service
announcement message from the backend rpmsg device. This is in preparation
for adding a backend rpmsg device using the VHOST framework in Linux.

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/rpmsg/virtio_rpmsg_bus.c | 92 +++++++++++++++++++++++++++++---
1 file changed, 85 insertions(+), 7 deletions(-)

diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
index 2d0d42084ac0..19d930c9fc2c 100644
--- a/drivers/rpmsg/virtio_rpmsg_bus.c
+++ b/drivers/rpmsg/virtio_rpmsg_bus.c
@@ -71,6 +71,7 @@ struct virtproc_info {

/* The feature bitmap for virtio rpmsg */
#define VIRTIO_RPMSG_F_NS 0 /* RP supports name service notifications */
+#define VIRTIO_RPMSG_F_AS 1 /* RP supports address service notifications */

/**
* struct rpmsg_hdr - common header for all rpmsg messages
@@ -110,6 +111,26 @@ struct rpmsg_ns_msg {
u32 flags;
} __packed;

+/**
+ * struct rpmsg_as_msg - dynamic address service announcement message
+ * @name: name of the created channel
+ * @dst: destination address to be used by the backend rpdev
+ * @src: source address of the backend rpdev (the one that sent name service
+ * announcement message)
+ * @flags: indicates whether service is created or destroyed
+ *
+ * This message is sent (by virtio_rpmsg_bus) when a new channel is created
+ * in response to a name service announcement message from the backend rpdev.
+ * It sends the allocated source address for the channel (destination address
+ * for the backend rpdev) to the backend rpdev.
+ */
+struct rpmsg_as_msg {
+ char name[RPMSG_NAME_SIZE];
+ u32 dst;
+ u32 src;
+ u32 flags;
+} __packed;
+
/**
* enum rpmsg_ns_flags - dynamic name service announcement flags
*
@@ -119,6 +140,19 @@ struct rpmsg_ns_msg {
enum rpmsg_ns_flags {
RPMSG_NS_CREATE = 0,
RPMSG_NS_DESTROY = 1,
+ RPMSG_AS_ANNOUNCE = 2,
+};
+
+/**
+ * enum rpmsg_as_flags - dynamic address service announcement flags
+ *
+ * @RPMSG_AS_ASSIGN: address has been assigned to the newly created channel
+ * @RPMSG_AS_FREE: assigned address is freed from the channel and no longer can
+ * be used
+ */
+enum rpmsg_as_flags {
+ RPMSG_AS_ASSIGN = 1,
+ RPMSG_AS_FREE = 2,
};

/**
@@ -164,6 +198,9 @@ struct virtio_rpmsg_channel {
/* Address 53 is reserved for advertising remote services */
#define RPMSG_NS_ADDR (53)

+/* Address 54 is reserved for advertising address services */
+#define RPMSG_AS_ADDR (54)
+
static void virtio_rpmsg_destroy_ept(struct rpmsg_endpoint *ept);
static int virtio_rpmsg_send(struct rpmsg_endpoint *ept, void *data, int len);
static int virtio_rpmsg_sendto(struct rpmsg_endpoint *ept, void *data, int len,
@@ -329,9 +366,11 @@ static int virtio_rpmsg_announce_create(struct rpmsg_device *rpdev)
struct device *dev = &rpdev->dev;
int err = 0;

+ if (!rpdev->ept || !rpdev->announce)
+ return err;
+
/* need to tell remote processor's name service about this channel ? */
- if (rpdev->announce && rpdev->ept &&
- virtio_has_feature(vrp->vdev, VIRTIO_RPMSG_F_NS)) {
+ if (virtio_has_feature(vrp->vdev, VIRTIO_RPMSG_F_NS)) {
struct rpmsg_ns_msg nsm;

strncpy(nsm.name, rpdev->id.name, RPMSG_NAME_SIZE);
@@ -343,6 +382,23 @@ static int virtio_rpmsg_announce_create(struct rpmsg_device *rpdev)
dev_err(dev, "failed to announce service %d\n", err);
}

+ /*
+ * need to tell remote processor's address service about the address allocated
+ * to this channel
+ */
+ if (virtio_has_feature(vrp->vdev, VIRTIO_RPMSG_F_AS)) {
+ struct rpmsg_as_msg asmsg;
+
+ strncpy(asmsg.name, rpdev->id.name, RPMSG_NAME_SIZE);
+ asmsg.dst = rpdev->src;
+ asmsg.src = rpdev->dst;
+ asmsg.flags = RPMSG_AS_ASSIGN;
+
+ err = rpmsg_sendto(rpdev->ept, &asmsg, sizeof(asmsg), RPMSG_AS_ADDR);
+ if (err)
+ dev_err(dev, "failed to announce service %d\n", err);
+ }
+
return err;
}

@@ -353,9 +409,28 @@ static int virtio_rpmsg_announce_destroy(struct rpmsg_device *rpdev)
struct device *dev = &rpdev->dev;
int err = 0;

+ if (!rpdev->ept || !rpdev->announce)
+ return err;
+
+ /*
+ * need to tell remote processor's address service that we're freeing
+ * the address allocated to this channel
+ */
+ if (virtio_has_feature(vrp->vdev, VIRTIO_RPMSG_F_AS)) {
+ struct rpmsg_as_msg asmsg;
+
+ strncpy(asmsg.name, rpdev->id.name, RPMSG_NAME_SIZE);
+ asmsg.dst = rpdev->src;
+ asmsg.src = rpdev->dst;
+ asmsg.flags = RPMSG_AS_FREE;
+
+ err = rpmsg_sendto(rpdev->ept, &asmsg, sizeof(asmsg), RPMSG_AS_ADDR);
+ if (err)
+ dev_err(dev, "failed to announce service %d\n", err);
+ }
+
/* tell remote processor's name service we're removing this channel */
- if (rpdev->announce && rpdev->ept &&
- virtio_has_feature(vrp->vdev, VIRTIO_RPMSG_F_NS)) {
+ if (virtio_has_feature(vrp->vdev, VIRTIO_RPMSG_F_NS)) {
struct rpmsg_ns_msg nsm;

strncpy(nsm.name, rpdev->id.name, RPMSG_NAME_SIZE);
@@ -390,7 +465,8 @@ static void virtio_rpmsg_release_device(struct device *dev)
* channels.
*/
static struct rpmsg_device *rpmsg_create_channel(struct virtproc_info *vrp,
- struct rpmsg_channel_info *chinfo)
+ struct rpmsg_channel_info *chinfo,
+ bool announce)
{
struct virtio_rpmsg_channel *vch;
struct rpmsg_device *rpdev;
@@ -424,7 +500,8 @@ static struct rpmsg_device *rpmsg_create_channel(struct virtproc_info *vrp,
* rpmsg server channels has predefined local address (for now),
* and their existence needs to be announced remotely
*/
- rpdev->announce = rpdev->src != RPMSG_ADDR_ANY;
+ if (rpdev->src != RPMSG_ADDR_ANY || announce)
+ rpdev->announce = true;

strncpy(rpdev->id.name, chinfo->name, RPMSG_NAME_SIZE);

@@ -873,7 +950,7 @@ static int rpmsg_ns_cb(struct rpmsg_device *rpdev, void *data, int len,
if (ret)
dev_err(dev, "rpmsg_destroy_channel failed: %d\n", ret);
} else {
- newch = rpmsg_create_channel(vrp, &chinfo);
+ newch = rpmsg_create_channel(vrp, &chinfo, msg->flags & RPMSG_AS_ANNOUNCE);
if (!newch)
dev_err(dev, "rpmsg_create_channel failed\n");
}
@@ -1042,6 +1119,7 @@ static struct virtio_device_id id_table[] = {

static unsigned int features[] = {
VIRTIO_RPMSG_F_NS,
+ VIRTIO_RPMSG_F_AS,
};

static struct virtio_driver virtio_ipc_driver = {
--
2.17.1

2020-07-02 08:26:36

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 17/22] rpmsg.txt: Add Documentation to configure rpmsg using configfs

Add documentation on how an rpmsg device can be created using
configfs, as required for vhost_rpmsg_bus.c.

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
Documentation/rpmsg.txt | 56 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)

diff --git a/Documentation/rpmsg.txt b/Documentation/rpmsg.txt
index 24b7a9e1a5f9..0e0a32b2cb66 100644
--- a/Documentation/rpmsg.txt
+++ b/Documentation/rpmsg.txt
@@ -339,3 +339,59 @@ by the bus, and can then start sending messages to the remote service.

The plan is also to add static creation of rpmsg channels via the virtio
config space, but it's not implemented yet.
+
+Configuring rpmsg using configfs
+================================
+
+Usually an rpmsg_device is created when the virtproc driver (virtio_rpmsg_bus.c)
+receives a name service notification from the remote core. However, there are
+also cases where the user should be able to create the rpmsg_device (as with
+vhost_rpmsg_bus.c, where vhost_rpmsg_bus is responsible for sending the name
+service notification). For such cases, configfs provides the user the ability
+to bind an rpmsg_client_driver with a virtproc device in order to create an
+rpmsg_device.
+
+Two configfs directories are added for configuring rpmsg
+::
+
+ # ls /sys/kernel/config/rpmsg/
+ channel virtproc
+
+channel: Whenever a new rpmsg_driver is registered with the rpmsg core, a new
+sub-directory is created for each entry provided in the rpmsg_device_id
+table of the rpmsg_driver.
+
+For instance, when rpmsg_client_sample is installed, it creates the following
+entry in the mounted configfs directory
+::
+
+ # ls /sys/kernel/config/rpmsg/channel/
+ rpmsg-client-sample
+
+virtproc: A virtproc device can choose to add an entry in this directory.
+A virtproc device adds an entry if it has to allow the user to control the
+creation of rpmsg devices (e.g. vhost_rpmsg_bus.c).
+::
+
+ # ls /sys/kernel/config/rpmsg/virtproc/
+ vhost0
+
+
+The first step in allowing the user to create an rpmsg device is to create a
+sub-directory under rpmsg-client-sample. For each rpmsg_device the user would
+like to create, a separate sub-directory has to be created.
+::
+
+ # mkdir /sys/kernel/config/rpmsg/channel/rpmsg-client-sample/c1
+
+The next step is to link the created sub-directory with the virtproc device to
+create the rpmsg device.
+::
+
+ # ln -s /sys/kernel/config/rpmsg/channel/rpmsg-client-sample/c1 \
+ /sys/kernel/config/rpmsg/virtproc/vhost0
+
+This will create the rpmsg_device. However, the driver will not register the
+rpmsg device until it receives VIRTIO_CONFIG_S_DRIVER_OK (in the case
+of vhost_rpmsg_bus.c), since it can access virtio buffers only after
+VIRTIO_CONFIG_S_DRIVER_OK is set.
--
2.17.1

2020-07-02 08:26:47

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 18/22] virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint device

Add VIRTIO driver to support Linux VHOST on Configurable PCIe Endpoint
device in the backend.

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/virtio/Kconfig | 9 +
drivers/virtio/Makefile | 1 +
drivers/virtio/virtio_pci_epf.c | 670 ++++++++++++++++++++++++++++++++
include/linux/virtio.h | 3 +
4 files changed, 683 insertions(+)
create mode 100644 drivers/virtio/virtio_pci_epf.c

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 69a32dfc318a..4b251f8482ae 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -43,6 +43,15 @@ config VIRTIO_PCI_LEGACY

If unsure, say Y.

+config VIRTIO_PCI_EPF
+ bool "Support for virtio device on configurable PCIe Endpoint"
+ depends on VIRTIO_PCI
+ help
+ Select this configuration option to enable the VIRTIO driver
+ for a configurable PCIe Endpoint device running Linux in the backend.
+
+ If in doubt, say "N" to disable the VIRTIO PCI EPF driver.
+
config VIRTIO_VDPA
tristate "vDPA driver for virtio devices"
depends on VDPA
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 29a1386ecc03..08a158365d5e 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
+virtio_pci-$(CONFIG_VIRTIO_PCI_EPF) += virtio_pci_epf.o
obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
diff --git a/drivers/virtio/virtio_pci_epf.c b/drivers/virtio/virtio_pci_epf.c
new file mode 100644
index 000000000000..20e53869f179
--- /dev/null
+++ b/drivers/virtio/virtio_pci_epf.c
@@ -0,0 +1,670 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * Virtio PCI driver - Configurable PCIe Endpoint device Support
+ *
+ * The Configurable PCIe Endpoint device is present on another system running
+ * Linux and is configured using drivers/pci/endpoint/functions/pci-epf-vhost.c
+ *
+ * Copyright (C) 2020 Texas Instruments
+ * Author: Kishon Vijay Abraham I <[email protected]>
+ */
+
+#include <linux/delay.h>
+#include "virtio_pci_common.h"
+
+#define DRV_MODULE_NAME "virtio-pci-epf"
+
+#define HOST_FEATURES_LOWER 0x00
+#define HOST_FEATURES_UPPER 0x04
+#define GUEST_FEATURES_LOWER 0x08
+#define GUEST_FEATURES_UPPER 0x0C
+#define MSIX_CONFIG 0x10
+#define NUM_QUEUES 0x12
+#define DEVICE_STATUS 0x14
+#define CONFIG_GENERATION 0x15
+#define ISR 0x16
+#define HOST_CMD 0x1A
+enum host_cmd {
+ HOST_CMD_NONE,
+ HOST_CMD_SET_STATUS,
+ HOST_CMD_FINALIZE_FEATURES,
+ HOST_CMD_RESET,
+};
+
+#define HOST_CMD_STATUS 0x1B
+enum host_cmd_status {
+ HOST_CMD_STATUS_NONE,
+ HOST_CMD_STATUS_OKAY,
+ HOST_CMD_STATUS_ERROR,
+};
+
+#define QUEUE_BASE 0x1C
+#define CMD 0x00
+enum queue_cmd {
+ QUEUE_CMD_NONE,
+ QUEUE_CMD_ACTIVATE,
+ QUEUE_CMD_DEACTIVATE,
+ QUEUE_CMD_NOTIFY,
+};
+
+#define CMD_STATUS 0x01
+enum queue_cmd_status {
+ QUEUE_CMD_STATUS_NONE,
+ QUEUE_CMD_STATUS_OKAY,
+ QUEUE_CMD_STATUS_ERROR,
+};
+
+#define STATUS 0x2
+#define STATUS_ACTIVE BIT(0)
+
+#define NUM_BUFFERS 0x04
+#define MSIX_VECTOR 0x06
+#define ADDR_LOWER 0x08
+#define ADDR_UPPER 0x0C
+#define QUEUE_CMD(n) (QUEUE_BASE + CMD + (n) * 16)
+#define QUEUE_CMD_STATUS(n) (QUEUE_BASE + CMD_STATUS + (n) * 16)
+#define QUEUE_STATUS(n) (QUEUE_BASE + STATUS + (n) * 16)
+#define QUEUE_NUM_BUFFERS(n) (QUEUE_BASE + NUM_BUFFERS + (n) * 16)
+#define QUEUE_MSIX_VECTOR(n) (QUEUE_BASE + MSIX_VECTOR + (n) * 16)
+#define QUEUE_ADDR_LOWER(n) (QUEUE_BASE + ADDR_LOWER + (n) * 16)
+#define QUEUE_ADDR_UPPER(n) (QUEUE_BASE + ADDR_UPPER + (n) * 16)
+
+#define DEVICE_CFG_SPACE 0x110
+
+#define COMMAND_TIMEOUT 1000 /* 1 Sec */
+
+struct virtio_pci_epf {
+ /* mutex to protect sending commands to EPF vhost */
+ struct mutex lock;
+ struct virtio_pci_device vp_dev;
+};
+
+#define to_virtio_pci_epf(dev) container_of((dev), struct virtio_pci_epf, vp_dev)
+
+/* virtio_pci_epf_send_command - Send commands to the remote EPF device running
+ * vhost driver
+ * @vp_dev: Virtio PCIe device that communicates with the endpoint device
+ * @command: The command that has to be sent to the remote endpoint device
+ *
+ * Helper function to send commands to the remote endpoint function device
+ * running vhost driver.
+ */
+static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
+ u32 command)
+{
+ struct virtio_pci_epf *pci_epf;
+ void __iomem *ioaddr;
+ ktime_t timeout;
+ bool timedout;
+ int ret = 0;
+ u8 status;
+
+ pci_epf = to_virtio_pci_epf(vp_dev);
+ ioaddr = vp_dev->ioaddr;
+
+ mutex_lock(&pci_epf->lock);
+ writeb(command, ioaddr + HOST_CMD);
+ timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
+ while (1) {
+ timedout = ktime_after(ktime_get(), timeout);
+ status = readb(ioaddr + HOST_CMD_STATUS);
+
+ if (status == HOST_CMD_STATUS_ERROR) {
+ ret = -EINVAL;
+ break;
+ }
+
+ if (status == HOST_CMD_STATUS_OKAY)
+ break;
+
+ if (WARN_ON(timedout)) {
+ ret = -ETIMEDOUT;
+ break;
+ }
+
+ usleep_range(5, 10);
+ }
+
+ writeb(HOST_CMD_STATUS_NONE, ioaddr + HOST_CMD_STATUS);
+ mutex_unlock(&pci_epf->lock);
+
+ return ret;
+}
+
+/* virtio_pci_epf_send_queue_command - Send commands to the remote EPF device
+ * for configuring virtqueue
+ * @vp_dev: Virtio PCIe device that communicates with the endpoint device
+ * @vq: The virtqueue that has to be configured on the remote endpoint device
+ * @command: The command that has to be sent to the remote endpoint device
+ *
+ * Helper function to send commands to the remote endpoint function device for
+ * configuring the virtqueue.
+ */
+static int virtio_pci_epf_send_queue_command(struct virtio_pci_device *vp_dev,
+ struct virtqueue *vq, u8 command)
+{
+ void __iomem *ioaddr;
+ ktime_t timeout;
+ bool timedout;
+ int ret = 0;
+ u8 status;
+
+ ioaddr = vp_dev->ioaddr;
+
+ mutex_lock(&vq->lock);
+ writeb(command, ioaddr + QUEUE_CMD(vq->index));
+ timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
+ while (1) {
+ timedout = ktime_after(ktime_get(), timeout);
+ status = readb(ioaddr + QUEUE_CMD_STATUS(vq->index));
+
+ if (status == QUEUE_CMD_STATUS_ERROR) {
+ ret = -EINVAL;
+ break;
+ }
+
+ if (status == QUEUE_CMD_STATUS_OKAY)
+ break;
+
+ if (WARN_ON(timedout)) {
+ ret = -ETIMEDOUT;
+ break;
+ }
+
+ usleep_range(5, 10);
+ }
+
+ writeb(QUEUE_CMD_STATUS_NONE, ioaddr + QUEUE_CMD_STATUS(vq->index));
+ mutex_unlock(&vq->lock);
+
+ return ret;
+}
+
+/* virtio_pci_epf_get_features - virtio_config_ops to get EPF vhost device
+ * features
+ * @vdev: Virtio device that communicates with the remote EPF vhost device
+ *
+ * virtio_config_ops to get EPF vhost device features. The device features
+ * are accessed using BAR mapped registers.
+ */
+static u64 virtio_pci_epf_get_features(struct virtio_device *vdev)
+{
+ struct virtio_pci_device *vp_dev;
+ void __iomem *ioaddr;
+ u64 features;
+
+ vp_dev = to_vp_device(vdev);
+ ioaddr = vp_dev->ioaddr;
+
+ features = readl(ioaddr + HOST_FEATURES_UPPER);
+ features <<= 32;
+ features |= readl(ioaddr + HOST_FEATURES_LOWER);
+
+ return features;
+}
+
+/* virtio_pci_epf_finalize_features - virtio_config_ops to finalize features
+ * with remote EPF vhost device
+ * @vdev: Virtio device that communicates with the remote vhost device
+ *
+ * Indicate the negotiated features to the remote EPF vhost device by sending
+ * HOST_CMD_FINALIZE_FEATURES command.
+ */
+static int virtio_pci_epf_finalize_features(struct virtio_device *vdev)
+{
+ struct virtio_pci_device *vp_dev;
+ void __iomem *ioaddr;
+ struct device *dev;
+ int ret;
+
+ vp_dev = to_vp_device(vdev);
+ dev = &vp_dev->pci_dev->dev;
+ ioaddr = vp_dev->ioaddr;
+
+ /* Give virtio_ring a chance to accept features. */
+ vring_transport_features(vdev);
+
+ writel(lower_32_bits(vdev->features), ioaddr + GUEST_FEATURES_LOWER);
+ writel(upper_32_bits(vdev->features), ioaddr + GUEST_FEATURES_UPPER);
+
+ ret = virtio_pci_epf_send_command(vp_dev, HOST_CMD_FINALIZE_FEATURES);
+ if (ret) {
+ dev_err(dev, "Failed to finalize features\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+/* virtio_pci_epf_get - Copy the device configuration space data from
+ * EPF device to buffer provided by virtio driver
+ * @vdev: Virtio device that communicates with the remote vhost device
+ * @offset: Offset in the device configuration space
+ * @buf: Buffer address from virtio driver where configuration space
+ * data has to be copied
+ * @len: Length of the data from device configuration space to be copied
+ *
+ * Copy the device configuration space data to buffer provided by virtio
+ * driver.
+ */
+static void virtio_pci_epf_get(struct virtio_device *vdev, unsigned int offset,
+ void *buf, unsigned int len)
+{
+ struct virtio_pci_device *vp_dev;
+ void __iomem *ioaddr;
+
+ vp_dev = to_vp_device(vdev);
+ ioaddr = vp_dev->ioaddr;
+
+ memcpy_fromio(buf, ioaddr + DEVICE_CFG_SPACE + offset, len);
+}
+
+/* virtio_pci_epf_set - Copy the device configuration space data from buffer
+ * provided by virtio driver to EPF device
+ * @vdev: Virtio device that communicates with the remote vhost device
+ * @offset: Offset in the device configuration space
+ * @buf: Buffer address provided by virtio driver which has the configuration
+ * space data to be copied
+ * @len: Length of the data from device configuration space to be copied
+ *
+ * Copy the device configuration space data from buffer provided by virtio
+ * driver to the EPF device.
+ */
+static void virtio_pci_epf_set(struct virtio_device *vdev, unsigned int offset,
+ const void *buf, unsigned int len)
+{
+ struct virtio_pci_device *vp_dev;
+ void __iomem *ioaddr;
+
+ vp_dev = to_vp_device(vdev);
+ ioaddr = vp_dev->ioaddr;
+
+ memcpy_toio(ioaddr + DEVICE_CFG_SPACE + offset, buf, len);
+}
+
+/* virtio_pci_epf_get_status - EPF virtio_config_ops to get device status
+ * @vdev: Virtio device that communicates with the remote vhost device
+ *
+ * EPF virtio_config_ops to get device status. The remote EPF vhost device
+ * populates the vhost device status in BAR mapped region.
+ */
+static u8 virtio_pci_epf_get_status(struct virtio_device *vdev)
+{
+ struct virtio_pci_device *vp_dev;
+ void __iomem *ioaddr;
+
+ vp_dev = to_vp_device(vdev);
+ ioaddr = vp_dev->ioaddr;
+
+ return readb(ioaddr + DEVICE_STATUS);
+}
+
+/* virtio_pci_epf_set_status - EPF virtio_config_ops to set device status
+ * @vdev: Virtio device that communicates with the remote vhost device
+ *
+ * EPF virtio_config_ops to set device status. This function updates the
+ * status in scratchpad register and sends a notification to the vhost
+ * device using HOST_CMD_SET_STATUS command.
+ */
+static void virtio_pci_epf_set_status(struct virtio_device *vdev, u8 status)
+{
+ struct virtio_pci_device *vp_dev;
+ void __iomem *ioaddr;
+ struct device *dev;
+ int ret;
+
+ vp_dev = to_vp_device(vdev);
+ dev = &vp_dev->pci_dev->dev;
+ ioaddr = vp_dev->ioaddr;
+
+ /* We should never be setting status to 0. */
+ if (WARN_ON(!status))
+ return;
+
+ writeb(status, ioaddr + DEVICE_STATUS);
+
+ ret = virtio_pci_epf_send_command(vp_dev, HOST_CMD_SET_STATUS);
+ if (ret)
+ dev_err(dev, "Failed to set device status\n");
+}
+
+/* virtio_pci_epf_reset - EPF virtio_config_ops to reset the device
+ * @vdev: Virtio device that communicates with the remote vhost device
+ *
+ * EPF virtio_config_ops to reset the device. This sends HOST_CMD_RESET
+ * command to reset the device.
+ */
+static void virtio_pci_epf_reset(struct virtio_device *vdev)
+{
+ struct virtio_pci_device *vp_dev;
+ void __iomem *ioaddr;
+ struct device *dev;
+ int ret;
+
+ vp_dev = to_vp_device(vdev);
+ dev = &vp_dev->pci_dev->dev;
+ ioaddr = vp_dev->ioaddr;
+
+ ret = virtio_pci_epf_send_command(vp_dev, HOST_CMD_RESET);
+ if (ret)
+ dev_err(dev, "Failed to reset device\n");
+}
+
+/* virtio_pci_epf_config_vector - virtio_pci_device ops to set config vector
+ * @vp_dev: Virtio PCI device managed by virtio_pci_common.c
+ *
+ * virtio_pci_device ops to set config vector. This writes the config MSI-X
+ * vector to the BAR mapped region.
+ */
+static u16 virtio_pci_epf_config_vector(struct virtio_pci_device *vp_dev,
+ u16 vector)
+{
+ void __iomem *ioaddr = vp_dev->ioaddr;
+
+ writew(vector, ioaddr + MSIX_CONFIG);
+
+ return readw(ioaddr + MSIX_CONFIG);
+}
+
+/* virtio_pci_epf_notify - Send notification to the remote vhost virtqueue
+ * @vq: The local virtio virtqueue corresponding to the remote vhost virtqueue
+ * where the notification has to be sent
+ *
+ * Send notification to the remote vhost virtqueue by using QUEUE_CMD_NOTIFY
+ * command.
+ */
+static bool virtio_pci_epf_notify(struct virtqueue *vq)
+{
+ struct virtio_pci_device *vp_dev;
+ struct device *dev;
+ int ret;
+
+ vp_dev = vq->priv;
+ dev = &vp_dev->pci_dev->dev;
+
+ ret = virtio_pci_epf_send_queue_command(vp_dev, vq, QUEUE_CMD_NOTIFY);
+ if (ret) {
+ dev_err(dev, "Notifying virtqueue: %d Failed\n", vq->index);
+ return false;
+ }
+
+ return true;
+}
+
+/* virtio_pci_epf_setup_vq - Configure virtqueue
+ * @vp_dev: Virtio PCI device managed by virtio_pci_common.c
+ * @info: Wrapper to virtqueue to maintain list of all queues
+ * @index: Index of the virtqueue that has to be configured
+ * @callback: Callback function that has to be associated with the virtqueue
+ * @name: Name associated with the virtqueue
+ *
+ * Configure virtqueue with the number of buffers provided by EPF vhost device
+ * and associate a callback function for each of these virtqueues.
+ */
+static struct virtqueue *
+virtio_pci_epf_setup_vq(struct virtio_pci_device *vp_dev,
+ struct virtio_pci_vq_info *info, unsigned int index,
+ void (*callback)(struct virtqueue *vq),
+ const char *name, bool ctx, u16 msix_vec)
+{
+ u16 queue_num_buffers;
+ void __iomem *ioaddr;
+ struct virtqueue *vq;
+ struct device *dev;
+ dma_addr_t vq_addr;
+ u16 status;
+ int err;
+
+ dev = &vp_dev->pci_dev->dev;
+ ioaddr = vp_dev->ioaddr;
+
+ status = readw(ioaddr + QUEUE_STATUS(index));
+ if (status & STATUS_ACTIVE) {
+ dev_err(dev, "Virtqueue %d is already active\n", index);
+ return ERR_PTR(-ENOENT);
+ }
+
+ queue_num_buffers = readw(ioaddr + QUEUE_NUM_BUFFERS(index));
+ if (!queue_num_buffers) {
+ dev_err(dev, "Virtqueue %d is not available\n", index);
+ return ERR_PTR(-ENOENT);
+ }
+
+ info->msix_vector = msix_vec;
+
+ vq = vring_create_virtqueue(index, queue_num_buffers,
+ VIRTIO_PCI_VRING_ALIGN, &vp_dev->vdev, true,
+ false, ctx, virtio_pci_epf_notify, callback,
+ name);
+ if (!vq) {
+ dev_err(dev, "Failed to create Virtqueue %d\n", index);
+ return ERR_PTR(-ENOMEM);
+ }
+ mutex_init(&vq->lock);
+
+ vq_addr = virtqueue_get_desc_addr(vq);
+ writel(lower_32_bits(vq_addr), ioaddr + QUEUE_ADDR_LOWER(index));
+ writel(upper_32_bits(vq_addr), ioaddr + QUEUE_ADDR_UPPER(index));
+
+ vq->priv = vp_dev;
+ writew(QUEUE_CMD_ACTIVATE, ioaddr + QUEUE_CMD(index));
+
+ err = virtio_pci_epf_send_queue_command(vp_dev, vq, QUEUE_CMD_ACTIVATE);
+ if (err) {
+ dev_err(dev, "Failed to activate Virtqueue %d\n", index);
+ goto out_del_vq;
+ }
+
+ if (msix_vec != VIRTIO_MSI_NO_VECTOR)
+ writew(msix_vec, ioaddr + QUEUE_MSIX_VECTOR(index));
+
+ return vq;
+
+out_del_vq:
+ vring_del_virtqueue(vq);
+
+ return ERR_PTR(err);
+}
+
+/* virtio_pci_epf_del_vq - Free memory allocated for virtio virtqueues
+ * @info: Wrapper to virtqueue to maintain list of all queues
+ *
+ * Free memory allocated for a virtqueue represented by @info
+ */
+static void virtio_pci_epf_del_vq(struct virtio_pci_vq_info *info)
+{
+ struct virtio_pci_device *vp_dev;
+ struct virtqueue *vq;
+ void __iomem *ioaddr;
+ unsigned int index;
+
+ vq = info->vq;
+ vp_dev = to_vp_device(vq->vdev);
+ ioaddr = vp_dev->ioaddr;
+ index = vq->index;
+
+ if (vp_dev->msix_enabled)
+ writew(VIRTIO_MSI_NO_VECTOR, ioaddr + QUEUE_MSIX_VECTOR(index));
+
+ writew(QUEUE_CMD_DEACTIVATE, ioaddr + QUEUE_CMD(index));
+ vring_del_virtqueue(vq);
+}
+
+static const struct virtio_config_ops virtio_pci_epf_config_ops = {
+ .get = virtio_pci_epf_get,
+ .set = virtio_pci_epf_set,
+ .get_status = virtio_pci_epf_get_status,
+ .set_status = virtio_pci_epf_set_status,
+ .reset = virtio_pci_epf_reset,
+ .find_vqs = vp_find_vqs,
+ .del_vqs = vp_del_vqs,
+ .get_features = virtio_pci_epf_get_features,
+ .finalize_features = virtio_pci_epf_finalize_features,
+ .bus_name = vp_bus_name,
+ .set_vq_affinity = vp_set_vq_affinity,
+ .get_vq_affinity = vp_get_vq_affinity,
+};
+
+/* virtio_pci_epf_release_dev - Callback function to free device
+ * @dev: Device in virtio_device that has to be freed
+ *
+ * Callback function from device core invoked to free the device after
+ * all references have been removed. This frees the allocated memory for
+ * struct virtio_pci_epf.
+ */
+static void virtio_pci_epf_release_dev(struct device *dev)
+{
+ struct virtio_pci_device *vp_dev;
+ struct virtio_pci_epf *pci_epf;
+ struct virtio_device *vdev;
+
+ vdev = dev_to_virtio(dev);
+ vp_dev = to_vp_device(vdev);
+ pci_epf = to_virtio_pci_epf(vp_dev);
+
+ kfree(pci_epf);
+}
+
+/* virtio_pci_epf_probe - Initialize struct virtio_pci_epf when a new PCIe
+ * device is created
+ * @pdev: The pci_dev that is created by the PCIe core during enumeration
+ * @id: pci_device_id of the @pdev
+ *
+ * Probe function to initialize struct virtio_pci_epf when a new PCIe device is
+ * created.
+ */
+static int virtio_pci_epf_probe(struct pci_dev *pdev,
+ const struct pci_device_id *id)
+{
+ struct virtio_pci_device *vp_dev, *reg_dev = NULL;
+ struct virtio_pci_epf *pci_epf;
+ struct device *dev;
+ int err;
+
+ if (pci_is_bridge(pdev))
+ return -ENODEV;
+
+ pci_epf = kzalloc(sizeof(*pci_epf), GFP_KERNEL);
+ if (!pci_epf)
+ return -ENOMEM;
+
+ dev = &pdev->dev;
+ vp_dev = &pci_epf->vp_dev;
+ vp_dev->vdev.dev.parent = dev;
+ vp_dev->vdev.dev.release = virtio_pci_epf_release_dev;
+ vp_dev->pci_dev = pdev;
+ INIT_LIST_HEAD(&vp_dev->virtqueues);
+ spin_lock_init(&vp_dev->lock);
+ mutex_init(&pci_epf->lock);
+
+ err = pci_enable_device(pdev);
+ if (err) {
+ dev_err(dev, "Cannot enable PCI device\n");
+ goto err_enable_device;
+ }
+
+ err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
+ if (err) {
+ dev_err(dev, "Cannot set DMA mask\n");
+ goto err_dma_set_mask;
+ }
+
+ err = pci_request_regions(pdev, DRV_MODULE_NAME);
+ if (err) {
+ dev_err(dev, "Cannot obtain PCI resources\n");
+ goto err_dma_set_mask;
+ }
+
+ pci_set_master(pdev);
+
+ vp_dev->ioaddr = pci_ioremap_bar(pdev, 0);
+ if (!vp_dev->ioaddr) {
+ dev_err(dev, "Failed to read BAR0\n");
+ goto err_ioremap;
+ }
+
+ pci_set_drvdata(pdev, vp_dev);
+ vp_dev->isr = vp_dev->ioaddr + ISR;
+
+ /*
+ * we use the subsystem vendor/device id as the virtio vendor/device
+ * id. this allows us to use the same PCI vendor/device id for all
+ * virtio devices and to identify the particular virtio driver by
+ * the subsystem ids
+ */
+ vp_dev->vdev.id.vendor = pdev->subsystem_vendor;
+ vp_dev->vdev.id.device = pdev->subsystem_device;
+
+ vp_dev->vdev.config = &virtio_pci_epf_config_ops;
+
+ vp_dev->config_vector = virtio_pci_epf_config_vector;
+ vp_dev->setup_vq = virtio_pci_epf_setup_vq;
+ vp_dev->del_vq = virtio_pci_epf_del_vq;
+
+ err = register_virtio_device(&vp_dev->vdev);
+ reg_dev = vp_dev;
+ if (err) {
+ dev_err(dev, "Failed to register VIRTIO device\n");
+ goto err_register_virtio;
+ }
+
+ return 0;
+
+err_register_virtio:
+ pci_iounmap(pdev, vp_dev->ioaddr);
+
+err_ioremap:
+ pci_release_regions(pdev);
+
+err_dma_set_mask:
+ pci_disable_device(pdev);
+
+err_enable_device:
+ if (reg_dev)
+ put_device(&vp_dev->vdev.dev);
+ else
+ kfree(pci_epf);
+
+ return err;
+}
+
+/* virtio_pci_epf_remove - Free the initializations performed by virtio_pci_epf_probe()
+ * @pdev: The pci_dev that is created by the PCIe core during enumeration
+ *
+ * Free the initializations performed by virtio_pci_epf_probe().
+ */
+void virtio_pci_epf_remove(struct pci_dev *pdev)
+{
+ struct virtio_pci_device *vp_dev;
+
+ vp_dev = pci_get_drvdata(pdev);
+
+ unregister_virtio_device(&vp_dev->vdev);
+ pci_iounmap(pdev, vp_dev->ioaddr);
+ pci_release_regions(pdev);
+ pci_disable_device(pdev);
+}
+
+static const struct pci_device_id virtio_pci_epf_table[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_DRA74x),
+ },
+ { PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_DRA72x),
+ },
+ { PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_J721E),
+ },
+ { }
+};
+MODULE_DEVICE_TABLE(pci, virtio_pci_epf_table);
+
+static struct pci_driver virtio_pci_epf_driver = {
+ .name = DRV_MODULE_NAME,
+ .id_table = virtio_pci_epf_table,
+ .probe = virtio_pci_epf_probe,
+ .remove = virtio_pci_epf_remove,
+ .sriov_configure = pci_sriov_configure_simple,
+};
+module_pci_driver(virtio_pci_epf_driver);
+
+MODULE_DESCRIPTION("VIRTIO PCI EPF DRIVER");
+MODULE_AUTHOR("Kishon Vijay Abraham I <[email protected]>");
+MODULE_LICENSE("GPL v2");
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index a493eac08393..0eb51b31545d 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -19,6 +19,7 @@
* @priv: a pointer for the virtqueue implementation to use.
* @index: the zero-based ordinal number for this queue.
* @num_free: number of elements we expect to be able to fit.
+ * @lock: mutex to protect concurrent access to the same queue
*
* A note on @num_free: with indirect buffers, each buffer needs one
* element in the queue, otherwise a buffer will need one element per
@@ -31,6 +32,8 @@ struct virtqueue {
struct virtio_device *vdev;
unsigned int index;
unsigned int num_free;
+ /* mutex to protect concurrent access to the queue while configuring */
+ struct mutex lock;
void *priv;
};

--
2.17.1

2020-07-02 08:26:50

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 08/22] rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when reading messages

Since rpmsg_recv_done() reads messages in a while loop, disable
callbacks until the while loop exits. This helps to get rid of the
annoying "uhm, incoming signal, but no used buffer ?" message.

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/rpmsg/virtio_rpmsg_bus.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
index 376ebbf880d6..2d0d42084ac0 100644
--- a/drivers/rpmsg/virtio_rpmsg_bus.c
+++ b/drivers/rpmsg/virtio_rpmsg_bus.c
@@ -777,6 +777,7 @@ static void rpmsg_recv_done(struct virtqueue *rvq)
return;
}

+ virtqueue_disable_cb(rvq);
while (msg) {
err = rpmsg_recv_single(vrp, dev, msg, len);
if (err)
@@ -786,6 +787,19 @@ static void rpmsg_recv_done(struct virtqueue *rvq)

msg = virtqueue_get_buf(rvq, &len);
}
+ virtqueue_enable_cb(rvq);
+
+ /*
+ * Try to read message one more time in case a new message is submitted
+ * after virtqueue_get_buf() inside the while loop but before enabling
+ * callbacks
+ */
+ msg = virtqueue_get_buf(rvq, &len);
+ if (msg) {
+ err = rpmsg_recv_single(vrp, dev, msg, len);
+ if (!err)
+ msgs_received++;
+ }

dev_dbg(dev, "Received %u messages\n", msgs_received);

--
2.17.1

2020-07-02 08:27:17

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 20/22] NTB: Add a new NTB client driver to implement VIRTIO functionality

Add a new NTB client driver to implement VIRTIO functionality. When two
hosts are connected using NTB, one of the hosts should run the NTB client
driver that implements VIRTIO functionality and the other host should run
the NTB client driver that implements VHOST functionality. This interfaces
with the VIRTIO layer so that any virtio client driver can exchange data
with the remote vhost client driver.

Since each NTB host can expose only a limited number of contiguous memory
ranges to the remote NTB host (bounded by the number of memory windows
supported), reserve a contiguous memory range using dma_alloc_coherent()
and then manage this area using gen_pool to provide buffers to the virtio
client driver. The virtio client driver should provide only buffers from
this region to the remote vhost driver.
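
For reference, below is a minimal sketch (not part of this patch; the helper
name is illustrative and it assumes the struct ntb_virtio introduced here) of
how one contiguous DMA region could be reserved and handed to genalloc, so
that every buffer handed out stays inside the single memory window visible to
the peer:

static int ntb_virtio_setup_mw_pool(struct ntb_virtio *ntb, size_t mw_size)
{
	dma_addr_t mw_phys_addr;
	void *mw_addr;
	int ret;

	/* One contiguous region, later exposed through the NTB memory window */
	mw_addr = dma_alloc_coherent(ntb->dev, mw_size, &mw_phys_addr, GFP_KERNEL);
	if (!mw_addr)
		return -ENOMEM;

	/* Manage the region with genalloc; PAGE_SIZE minimum allocation granule */
	ntb->gen_pool = gen_pool_create(PAGE_SHIFT, dev_to_node(ntb->dev));
	if (!ntb->gen_pool) {
		ret = -ENOMEM;
		goto err_free_coherent;
	}

	ret = gen_pool_add_virt(ntb->gen_pool, (unsigned long)mw_addr,
				mw_phys_addr, mw_size, dev_to_node(ntb->dev));
	if (ret)
		goto err_destroy_pool;

	ntb->mw_addr = mw_addr;
	ntb->mw_phys_addr = mw_phys_addr;
	ntb->mw_size = mw_size;

	return 0;

err_destroy_pool:
	gen_pool_destroy(ntb->gen_pool);

err_free_coherent:
	dma_free_coherent(ntb->dev, mw_size, mw_addr, mw_phys_addr);

	return ret;
}

Buffers for the rings and payloads would then come from
gen_pool_dma_alloc(ntb->gen_pool, size, &dma_addr), which guarantees they
fall within the region the remote vhost driver can reach.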

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/ntb/Kconfig | 9 +
drivers/ntb/Makefile | 1 +
drivers/ntb/ntb_virtio.c | 853 +++++++++++++++++++++++++++++++++++++++
drivers/ntb/ntb_virtio.h | 56 +++
4 files changed, 919 insertions(+)
create mode 100644 drivers/ntb/ntb_virtio.c
create mode 100644 drivers/ntb/ntb_virtio.h

diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
index df16c755b4da..e171b3256f68 100644
--- a/drivers/ntb/Kconfig
+++ b/drivers/ntb/Kconfig
@@ -37,4 +37,13 @@ config NTB_TRANSPORT

If unsure, say N.

+config NTB_VIRTIO
+ tristate "NTB VIRTIO"
+ help
+ The NTB virtio driver sits between the NTB HW driver and the virtio
+ client driver and lets the virtio client driver exchange data with
+ the remote vhost driver over the NTB hardware.
+
+ If unsure, say N.
+
endif # NTB
diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index 3a6fa181ff99..d37ab488bcbc 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_NTB) += ntb.o hw/ test/
obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport.o
+obj-$(CONFIG_NTB_VIRTIO) += ntb_virtio.o

ntb-y := core.o
ntb-$(CONFIG_NTB_MSI) += msi.o
diff --git a/drivers/ntb/ntb_virtio.c b/drivers/ntb/ntb_virtio.c
new file mode 100644
index 000000000000..10fbe189ab8b
--- /dev/null
+++ b/drivers/ntb/ntb_virtio.c
@@ -0,0 +1,853 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * NTB Client Driver to implement VIRTIO functionality
+ *
+ * Copyright (C) 2020 Texas Instruments
+ * Author: Kishon Vijay Abraham I <[email protected]>
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/genalloc.h>
+#include <linux/module.h>
+#include <linux/ntb.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_pci.h>
+#include <linux/virtio_ring.h>
+#include <linux/vringh.h>
+
+#include "ntb_virtio.h"
+
+#define BUFFER_OFFSET 0x20000
+
+struct ntb_virtio_queue {
+ struct delayed_work db_handler;
+ struct virtqueue *vq;
+};
+
+struct ntb_virtio {
+ struct ntb_virtio_queue vqueue[MAX_VQS];
+ struct work_struct link_cleanup;
+ struct delayed_work link_work;
+ struct virtio_device vdev;
+ struct gen_pool *gen_pool;
+ dma_addr_t mw_phys_addr;
+ struct virtqueue **vqs;
+ struct ntb_dev *ndev;
+ struct device *dev;
+ /* mutex to protect sending commands to ntb vhost */
+ struct mutex lock;
+ void *mw_addr;
+ u64 mw_size;
+};
+
+#define to_ntb_virtio(v) container_of((v), struct ntb_virtio, vdev)
+
+/* ntb_virtio_send_command - Send commands to the remote NTB vhost device
+ * @ntb: NTB virtio device that communicates with the remote vhost device
+ * @command: The command that has to be sent to the remote vhost device
+ *
+ * Helper function to send commands to the remote NTB vhost device.
+ */
+static int ntb_virtio_send_command(struct ntb_virtio *ntb, u32 command)
+{
+ struct ntb_dev *ndev;
+ ktime_t timeout;
+ bool timedout;
+ int ret = 0;
+ u8 status;
+
+ ndev = ntb->ndev;
+
+ mutex_lock(&ntb->lock);
+ ntb_peer_spad_write(ndev, NTB_DEF_PEER_IDX, VHOST_COMMAND, command);
+ timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
+ while (1) {
+ timedout = ktime_after(ktime_get(), timeout);
+ status = ntb_peer_spad_read(ndev, NTB_DEF_PEER_IDX,
+ VHOST_COMMAND_STATUS);
+ if (status == HOST_CMD_STATUS_ERROR) {
+ ret = -EINVAL;
+ break;
+ }
+
+ if (status == HOST_CMD_STATUS_OKAY)
+ break;
+
+ if (WARN_ON(timedout)) {
+ ret = -ETIMEDOUT;
+ break;
+ }
+
+ usleep_range(5, 10);
+ }
+
+ ntb_peer_spad_write(ndev, NTB_DEF_PEER_IDX, VHOST_COMMAND_STATUS,
+ HOST_CMD_STATUS_NONE);
+ mutex_unlock(&ntb->lock);
+
+ return ret;
+}
+
+/* ntb_virtio_get_features - virtio_config_ops to get vhost device features
+ * @vdev: Virtio device that communicates with the remote vhost device
+ *
+ * virtio_config_ops to get vhost device features. The remote vhost device
+ * populates the vhost device features in scratchpad register.
+ */
+static u64 ntb_virtio_get_features(struct virtio_device *vdev)
+{
+ struct ntb_virtio *ntb;
+ struct ntb_dev *ndev;
+ u64 val;
+
+ ntb = to_ntb_virtio(vdev);
+ ndev = ntb->ndev;
+
+ val = ntb_peer_spad_read(ndev, NTB_DEF_PEER_IDX, VHOST_FEATURES_UPPER);
+ val <<= 32;
+ val |= ntb_peer_spad_read(ndev, NTB_DEF_PEER_IDX, VHOST_FEATURES_LOWER);
+
+ return val;
+}
+
+/* ntb_virtio_finalize_features - virtio_config_ops to finalize features with
+ * remote vhost device
+ * @vdev: Virtio device that communicates with the remote vhost device
+ *
+ * Indicate the negotiated features to the remote vhost device by sending
+ * HOST_CMD_FINALIZE_FEATURES command.
+ */
+static int ntb_virtio_finalize_features(struct virtio_device *vdev)
+{
+ struct ntb_virtio *ntb;
+ struct ntb_dev *ndev;
+ struct device *dev;
+ int ret;
+
+ ntb = to_ntb_virtio(vdev);
+ ndev = ntb->ndev;
+ dev = ntb->dev;
+
+ /* Give virtio_ring a chance to accept features. */
+ vring_transport_features(vdev);
+
+ ntb_peer_spad_write(ndev, NTB_DEF_PEER_IDX, VIRTIO_FEATURES_LOWER,
+ lower_32_bits(vdev->features));
+ ntb_peer_spad_write(ndev, NTB_DEF_PEER_IDX, VIRTIO_FEATURES_UPPER,
+ upper_32_bits(vdev->features));
+
+ ret = ntb_virtio_send_command(ntb, HOST_CMD_FINALIZE_FEATURES);
+ if (ret) {
+ dev_err(dev, "Failed to set configuration event vector\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/* ntb_virtio_get_status - virtio_config_ops to get device status
+ * @vdev: Virtio device that communicates with the remote vhost device
+ *
+ * virtio_config_ops to get device status. The remote vhost device
+ * populates the vhost device status in scratchpad register.
+ */
+static u8 ntb_virtio_get_status(struct virtio_device *vdev)
+{
+ struct ntb_virtio *ntb;
+ struct ntb_dev *ndev;
+
+ ntb = to_ntb_virtio(vdev);
+ ndev = ntb->ndev;
+
+ return ntb_peer_spad_read(ndev, NTB_DEF_PEER_IDX, VHOST_DEVICE_STATUS);
+}
+
+/* ntb_virtio_set_status - virtio_config_ops to set device status
+ * @vdev: Virtio device that communicates with the remote vhost device
+ *
+ * virtio_config_ops to set device status. This function updates the
+ * status in scratchpad register and sends a notification to the vhost
+ * device using HOST_CMD_SET_STATUS command.
+ */
+static void ntb_virtio_set_status(struct virtio_device *vdev, u8 status)
+{
+ struct ntb_virtio *ntb;
+ struct ntb_dev *ndev;
+ struct device *dev;
+ int ret;
+
+ ntb = to_ntb_virtio(vdev);
+ ndev = ntb->ndev;
+ dev = ntb->dev;
+
+ /* We should never be setting status to 0. */
+ if (WARN_ON(!status))
+ return;
+
+ ntb_peer_spad_write(ndev, NTB_DEF_PEER_IDX, VHOST_DEVICE_STATUS,
+ status);
+
+ ret = ntb_virtio_send_command(ntb, HOST_CMD_SET_STATUS);
+ if (ret)
+ dev_err(dev, "Failed to set device status\n");
+}
+
+/* ntb_virtio_vq_db_work - Handle doorbell event receive for a virtqueue
+ * @work: The work_struct holding the ntb_virtio_vq_db_work() function for every
+ * created virtqueue
+ *
+ * This function is invoked when the remote vhost driver sends a notification
+ * to the virtqueue. (vhost_virtqueue_kick() on the remote vhost driver). This
+ * function invokes the virtio client driver's virtqueue callback.
+ */
+static void ntb_virtio_vq_db_work(struct work_struct *work)
+{
+ struct ntb_virtio_queue *vqueue;
+ struct virtqueue *vq;
+
+ vqueue = container_of(work, struct ntb_virtio_queue, db_handler.work);
+ vq = vqueue->vq;
+
+ if (!vq->callback)
+ return;
+
+ vq->callback(vq);
+}
+
+/* ntb_virtio_notify - Send notification to the remote vhost virtqueue
+ * @vq: The local virtio virtqueue corresponding to the remote vhost virtqueue
+ * where the notification has to be sent
+ *
+ * Use NTB doorbell to send notification for the remote vhost virtqueue.
+ */
+bool ntb_virtio_notify(struct virtqueue *vq)
+{
+ struct ntb_virtio *ntb;
+ struct ntb_dev *ndev;
+ struct device *dev;
+ int ret;
+
+ ntb = vq->priv;
+ ndev = ntb->ndev;
+ dev = ntb->dev;
+
+ ret = ntb_peer_db_set(ntb->ndev, vq->index);
+ if (ret) {
+ dev_err(dev, "Failed to notify remote virtqueue\n");
+ return false;
+ }
+
+ return true;
+}
+
+/* ntb_virtio_find_vq - Find a virtio virtqueue and instantiate it
+ * @vdev: Virtio device that communicates with the remote vhost device
+ * @index: Index of the vhost virtqueue
+ * @callback: Callback function that has to be associated with the created
+ * virtqueue
+ *
+ * Create a new virtio virtqueue which will be used by the remote vhost
+ * to access this virtio device.
+ */
+static struct virtqueue *
+ntb_virtio_find_vq(struct virtio_device *vdev, unsigned int index,
+ void (*callback)(struct virtqueue *vq),
+ const char *name, bool ctx)
+{
+ struct ntb_virtio_queue *vqueue;
+ resource_size_t xlat_align_size;
+ unsigned int vq_size, offset;
+ resource_size_t xlat_align;
+ struct ntb_virtio *ntb;
+ u16 queue_num_buffers;
+ struct ntb_dev *ndev;
+ struct virtqueue *vq;
+ struct device *dev;
+ void *mw_addr;
+ void *vq_addr;
+ int ret;
+
+ ntb = to_ntb_virtio(vdev);
+ ndev = ntb->ndev;
+ dev = ntb->dev;
+ mw_addr = ntb->mw_addr;
+
+ queue_num_buffers = ntb_peer_spad_read(ndev, NTB_DEF_PEER_IDX,
+ VHOST_QUEUE_NUM_BUFFERS(index));
+ if (!queue_num_buffers) {
+ dev_err(dev, "Invalid number of buffers\n");
+ return ERR_PTR(-EINVAL);
+ }
+
+ ret = ntb_mw_get_align(ndev, NTB_DEF_PEER_IDX, 0, &xlat_align,
+ &xlat_align_size, NULL);
+ if (ret) {
+ dev_err(dev, "Failed to get memory window align size\n");
+ return ERR_PTR(ret);
+ }
+
+ /* zero vring */
+ vq_size = vring_size(queue_num_buffers, xlat_align);
+ offset = index * vq_size;
+ if (offset + vq_size >= BUFFER_OFFSET) {
+ dev_err(dev, "Not enough memory for allocating vq\n");
+ return ERR_PTR(-ENOMEM);
+ }
+
+ vq_addr = mw_addr + offset;
+ memset(vq_addr, 0, vq_size);
+
+ /*
+ * Create the new vq, and tell virtio we're not interested in
+ * the 'weak' smp barriers, since we're talking with a real device.
+ */
+ vq = vring_new_virtqueue(index, queue_num_buffers, xlat_align, vdev,
+ false, ctx, vq_addr, ntb_virtio_notify,
+ callback, name);
+ if (!vq) {
+ dev_err(dev, "vring_new_virtqueue %s failed\n", name);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ vq->vdev = vdev;
+ vq->priv = ntb;
+
+ vqueue = &ntb->vqueue[index];
+ vqueue->vq = vq;
+
+ INIT_DELAYED_WORK(&vqueue->db_handler, ntb_virtio_vq_db_work);
+
+ return vq;
+}
+
+/* ntb_virtio_find_vqs - Find virtio virtqueues requested by virtio driver and
+ * instantiate them
+ * @vdev: Virtio device that communicates with the remote vhost device
+ * @nvqs: The number of virtqueues to be created
+ * @vqs: Array of pointers to the created vhost virtqueues
+ * @callback: Array of callback function that has to be associated with
+ * each of the created virtqueues
+ * @names: Names that should be associated with each virtqueue
+ * @ctx: Context flag to find virtqueue
+ * @desc: Interrupt affinity descriptor
+ *
+ * Find virtio virtqueues requested by virtio driver and instantiate them. The
+ * number of buffers supported by the virtqueue is provided by the vhost
+ * device.
+ */
+static int
+ntb_virtio_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
+ struct virtqueue *vqs[], vq_callback_t *callbacks[],
+ const char * const names[], const bool *ctx,
+ struct irq_affinity *desc)
+{
+ struct ntb_virtio *ntb;
+ struct device *dev;
+ int queue_idx = 0;
+ int i;
+
+ ntb = to_ntb_virtio(vdev);
+ dev = ntb->dev;
+
+ for (i = 0; i < nvqs; ++i) {
+ if (!names[i]) {
+ vqs[i] = NULL;
+ continue;
+ }
+
+ vqs[i] = ntb_virtio_find_vq(vdev, queue_idx++, callbacks[i],
+ names[i], ctx ? ctx[i] : false);
+ if (IS_ERR(vqs[i])) {
+ dev_err(dev, "Failed to find virtqueue\n");
+ return PTR_ERR(vqs[i]);
+ }
+ }
+
+ return 0;
+}
+
+/* ntb_virtio_del_vqs - Free memory allocated for virtio virtqueues
+ * @vdev: Virtio device that communicates with the remote vhost device
+ *
+ * Free memory allocated for virtio virtqueues.
+ */
+void ntb_virtio_del_vqs(struct virtio_device *vdev)
+{
+ struct ntb_virtio_queue *vqueue;
+ struct virtqueue *vq, *tmp;
+ struct ntb_virtio *ntb;
+ struct ntb_dev *ndev;
+ int index;
+
+ ntb = to_ntb_virtio(vdev);
+ ndev = ntb->ndev;
+
+ list_for_each_entry_safe(vq, tmp, &vdev->vqs, list) {
+ index = vq->index;
+ vqueue = &ntb->vqueue[index];
+ cancel_delayed_work_sync(&vqueue->db_handler);
+ vring_del_virtqueue(vq);
+ }
+}
+
+/* ntb_virtio_reset - virtio_config_ops to reset the device
+ * @vdev: Virtio device that communicates with the remote vhost device
+ *
+ * virtio_config_ops to reset the device. This sends HOST_CMD_RESET
+ * command to reset the device.
+ */
+static void ntb_virtio_reset(struct virtio_device *vdev)
+{
+ struct ntb_virtio *ntb;
+ struct device *dev;
+ int ret;
+
+ ntb = to_ntb_virtio(vdev);
+ dev = ntb->dev;
+
+ ret = ntb_virtio_send_command(ntb, HOST_CMD_RESET);
+ if (ret)
+ dev_err(dev, "Failed to reset device\n");
+}
+
+/* ntb_virtio_get - Copy the device configuration space data to buffer
+ * from virtio driver
+ * @vdev: Virtio device that communicates with the remote vhost device
+ * @offset: Offset in the device configuration space
+ * @buf: Buffer address from virtio driver where configuration space
+ * data has to be copied
+ * @len: Length of the data from device configuration space to be copied
+ *
+ * Copy the device configuration space data to buffer from virtio driver.
+ */
+static void ntb_virtio_get(struct virtio_device *vdev, unsigned int offset,
+ void *buf, unsigned int len)
+{
+ unsigned int cfg_offset;
+ struct ntb_virtio *ntb;
+ struct ntb_dev *ndev;
+ struct device *dev;
+ int i, size;
+
+ ntb = to_ntb_virtio(vdev);
+ ndev = ntb->ndev;
+ dev = ntb->dev;
+
+ size = len / 4;
+ for (i = 0; i < size; i++) {
+ cfg_offset = VHOST_DEVICE_CFG_SPACE + i + offset;
+ *(u32 *)buf = ntb_spad_read(ndev, cfg_offset);
+ buf += 4;
+ }
+}
+
+/* ntb_virtio_set - Copy the device configuration space data from buffer
+ * provided by virtio driver
+ * @vdev: Virtio device that communicates with the remote vhost device
+ * @offset: Offset in the device configuration space
+ * @buf: Buffer address provided by virtio driver which has the configuration
+ * space data to be copied
+ * @len: Length of the data from device configuration space to be copied
+ *
+ * Copy the device configuration space data from buffer provided by virtio
+ * driver to the device.
+ */
+static void ntb_virtio_set(struct virtio_device *vdev, unsigned int offset,
+ const void *buf, unsigned int len)
+{
+ struct ntb_virtio *ntb;
+ struct ntb_dev *ndev;
+ struct device *dev;
+ int i, size;
+
+ ntb = to_ntb_virtio(vdev);
+ ndev = ntb->ndev;
+ dev = ntb->dev;
+
+ size = len / 4;
+ for (i = 0; i < size; i++) {
+ ntb_spad_write(ndev, VHOST_DEVICE_CFG_SPACE + i, *(u32 *)buf);
+ buf += 4;
+ }
+}
+
+/* ntb_virtio_alloc_buffer - Allocate buffers from specially reserved memory
+ * of virtio which can be accessed by both virtio and vhost
+ * @vdev: Virtio device that communicates with the remote vhost device
+ * @size: The size of the memory that has to be allocated
+ *
+ * Allocate buffers from specially reserved memory of virtio which can be
+ * accessed by both virtio and vhost.
+ */
+static void *ntb_virtio_alloc_buffer(struct virtio_device *vdev, size_t size)
+{
+ struct ntb_virtio *ntb;
+ struct gen_pool *pool;
+ struct ntb_dev *ndev;
+ struct device *dev;
+ unsigned long addr;
+
+ ntb = to_ntb_virtio(vdev);
+ pool = ntb->gen_pool;
+ ndev = ntb->ndev;
+ dev = ntb->dev;
+
+ addr = gen_pool_alloc(pool, size);
+ if (!addr) {
+ dev_err(dev, "Failed to allocate memory\n");
+ return NULL;
+ }
+
+ return (void *)addr;
+}
+
+/* ntb_virtio_free_buffer - Free buffers allocated using
+ * ntb_virtio_alloc_buffer()
+ * @vdev: Virtio device that communicates with the remote vhost device
+ * @addr: Address returned by ntb_virtio_alloc_buffer()
+ * @size: The size of the allocated memory
+ *
+ * Free buffers allocated using ntb_virtio_alloc_buffer().
+ */
+static void ntb_virtio_free_buffer(struct virtio_device *vdev, void *addr,
+ size_t size)
+{
+ struct ntb_virtio *ntb;
+ struct gen_pool *pool;
+ struct ntb_dev *ndev;
+ struct device *dev;
+
+ ntb = to_ntb_virtio(vdev);
+ pool = ntb->gen_pool;
+ ndev = ntb->ndev;
+ dev = ntb->dev;
+
+ gen_pool_free(pool, (unsigned long)addr, size);
+}
+
+static const struct virtio_config_ops ntb_virtio_config_ops = {
+ .get_features = ntb_virtio_get_features,
+ .finalize_features = ntb_virtio_finalize_features,
+ .find_vqs = ntb_virtio_find_vqs,
+ .del_vqs = ntb_virtio_del_vqs,
+ .reset = ntb_virtio_reset,
+ .set_status = ntb_virtio_set_status,
+ .get_status = ntb_virtio_get_status,
+ .get = ntb_virtio_get,
+ .set = ntb_virtio_set,
+ .alloc_buffer = ntb_virtio_alloc_buffer,
+ .free_buffer = ntb_virtio_free_buffer,
+};
+
+/* ntb_virtio_release - Callback function to free device
+ * @dev: Device in virtio_device that has to be freed
+ *
+ * Callback function from device core invoked to free the device after
+ * all references have been removed. This frees the allocated memory for
+ * struct ntb_virtio.
+ */
+static void ntb_virtio_release(struct device *dev)
+{
+ struct virtio_device *vdev;
+ struct ntb_virtio *ntb;
+
+ vdev = dev_to_virtio(dev);
+ ntb = to_ntb_virtio(vdev);
+
+ kfree(ntb);
+}
+
+/* ntb_virtio_link_cleanup - Cleanup once link to the remote host is lost
+ * @ntb: NTB virtio device that communicates with the remote vhost device
+ *
+ * Performs the cleanup that has to be done once the link to the remote host
+ * is lost or when the NTB virtio driver is removed.
+ */
+static void ntb_virtio_link_cleanup(struct ntb_virtio *ntb)
+{
+ dma_addr_t mw_phys_addr;
+ struct gen_pool *pool;
+ struct ntb_dev *ndev;
+ struct pci_dev *pdev;
+ void *mw_addr;
+ u64 mw_size;
+
+ ndev = ntb->ndev;
+ pool = ntb->gen_pool;
+ pdev = ndev->pdev;
+ mw_size = ntb->mw_size;
+ mw_addr = ntb->mw_addr;
+ mw_phys_addr = ntb->mw_phys_addr;
+
+ ntb_mw_clear_trans(ndev, 0, 0);
+ gen_pool_destroy(pool);
+ dma_free_coherent(&pdev->dev, mw_size, mw_addr, mw_phys_addr);
+}
+
+/* ntb_virtio_link_cleanup_work - Cleanup once link to the remote host is lost
+ * @work: The work_struct holding the ntb_virtio_link_cleanup_work() function
+ * that is scheduled
+ *
+ * Performs the cleanup that has to be done once the link to the remote host
+ * is lost. This acts as a wrapper to ntb_virtio_link_cleanup() for the cleanup
+ * operation.
+ */
+static void ntb_virtio_link_cleanup_work(struct work_struct *work)
+{
+ struct ntb_virtio *ntb;
+
+ ntb = container_of(work, struct ntb_virtio, link_cleanup);
+ ntb_virtio_link_cleanup(ntb);
+}
+
+/* ntb_virtio_link_work - Initialization once link to the remote host is
+ * established
+ * @work: The work_struct holding the ntb_virtio_link_work() function that is
+ * scheduled
+ *
+ * Performs the NTB virtio initialization that has to be done once the link to
+ * the remote host is established. Reads the initialization data written by
+ * vhost driver (to get memory window size accessible by vhost) and reserves
+ * memory for virtqueues and buffers.
+ */
+static void ntb_virtio_link_work(struct work_struct *work)
+{
+ struct virtio_device *vdev;
+ dma_addr_t mw_phys_addr;
+ struct ntb_virtio *ntb;
+ u32 deviceid, vendorid;
+ struct gen_pool *pool;
+ struct ntb_dev *ndev;
+ struct pci_dev *pdev;
+ struct device *dev;
+ void *mw_addr;
+ u64 mw_size;
+ u32 type;
+ int ret;
+
+ ntb = container_of(work, struct ntb_virtio, link_work.work);
+ ndev = ntb->ndev;
+ pdev = ndev->pdev;
+ dev = &ndev->dev;
+
+ type = ntb_spad_read(ndev, VIRTIO_TYPE);
+ if (type != TYPE_VHOST)
+ goto out;
+
+ mw_size = ntb_peer_spad_read(ndev, NTB_DEF_PEER_IDX,
+ VHOST_MW0_SIZE_UPPER);
+ mw_size <<= 32;
+ mw_size |= ntb_peer_spad_read(ndev, NTB_DEF_PEER_IDX,
+ VHOST_MW0_SIZE_LOWER);
+ ntb->mw_size = mw_size;
+
+ mw_addr = dma_alloc_coherent(&pdev->dev, mw_size, &mw_phys_addr,
+ GFP_KERNEL);
+ if (!mw_addr)
+ return;
+
+ pool = gen_pool_create(PAGE_SHIFT, -1);
+ if (!pool) {
+ dev_err(dev, "Failed to create gen pool\n");
+ goto err_gen_pool;
+ }
+
+ ret = gen_pool_add_virt(pool, (unsigned long)mw_addr + BUFFER_OFFSET,
+ mw_phys_addr + BUFFER_OFFSET,
+ mw_size - BUFFER_OFFSET, -1);
+ if (ret) {
+ dev_err(dev, "Failed to add memory to the pool\n");
+ goto err_gen_pool_add_virt;
+ }
+
+ ret = ntb_mw_set_trans(ndev, 0, 0, mw_phys_addr, mw_size);
+ if (ret) {
+ dev_err(dev, "Failed to set memory window translation\n");
+ goto err_gen_pool_add_virt;
+ }
+
+ ntb->mw_phys_addr = mw_phys_addr;
+ ntb->mw_addr = mw_addr;
+ ntb->mw_size = mw_size;
+ ntb->gen_pool = pool;
+
+ ntb_peer_spad_write(ndev, NTB_DEF_PEER_IDX, VIRTIO_MW0_LOWER_ADDR,
+ lower_32_bits(mw_phys_addr));
+ ntb_peer_spad_write(ndev, NTB_DEF_PEER_IDX, VIRTIO_MW0_UPPER_ADDR,
+ upper_32_bits(mw_phys_addr));
+
+ ntb_peer_spad_write(ndev, NTB_DEF_PEER_IDX, VIRTIO_TYPE, TYPE_VIRTIO);
+
+ deviceid = ntb_peer_spad_read(ndev, NTB_DEF_PEER_IDX, VHOST_DEVICEID);
+ vendorid = ntb_peer_spad_read(ndev, NTB_DEF_PEER_IDX, VHOST_VENDORID);
+
+ vdev = &ntb->vdev;
+ vdev->id.device = deviceid;
+ vdev->id.vendor = vendorid;
+ vdev->config = &ntb_virtio_config_ops,
+ vdev->dev.parent = dev;
+ vdev->dev.release = ntb_virtio_release;
+
+ ret = register_virtio_device(vdev);
+ if (ret) {
+ dev_err(dev, "failed to register vdev: %d\n", ret);
+ goto err_register_virtio;
+ }
+
+ return;
+
+out:
+ if (ntb_link_is_up(ndev, NULL, NULL) == 1)
+ schedule_delayed_work(&ntb->link_work,
+ msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
+ return;
+
+err_register_virtio:
+ ntb_mw_clear_trans(ndev, 0, 0);
+
+err_gen_pool_add_virt:
+ gen_pool_destroy(pool);
+
+err_gen_pool:
+ dma_free_coherent(&pdev->dev, mw_size, mw_addr, mw_phys_addr);
+}
+
+/* ntb_virtio_event_callback - Callback to link event interrupt
+ * @data: Private data specific to NTB virtio driver
+ *
+ * Callback function from NTB HW driver whenever both the hosts in the NTB
+ * setup have invoked ntb_link_enable().
+ */
+static void ntb_virtio_event_callback(void *data)
+{
+ struct ntb_virtio *ntb = data;
+
+ if (ntb_link_is_up(ntb->ndev, NULL, NULL) == 1)
+ schedule_delayed_work(&ntb->link_work, 0);
+ else
+ schedule_work(&ntb->link_cleanup);
+}
+
+/* ntb_virtio_vq_db_callback - Callback to doorbell interrupt to handle virtio
+ * virtqueue work
+ * @data: Private data specific to NTB virtio driver
+ * @vector: Doorbell vector on which interrupt is received
+ *
+ * Callback function from NTB HW driver whenever remote vhost driver has sent
+ * a notification using doorbell. This schedules work corresponding to the
+ * virtqueue for which notification has been received.
+ */
+static void ntb_virtio_vq_db_callback(void *data, int vector)
+{
+ struct ntb_virtio_queue *vqueue;
+ struct ntb_virtio *ntb;
+
+ ntb = data;
+ vqueue = &ntb->vqueue[vector - 1];
+
+ schedule_delayed_work(&vqueue->db_handler, 0);
+}
+
+static const struct ntb_ctx_ops ntb_virtio_ops = {
+ .link_event = ntb_virtio_event_callback,
+ .db_event = ntb_virtio_vq_db_callback,
+};
+
+/* ntb_virtio_probe - Initialize struct ntb_virtio when a new NTB device is
+ * created
+ * @client: struct ntb_client * representing the ntb virtio client driver
+ * @ndev: NTB device created by NTB HW driver
+ *
+ * Probe function to initialize struct ntb_virtio when a new NTB device is
+ * created.
+ */
+static int ntb_virtio_probe(struct ntb_client *self, struct ntb_dev *ndev)
+{
+ struct device *dev = &ndev->dev;
+ struct ntb_virtio *ntb;
+ int ret;
+
+ ntb = kzalloc(sizeof(*ntb), GFP_KERNEL);
+ if (!ntb)
+ return -ENOMEM;
+
+ ntb->ndev = ndev;
+ ntb->dev = dev;
+
+ mutex_init(&ntb->lock);
+ INIT_DELAYED_WORK(&ntb->link_work, ntb_virtio_link_work);
+ INIT_WORK(&ntb->link_cleanup, ntb_virtio_link_cleanup_work);
+
+ ret = ntb_set_ctx(ndev, ntb, &ntb_virtio_ops);
+ if (ret) {
+ dev_err(dev, "Failed to set NTB virtio context\n");
+ goto err;
+ }
+
+ ntb_link_enable(ndev, NTB_SPEED_AUTO, NTB_WIDTH_AUTO);
+
+ return 0;
+
+err:
+ kfree(ntb);
+
+ return ret;
+}
+
+/* ntb_virtio_free - Free the initializations performed by ntb_virtio_probe()
+ * @client: struct ntb_client * representing the ntb virtio client driver
+ * @ndev: NTB device created by NTB HW driver
+ *
+ * Free the initializations performed by ntb_virtio_probe().
+ */
+void ntb_virtio_free(struct ntb_client *client, struct ntb_dev *ndev)
+{
+ struct virtio_device *vdev;
+ struct ntb_virtio *ntb;
+
+ ntb = ndev->ctx;
+ vdev = &ntb->vdev;
+
+ ntb_virtio_link_cleanup(ntb);
+ cancel_work_sync(&ntb->link_cleanup);
+ cancel_delayed_work_sync(&ntb->link_work);
+ ntb_link_disable(ndev);
+
+ if (device_is_registered(&vdev->dev))
+ unregister_virtio_device(vdev);
+}
+
+static struct ntb_client ntb_virtio_client = {
+ .ops = {
+ .probe = ntb_virtio_probe,
+ .remove = ntb_virtio_free,
+ },
+};
+
+static int __init ntb_virtio_init(void)
+{
+ int ret;
+
+ ret = ntb_register_client(&ntb_virtio_client);
+ if (ret) {
+ pr_err("Failed to register ntb vhost driver --> %d\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+module_init(ntb_virtio_init);
+
+static void __exit ntb_virtio_exit(void)
+{
+ ntb_unregister_client(&ntb_virtio_client);
+}
+module_exit(ntb_virtio_exit);
+
+MODULE_DESCRIPTION("NTB VIRTIO Driver");
+MODULE_AUTHOR("Kishon Vijay Abraham I <[email protected]>");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/ntb/ntb_virtio.h b/drivers/ntb/ntb_virtio.h
new file mode 100644
index 000000000000..bc68ca38f60b
--- /dev/null
+++ b/drivers/ntb/ntb_virtio.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/**
+ * NTB VIRTIO/VHOST Header
+ *
+ * Copyright (C) 2020 Texas Instruments
+ * Author: Kishon Vijay Abraham I <[email protected]>
+ */
+
+#ifndef __LINUX_NTB_VIRTIO_H
+#define __LINUX_NTB_VIRTIO_H
+
+#define VIRTIO_TYPE 0
+enum virtio_type {
+ TYPE_VIRTIO = 1,
+ TYPE_VHOST,
+};
+
+#define VHOST_VENDORID 1
+#define VHOST_DEVICEID 2
+#define VHOST_FEATURES_UPPER 3
+#define VHOST_FEATURES_LOWER 4
+#define VIRTIO_FEATURES_UPPER 5
+#define VIRTIO_FEATURES_LOWER 6
+#define VHOST_MW0_SIZE_LOWER 7
+#define VHOST_MW0_SIZE_UPPER 8
+#define VIRTIO_MW0_LOWER_ADDR 9
+#define VIRTIO_MW0_UPPER_ADDR 10
+#define VHOST_DEVICE_STATUS 11
+#define VHOST_CONFIG_GENERATION 12
+
+#define VHOST_COMMAND 13
+enum host_cmd {
+ HOST_CMD_NONE,
+ HOST_CMD_SET_STATUS,
+ HOST_CMD_FINALIZE_FEATURES,
+ HOST_CMD_RESET,
+};
+
+#define VHOST_COMMAND_STATUS 14
+enum host_cmd_status {
+ HOST_CMD_STATUS_NONE,
+ HOST_CMD_STATUS_OKAY,
+ HOST_CMD_STATUS_ERROR,
+};
+
+#define VHOST_QUEUE_BASE 15
+#define VHOST_QUEUE_NUM_BUFFERS(n) (VHOST_QUEUE_BASE + (n))
+
+#define VHOST_DEVICE_CFG_SPACE 23
+
+#define NTB_LINK_DOWN_TIMEOUT 10 /* 10 milli-sec */
+#define COMMAND_TIMEOUT 1000 /* 1 sec */
+
+#define MAX_VQS 8
+
+#endif /* __LINUX_NTB_VIRTIO_H */
--
2.17.1

2020-07-02 08:27:18

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 06/22] vhost: Introduce configfs entry for configuring VHOST

Create a configfs entry for each entry populated in "struct vhost_device_id"
by the VHOST driver, and a configfs entry for each VHOST device. These
entries are used to link a VHOST driver to a VHOST device (by assigning a
device ID and vendor ID to the VHOST device) and to register the VHOST
device, thereby letting the VHOST client driver be selected from userspace.
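
For illustration, a minimal (hypothetical) VHOST client driver that would
get such entries is sketched below; the driver name and IDs are made up,
while struct vhost_driver, struct vhost_device_id and
vhost_register_driver() come from this series. Each id-table entry shows
up as a "%08x:%08x" directory under the driver's directory in the
"vhost-client" group of the vhost configfs subsystem (typically mounted
under /sys/kernel/config), and userspace binds a device by symlinking the
device directory to one of these entries.

/* Hypothetical vhost client driver, shown only to illustrate how the
 * configfs entries get created: registering this driver creates a
 * "sample-vhost-client" group with one "ffffffff:0000c022" sub-directory.
 */
static struct vhost_device_id sample_vhost_id_table[] = {
        { .device = 0xc022, .vendor = 0xffffffff },     /* made-up IDs */
        { },                                            /* sentinel */
};

static int sample_vhost_probe(struct vhost_dev *vdev)
{
        /* called once userspace links a vhost device to this ID entry */
        return 0;
}

static int sample_vhost_remove(struct vhost_dev *vdev)
{
        return 0;
}

static struct vhost_driver sample_vhost_driver = {
        .driver = {
                .name   = "sample-vhost-client",
        },
        .id_table       = sample_vhost_id_table,
        .probe          = sample_vhost_probe,
        .remove         = sample_vhost_remove,
};

static int __init sample_vhost_init(void)
{
        return vhost_register_driver(&sample_vhost_driver);
}
module_init(sample_vhost_init);

static void __exit sample_vhost_exit(void)
{
        vhost_unregister_driver(&sample_vhost_driver);
}
module_exit(sample_vhost_exit);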

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
drivers/vhost/Makefile | 2 +-
drivers/vhost/vhost.c | 63 +++++++
drivers/vhost/vhost_cfs.c | 354 ++++++++++++++++++++++++++++++++++++++
include/linux/vhost.h | 11 ++
4 files changed, 429 insertions(+), 1 deletion(-)
create mode 100644 drivers/vhost/vhost_cfs.c

diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index f3e1897cce85..6520c820c896 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_VHOST_RING) += vringh.o
obj-$(CONFIG_VHOST_VDPA) += vhost_vdpa.o
vhost_vdpa-y := vdpa.o

-obj-$(CONFIG_VHOST) += vhost.o
+obj-$(CONFIG_VHOST) += vhost.o vhost_cfs.o

obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o
vhost_iotlb-y := iotlb.o
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8a3ad4698393..539619208783 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -3091,6 +3091,64 @@ static struct bus_type vhost_bus_type = {
.remove = vhost_dev_remove,
};

+/**
+ * vhost_remove_cfs() - Remove configfs directory for vhost driver
+ * @driver: Vhost driver for which configfs directory has to be removed
+ *
+ * Remove configfs directory for vhost driver.
+ */
+static void vhost_remove_cfs(struct vhost_driver *driver)
+{
+ struct config_group *driver_group, *group;
+ struct config_item *item, *tmp;
+
+ driver_group = driver->group;
+
+ list_for_each_entry_safe(item, tmp, &driver_group->cg_children,
+ ci_entry) {
+ group = to_config_group(item);
+ vhost_cfs_remove_driver_item(group);
+ }
+
+ vhost_cfs_remove_driver_group(driver_group);
+}
+
+/**
+ * vhost_add_cfs() - Add configfs directory for vhost driver
+ * @driver: Vhost driver for which configfs directory has to be added
+ *
+ * Add configfs directory for vhost driver.
+ */
+static int vhost_add_cfs(struct vhost_driver *driver)
+{
+ struct config_group *driver_group, *group;
+ const struct vhost_device_id *ids;
+ int ret, i;
+
+ driver_group = vhost_cfs_add_driver_group(driver->driver.name);
+ if (IS_ERR(driver_group))
+ return PTR_ERR(driver_group);
+
+ driver->group = driver_group;
+
+ ids = driver->id_table;
+ for (i = 0; ids[i].device; i++) {
+ group = vhost_cfs_add_driver_item(driver_group, ids[i].vendor,
+ ids[i].device);
+ if (IS_ERR(group)) {
+ ret = PTR_ERR(group);
+ goto err;
+ }
+ }
+
+ return 0;
+
+err:
+ vhost_remove_cfs(driver);
+
+ return ret;
+}
+
/**
* vhost_register_driver() - Register a vhost driver
* @driver: Vhost driver that has to be registered
@@ -3107,6 +3165,10 @@ int vhost_register_driver(struct vhost_driver *driver)
if (ret)
return ret;

+ ret = vhost_add_cfs(driver);
+ if (ret)
+ return ret;
+
return 0;
}
EXPORT_SYMBOL_GPL(vhost_register_driver);
@@ -3119,6 +3181,7 @@ EXPORT_SYMBOL_GPL(vhost_register_driver);
*/
void vhost_unregister_driver(struct vhost_driver *driver)
{
+ vhost_remove_cfs(driver);
driver_unregister(&driver->driver);
}
EXPORT_SYMBOL_GPL(vhost_unregister_driver);
diff --git a/drivers/vhost/vhost_cfs.c b/drivers/vhost/vhost_cfs.c
new file mode 100644
index 000000000000..ae46e71968f1
--- /dev/null
+++ b/drivers/vhost/vhost_cfs.c
@@ -0,0 +1,354 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * configfs to configure VHOST
+ *
+ * Copyright (C) 2020 Texas Instruments
+ * Author: Kishon Vijay Abraham I <[email protected]>
+ */
+
+#include <linux/configfs.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/vhost.h>
+#include <linux/slab.h>
+
+/* VHOST driver like net, scsi etc., */
+static struct config_group *vhost_driver_group;
+
+/* VHOST device like PCIe EP, NTB etc., */
+static struct config_group *vhost_device_group;
+
+struct vhost_driver_item {
+ struct config_group group;
+ u32 vendor;
+ u32 device;
+};
+
+struct vhost_driver_group {
+ struct config_group group;
+};
+
+struct vhost_device_item {
+ struct config_group group;
+ struct vhost_dev *vdev;
+};
+
+static inline
+struct vhost_driver_item *to_vhost_driver_item(struct config_item *item)
+{
+ return container_of(to_config_group(item), struct vhost_driver_item,
+ group);
+}
+
+static inline
+struct vhost_device_item *to_vhost_device_item(struct config_item *item)
+{
+ return container_of(to_config_group(item), struct vhost_device_item,
+ group);
+}
+
+/**
+ * vhost_cfs_device_link() - Create softlink of driver directory to device
+ * directory
+ * @device_item: Represents configfs entry of vhost_dev
+ * @driver_item: Represents configfs of a particular entry of
+ * vhost_device_id table in vhost driver
+ *
+ * Bind a vhost driver to vhost device in order to assign a particular
+ * device ID and vendor ID
+ */
+static int vhost_cfs_device_link(struct config_item *device_item,
+ struct config_item *driver_item)
+{
+ struct vhost_driver_item *vdriver_item;
+ struct vhost_device_item *vdevice_item;
+ struct vhost_dev *vdev;
+ int ret;
+
+ vdriver_item = to_vhost_driver_item(driver_item);
+ vdevice_item = to_vhost_device_item(device_item);
+
+ vdev = vdevice_item->vdev;
+ vdev->id.device = vdriver_item->device;
+ vdev->id.vendor = vdriver_item->vendor;
+
+ ret = vhost_register_device(vdev);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+/**
+ * vhost_cfs_device_unlink() - Delete softlink of driver directory from device
+ * directory
+ * @device_item: Represents configfs entry of vhost_dev
+ * @driver_item: Represents configfs of a particular entry of
+ * vhost_device_id table in vhost driver
+ *
+ * Un-bind vhost driver from vhost device.
+ */
+static void vhost_cfs_device_unlink(struct config_item *device_item,
+ struct config_item *driver_item)
+{
+ struct vhost_driver_item *vdriver_item;
+ struct vhost_device_item *vdevice_item;
+ struct vhost_dev *vdev;
+
+ vdriver_item = to_vhost_driver_item(driver_item);
+ vdevice_item = to_vhost_device_item(device_item);
+
+ vdev = vdevice_item->vdev;
+ vhost_unregister_device(vdev);
+}
+
+static struct configfs_item_operations vhost_cfs_device_item_ops = {
+ .allow_link = vhost_cfs_device_link,
+ .drop_link = vhost_cfs_device_unlink,
+};
+
+static const struct config_item_type vhost_cfs_device_item_type = {
+ .ct_item_ops = &vhost_cfs_device_item_ops,
+ .ct_owner = THIS_MODULE,
+};
+
+/**
+ * vhost_cfs_add_device_item() - Create configfs directory for new vhost_dev
+ * @vdev: vhost device for which configfs directory has to be created
+ *
+ * Create configfs directory for a new vhost device. Drivers that create
+ * vhost devices can invoke this API if they require the vhost device to
+ * be assigned a device ID and vendor ID by the user.
+ */
+struct config_group *vhost_cfs_add_device_item(struct vhost_dev *vdev)
+{
+ struct device *dev = &vdev->dev;
+ struct vhost_device_item *vdevice_item;
+ struct config_group *group;
+ const char *name;
+ int ret;
+
+ vdevice_item = kzalloc(sizeof(*vdevice_item), GFP_KERNEL);
+ if (!vdevice_item)
+ return ERR_PTR(-ENOMEM);
+
+ name = dev_name(dev->parent);
+ group = &vdevice_item->group;
+ config_group_init_type_name(group, name, &vhost_cfs_device_item_type);
+
+ ret = configfs_register_group(vhost_device_group, group);
+ if (ret)
+ return ERR_PTR(ret);
+
+ vdevice_item->vdev = vdev;
+
+ return group;
+}
+EXPORT_SYMBOL(vhost_cfs_add_device_item);
+
+/**
+ * vhost_cfs_remove_device_item() - Remove configfs directory for the vhost_dev
+ * @group: Configfs group of the vhost device whose directory has to be removed
+ *
+ * Remove configfs directory for the vhost device.
+ */
+void vhost_cfs_remove_device_item(struct config_group *group)
+{
+ struct vhost_device_item *vdevice_item;
+
+ if (!group)
+ return;
+
+ vdevice_item = container_of(group, struct vhost_device_item, group);
+ configfs_unregister_group(&vdevice_item->group);
+ kfree(vdevice_item);
+}
+EXPORT_SYMBOL(vhost_cfs_remove_device_item);
+
+static const struct config_item_type vhost_driver_item_type = {
+ .ct_owner = THIS_MODULE,
+};
+
+/**
+ * vhost_cfs_add_driver_item() - Add configfs directory for an entry in
+ * vhost_device_id
+ * @driver_group: configfs directory corresponding to the vhost driver
+ * @vendor: vendor ID populated in vhost_device_id table by vhost driver
+ * @device: device ID populated in vhost_device_id table by vhost driver
+ *
+ * Add configfs directory for each entry in vhost_device_id populated by
+ * vhost driver. Store the device ID and vendor ID in a local data structure
+ * and use it when the user links this directory with a vhost device configfs
+ * directory.
+ */
+struct config_group *
+vhost_cfs_add_driver_item(struct config_group *driver_group, u32 vendor,
+ u32 device)
+{
+ struct vhost_driver_item *vdriver_item;
+ struct config_group *group;
+ char name[20];
+ int ret;
+
+ vdriver_item = kzalloc(sizeof(*vdriver_item), GFP_KERNEL);
+ if (!vdriver_item)
+ return ERR_PTR(-ENOMEM);
+
+ vdriver_item->vendor = vendor;
+ vdriver_item->device = device;
+
+ snprintf(name, sizeof(name), "%08x:%08x", vendor, device);
+ group = &vdriver_item->group;
+
+ config_group_init_type_name(group, name, &vhost_driver_item_type);
+ ret = configfs_register_group(driver_group, group);
+ if (ret)
+ return ERR_PTR(ret);
+
+ return group;
+}
+EXPORT_SYMBOL(vhost_cfs_add_driver_item);
+
+/**
+ * vhost_cfs_remove_driver_item() - Remove configfs directory corresponding
+ * to an entry in vhost_device_id
+ * @group: Configfs group corresponding to an entry in vhost_device_id
+ *
+ * Remove configfs directory corresponding to an entry in vhost_device_id
+ */
+void vhost_cfs_remove_driver_item(struct config_group *group)
+{
+ struct vhost_driver_item *vdriver_item;
+
+ if (!group)
+ return;
+
+ vdriver_item = container_of(group, struct vhost_driver_item, group);
+ configfs_unregister_group(&vdriver_item->group);
+ kfree(vdriver_item);
+}
+EXPORT_SYMBOL(vhost_cfs_remove_driver_item);
+
+static const struct config_item_type vhost_driver_group_type = {
+ .ct_owner = THIS_MODULE,
+};
+
+/**
+ * vhost_cfs_add_driver_group() - Add configfs directory for vhost driver
+ * @name: Name of the vhost driver as populated in driver structure
+ *
+ * Add configfs directory for vhost driver.
+ */
+struct config_group *vhost_cfs_add_driver_group(const char *name)
+{
+ struct vhost_driver_group *vdriver_group;
+ struct config_group *group;
+
+ vdriver_group = kzalloc(sizeof(*vdriver_group), GFP_KERNEL);
+ if (!vdriver_group)
+ return ERR_PTR(-ENOMEM);
+
+ group = &vdriver_group->group;
+
+ config_group_init_type_name(group, name, &vhost_driver_group_type);
+ configfs_register_group(vhost_driver_group, group);
+
+ return group;
+}
+EXPORT_SYMBOL(vhost_cfs_add_driver_group);
+
+/**
+ * vhost_cfs_remove_driver_group() - Remove configfs directory for vhost driver
+ * @group: Configfs group corresponding to the vhost driver
+ *
+ * Remove configfs directory for vhost driver.
+ */
+void vhost_cfs_remove_driver_group(struct config_group *group)
+{
+ if (IS_ERR_OR_NULL(group))
+ return;
+
+ configfs_unregister_default_group(group);
+}
+EXPORT_SYMBOL(vhost_cfs_remove_driver_group);
+
+static const struct config_item_type vhost_driver_type = {
+ .ct_owner = THIS_MODULE,
+};
+
+static const struct config_item_type vhost_device_type = {
+ .ct_owner = THIS_MODULE,
+};
+
+static const struct config_item_type vhost_type = {
+ .ct_owner = THIS_MODULE,
+};
+
+static struct configfs_subsystem vhost_cfs_subsys = {
+ .su_group = {
+ .cg_item = {
+ .ci_namebuf = "vhost",
+ .ci_type = &vhost_type,
+ },
+ },
+ .su_mutex = __MUTEX_INITIALIZER(vhost_cfs_subsys.su_mutex),
+};
+
+static int __init vhost_cfs_init(void)
+{
+ int ret;
+ struct config_group *root = &vhost_cfs_subsys.su_group;
+
+ config_group_init(root);
+
+ ret = configfs_register_subsystem(&vhost_cfs_subsys);
+ if (ret) {
+ pr_err("Error %d while registering subsystem %s\n",
+ ret, root->cg_item.ci_namebuf);
+ goto err;
+ }
+
+ vhost_driver_group =
+ configfs_register_default_group(root, "vhost-client",
+ &vhost_driver_type);
+ if (IS_ERR(vhost_driver_group)) {
+ ret = PTR_ERR(vhost_driver_group);
+ pr_err("Error %d while registering channel group\n",
+ ret);
+ goto err_vhost_driver_group;
+ }
+
+ vhost_device_group =
+ configfs_register_default_group(root, "vhost-transport",
+ &vhost_device_type);
+ if (IS_ERR(vhost_device_group)) {
+ ret = PTR_ERR(vhost_device_group);
+ pr_err("Error %d while registering virtproc group\n",
+ ret);
+ goto err_vhost_device_group;
+ }
+
+ return 0;
+
+err_vhost_device_group:
+ configfs_unregister_default_group(vhost_driver_group);
+
+err_vhost_driver_group:
+ configfs_unregister_subsystem(&vhost_cfs_subsys);
+
+err:
+ return ret;
+}
+module_init(vhost_cfs_init);
+
+static void __exit vhost_cfs_exit(void)
+{
+ configfs_unregister_default_group(vhost_device_group);
+ configfs_unregister_default_group(vhost_driver_group);
+ configfs_unregister_subsystem(&vhost_cfs_subsys);
+}
+module_exit(vhost_cfs_exit);
+
+MODULE_DESCRIPTION("PCI VHOST CONFIGFS");
+MODULE_AUTHOR("Kishon Vijay Abraham I <[email protected]>");
+MODULE_LICENSE("GPL v2");
diff --git a/include/linux/vhost.h b/include/linux/vhost.h
index 8efb9829c1b1..be9341ffd266 100644
--- a/include/linux/vhost.h
+++ b/include/linux/vhost.h
@@ -2,6 +2,7 @@
#ifndef _VHOST_H
#define _VHOST_H

+#include <linux/configfs.h>
#include <linux/eventfd.h>
#include <linux/mm.h>
#include <linux/mutex.h>
@@ -181,6 +182,7 @@ struct vhost_config_ops {
struct vhost_driver {
struct device_driver driver;
struct vhost_device_id *id_table;
+ struct config_group *group;
int (*probe)(struct vhost_dev *dev);
int (*remove)(struct vhost_dev *dev);
};
@@ -233,6 +235,15 @@ void vhost_unregister_driver(struct vhost_driver *driver);
int vhost_register_device(struct vhost_dev *vdev);
void vhost_unregister_device(struct vhost_dev *vdev);

+struct config_group *vhost_cfs_add_driver_group(const char *name);
+void vhost_cfs_remove_driver_group(struct config_group *group);
+struct config_group *
+vhost_cfs_add_driver_item(struct config_group *driver_group, u32 vendor,
+ u32 device);
+void vhost_cfs_remove_driver_item(struct config_group *group);
+struct config_group *vhost_cfs_add_device_item(struct vhost_dev *vdev);
+void vhost_cfs_remove_device_item(struct config_group *group);
+
int vhost_create_vqs(struct vhost_dev *vdev, unsigned int nvqs,
unsigned int num_bufs, struct vhost_virtqueue *vqs[],
vhost_vq_callback_t *callbacks[],
--
2.17.1

2020-07-02 08:28:15

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: [RFC PATCH 15/22] samples/rpmsg: Setup delayed work to send message

Do not send any message to the remote processor before the
announce_create() callback has been invoked. Since announce_create() is
only invoked after ->probe() completes, set up delayed work to start
sending messages to the remote processor.
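
In isolation the pattern looks like the sketch below (illustrative only;
the real change to rpmsg_client_sample.c follows in the diff, and MSG,
struct instance_data and the 500 ms delay are taken from it):

/* Defer the first rpmsg_send() until after ->probe() has returned, so
 * the channel announcement reaches the remote side first.
 */
static void sample_send_msg_work(struct work_struct *work)
{
        struct instance_data *idata =
                container_of(work, struct instance_data, send_msg_work.work);

        rpmsg_send(idata->rpdev->ept, MSG, strlen(MSG));
}

/* in ->probe(): */
INIT_DELAYED_WORK(&idata->send_msg_work, sample_send_msg_work);
schedule_delayed_work(&idata->send_msg_work, msecs_to_jiffies(500));

/* in ->remove(): */
cancel_delayed_work_sync(&idata->send_msg_work);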

Signed-off-by: Kishon Vijay Abraham I <[email protected]>
---
samples/rpmsg/rpmsg_client_sample.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/samples/rpmsg/rpmsg_client_sample.c b/samples/rpmsg/rpmsg_client_sample.c
index ae5081662283..514a51945d69 100644
--- a/samples/rpmsg/rpmsg_client_sample.c
+++ b/samples/rpmsg/rpmsg_client_sample.c
@@ -20,6 +20,8 @@ module_param(count, int, 0644);

struct instance_data {
int rx_count;
+ struct delayed_work send_msg_work;
+ struct rpmsg_device *rpdev;
};

static int rpmsg_sample_cb(struct rpmsg_device *rpdev, void *data, int len,
@@ -48,9 +50,21 @@ static int rpmsg_sample_cb(struct rpmsg_device *rpdev, void *data, int len,
return 0;
}

-static int rpmsg_sample_probe(struct rpmsg_device *rpdev)
+static void rpmsg_sample_send_msg_work(struct work_struct *work)
{
+ struct instance_data *idata = container_of(work, struct instance_data,
+ send_msg_work.work);
+ struct rpmsg_device *rpdev = idata->rpdev;
int ret;
+
+ /* send a message to our remote processor */
+ ret = rpmsg_send(rpdev->ept, MSG, strlen(MSG));
+ if (ret)
+ dev_err(&rpdev->dev, "rpmsg_send failed: %d\n", ret);
+}
+
+static int rpmsg_sample_probe(struct rpmsg_device *rpdev)
+{
struct instance_data *idata;

dev_info(&rpdev->dev, "new channel: 0x%x -> 0x%x!\n",
@@ -62,18 +76,18 @@ static int rpmsg_sample_probe(struct rpmsg_device *rpdev)

dev_set_drvdata(&rpdev->dev, idata);

- /* send a message to our remote processor */
- ret = rpmsg_send(rpdev->ept, MSG, strlen(MSG));
- if (ret) {
- dev_err(&rpdev->dev, "rpmsg_send failed: %d\n", ret);
- return ret;
- }
+ idata->rpdev = rpdev;
+ INIT_DELAYED_WORK(&idata->send_msg_work, rpmsg_sample_send_msg_work);
+ schedule_delayed_work(&idata->send_msg_work, msecs_to_jiffies(500));

return 0;
}

static void rpmsg_sample_remove(struct rpmsg_device *rpdev)
{
+ struct instance_data *idata = dev_get_drvdata(&rpdev->dev);
+
+ cancel_delayed_work_sync(&idata->send_msg_work);
dev_info(&rpdev->dev, "rpmsg sample client driver is removed\n");
}

--
2.17.1

2020-07-02 10:13:08

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication


On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>> This series enhances Linux Vhost support to enable SoC-to-SoC
>> communication over MMIO. This series enables rpmsg communication between
>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>
>> 1) Modify vhost to use standard Linux driver model
>> 2) Add support in vring to access virtqueue over MMIO
>> 3) Add vhost client driver for rpmsg
>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>> rpmsg communication between two SoCs connected to each other
>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>> between two SoCs connected via NTB
>> 6) Add configfs to configure the components
>>
>> UseCase1 :
>>
>> VHOST RPMSG VIRTIO RPMSG
>> + +
>> | |
>> | |
>> | |
>> | |
>> +-----v------+ +------v-------+
>> | Linux | | Linux |
>> | Endpoint | | Root Complex |
>> | <-----------------> |
>> | | | |
>> | SOC1 | | SOC2 |
>> +------------+ +--------------+
>>
>> UseCase 2:
>>
>> VHOST RPMSG VIRTIO RPMSG
>> + +
>> | |
>> | |
>> | |
>> | |
>> +------v------+ +------v------+
>> | | | |
>> | HOST1 | | HOST2 |
>> | | | |
>> +------^------+ +------^------+
>> | |
>> | |
>> +---------------------------------------------------------------------+
>> | +------v------+ +------v------+ |
>> | | | | | |
>> | | EP | | EP | |
>> | | CONTROLLER1 | | CONTROLLER2 | |
>> | | <-----------------------------------> | |
>> | | | | | |
>> | | | | | |
>> | | | SoC With Multiple EP Instances | | |
>> | | | (Configured using NTB Function) | | |
>> | +-------------+ +-------------+ |
>> +---------------------------------------------------------------------+
>>
>> Software Layering:
>>
>> The high-level SW layering should look something like below. This series
>> adds support only for RPMSG VHOST, however something similar should be
>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>> device, user) can use any of the vhost client driver.
>>
>>
>> +----------------+ +-----------+ +------------+ +----------+
>> | RPMSG VHOST | | NET VHOST | | SCSI VHOST | | X |
>> +-------^--------+ +-----^-----+ +-----^------+ +----^-----+
>> | | | |
>> | | | |
>> | | | |
>> +-----------v-----------------v--------------v--------------v----------+
>> | VHOST CORE |
>> +--------^---------------^--------------------^------------------^-----+
>> | | | |
>> | | | |
>> | | | |
>> +--------v-------+ +----v------+ +----------v----------+ +----v-----+
>> | PCI EPF VHOST | | NTB VHOST | |PLATFORM DEVICE VHOST| | X |
>> +----------------+ +-----------+ +---------------------+ +----------+
>>
>> This was initially proposed here [1]
>>
>> [1] -> https://lore.kernel.org/r/[email protected]
>
> I find this very interesting. A huge patchset so will take a bit
> to review, but I certainly plan to do that. Thanks!


Yes, it would be better if there's a git branch for us to have a look.

Btw, I'm not sure I get the big picture, but I vaguely feel some of the
work is duplicated with vDPA (e.g the epf transport or vhost bus).

Have you considered to implement these through vDPA?

Thanks


>
>> Kishon Vijay Abraham I (22):
>> vhost: Make _feature_ bits a property of vhost device
>> vhost: Introduce standard Linux driver model in VHOST
>> vhost: Add ops for the VHOST driver to configure VHOST device
>> vringh: Add helpers to access vring in MMIO
>> vhost: Add MMIO helpers for operations on vhost virtqueue
>> vhost: Introduce configfs entry for configuring VHOST
>> virtio_pci: Use request_threaded_irq() instead of request_irq()
>> rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>> reading messages
>> rpmsg: Introduce configfs entry for configuring rpmsg
>> rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>> rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>> rpmsg_internal.h
>> virtio: Add ops to allocate and free buffer
>> rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>> virtio_free_buffer()
>> rpmsg: Add VHOST based remote processor messaging bus
>> samples/rpmsg: Setup delayed work to send message
>> samples/rpmsg: Wait for address to be bound to rpdev for sending
>> message
>> rpmsg.txt: Add Documentation to configure rpmsg using configfs
>> virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint
>> device
>> PCI: endpoint: Add EP function driver to provide VHOST interface
>> NTB: Add a new NTB client driver to implement VIRTIO functionality
>> NTB: Add a new NTB client driver to implement VHOST functionality
>> NTB: Describe the ntb_virtio and ntb_vhost client in the documentation
>>
>> Documentation/driver-api/ntb.rst | 11 +
>> Documentation/rpmsg.txt | 56 +
>> drivers/ntb/Kconfig | 18 +
>> drivers/ntb/Makefile | 2 +
>> drivers/ntb/ntb_vhost.c | 776 +++++++++++
>> drivers/ntb/ntb_virtio.c | 853 ++++++++++++
>> drivers/ntb/ntb_virtio.h | 56 +
>> drivers/pci/endpoint/functions/Kconfig | 11 +
>> drivers/pci/endpoint/functions/Makefile | 1 +
>> .../pci/endpoint/functions/pci-epf-vhost.c | 1144 ++++++++++++++++
>> drivers/rpmsg/Kconfig | 10 +
>> drivers/rpmsg/Makefile | 3 +-
>> drivers/rpmsg/rpmsg_cfs.c | 394 ++++++
>> drivers/rpmsg/rpmsg_core.c | 7 +
>> drivers/rpmsg/rpmsg_internal.h | 136 ++
>> drivers/rpmsg/vhost_rpmsg_bus.c | 1151 +++++++++++++++++
>> drivers/rpmsg/virtio_rpmsg_bus.c | 184 ++-
>> drivers/vhost/Kconfig | 1 +
>> drivers/vhost/Makefile | 2 +-
>> drivers/vhost/net.c | 10 +-
>> drivers/vhost/scsi.c | 24 +-
>> drivers/vhost/test.c | 17 +-
>> drivers/vhost/vdpa.c | 2 +-
>> drivers/vhost/vhost.c | 730 ++++++++++-
>> drivers/vhost/vhost_cfs.c | 341 +++++
>> drivers/vhost/vringh.c | 332 +++++
>> drivers/vhost/vsock.c | 20 +-
>> drivers/virtio/Kconfig | 9 +
>> drivers/virtio/Makefile | 1 +
>> drivers/virtio/virtio_pci_common.c | 25 +-
>> drivers/virtio/virtio_pci_epf.c | 670 ++++++++++
>> include/linux/mod_devicetable.h | 6 +
>> include/linux/rpmsg.h | 6 +
>> {drivers/vhost => include/linux}/vhost.h | 132 +-
>> include/linux/virtio.h | 3 +
>> include/linux/virtio_config.h | 42 +
>> include/linux/vringh.h | 46 +
>> samples/rpmsg/rpmsg_client_sample.c | 32 +-
>> tools/virtio/virtio_test.c | 2 +-
>> 39 files changed, 7083 insertions(+), 183 deletions(-)
>> create mode 100644 drivers/ntb/ntb_vhost.c
>> create mode 100644 drivers/ntb/ntb_virtio.c
>> create mode 100644 drivers/ntb/ntb_virtio.h
>> create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>> create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>> create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>> create mode 100644 drivers/vhost/vhost_cfs.c
>> create mode 100644 drivers/virtio/virtio_pci_epf.c
>> rename {drivers/vhost => include/linux}/vhost.h (66%)
>>
>> --
>> 2.17.1
>>

2020-07-02 10:28:24

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

Hi Michael,

On 7/2/2020 3:21 PM, Michael S. Tsirkin wrote:
> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>> This series enhances Linux Vhost support to enable SoC-to-SoC
>> communication over MMIO. This series enables rpmsg communication between
>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>
>> 1) Modify vhost to use standard Linux driver model
>> 2) Add support in vring to access virtqueue over MMIO
>> 3) Add vhost client driver for rpmsg
>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>> rpmsg communication between two SoCs connected to each other
>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>> between two SoCs connected via NTB
>> 6) Add configfs to configure the components
>>
>> UseCase1 :
>>
>> VHOST RPMSG VIRTIO RPMSG
>> + +
>> | |
>> | |
>> | |
>> | |
>> +-----v------+ +------v-------+
>> | Linux | | Linux |
>> | Endpoint | | Root Complex |
>> | <-----------------> |
>> | | | |
>> | SOC1 | | SOC2 |
>> +------------+ +--------------+
>>
>> UseCase 2:
>>
>> VHOST RPMSG VIRTIO RPMSG
>> + +
>> | |
>> | |
>> | |
>> | |
>> +------v------+ +------v------+
>> | | | |
>> | HOST1 | | HOST2 |
>> | | | |
>> +------^------+ +------^------+
>> | |
>> | |
>> +---------------------------------------------------------------------+
>> | +------v------+ +------v------+ |
>> | | | | | |
>> | | EP | | EP | |
>> | | CONTROLLER1 | | CONTROLLER2 | |
>> | | <-----------------------------------> | |
>> | | | | | |
>> | | | | | |
>> | | | SoC With Multiple EP Instances | | |
>> | | | (Configured using NTB Function) | | |
>> | +-------------+ +-------------+ |
>> +---------------------------------------------------------------------+
>>
>> Software Layering:
>>
>> The high-level SW layering should look something like below. This series
>> adds support only for RPMSG VHOST, however something similar should be
>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>> device, user) can use any of the vhost client driver.
>>
>>
>> +----------------+ +-----------+ +------------+ +----------+
>> | RPMSG VHOST | | NET VHOST | | SCSI VHOST | | X |
>> +-------^--------+ +-----^-----+ +-----^------+ +----^-----+
>> | | | |
>> | | | |
>> | | | |
>> +-----------v-----------------v--------------v--------------v----------+
>> | VHOST CORE |
>> +--------^---------------^--------------------^------------------^-----+
>> | | | |
>> | | | |
>> | | | |
>> +--------v-------+ +----v------+ +----------v----------+ +----v-----+
>> | PCI EPF VHOST | | NTB VHOST | |PLATFORM DEVICE VHOST| | X |
>> +----------------+ +-----------+ +---------------------+ +----------+
>>
>> This was initially proposed here [1]
>>
>> [1] -> https://lore.kernel.org/r/[email protected]
>
>
> I find this very interesting. A huge patchset so will take a bit
> to review, but I certainly plan to do that. Thanks!

Great to hear! Thanks in advance for reviewing!

Regards
Kishon

>
>>
>> Kishon Vijay Abraham I (22):
>> vhost: Make _feature_ bits a property of vhost device
>> vhost: Introduce standard Linux driver model in VHOST
>> vhost: Add ops for the VHOST driver to configure VHOST device
>> vringh: Add helpers to access vring in MMIO
>> vhost: Add MMIO helpers for operations on vhost virtqueue
>> vhost: Introduce configfs entry for configuring VHOST
>> virtio_pci: Use request_threaded_irq() instead of request_irq()
>> rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>> reading messages
>> rpmsg: Introduce configfs entry for configuring rpmsg
>> rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>> rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>> rpmsg_internal.h
>> virtio: Add ops to allocate and free buffer
>> rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>> virtio_free_buffer()
>> rpmsg: Add VHOST based remote processor messaging bus
>> samples/rpmsg: Setup delayed work to send message
>> samples/rpmsg: Wait for address to be bound to rpdev for sending
>> message
>> rpmsg.txt: Add Documentation to configure rpmsg using configfs
>> virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint
>> device
>> PCI: endpoint: Add EP function driver to provide VHOST interface
>> NTB: Add a new NTB client driver to implement VIRTIO functionality
>> NTB: Add a new NTB client driver to implement VHOST functionality
>> NTB: Describe the ntb_virtio and ntb_vhost client in the documentation
>>
>> Documentation/driver-api/ntb.rst | 11 +
>> Documentation/rpmsg.txt | 56 +
>> drivers/ntb/Kconfig | 18 +
>> drivers/ntb/Makefile | 2 +
>> drivers/ntb/ntb_vhost.c | 776 +++++++++++
>> drivers/ntb/ntb_virtio.c | 853 ++++++++++++
>> drivers/ntb/ntb_virtio.h | 56 +
>> drivers/pci/endpoint/functions/Kconfig | 11 +
>> drivers/pci/endpoint/functions/Makefile | 1 +
>> .../pci/endpoint/functions/pci-epf-vhost.c | 1144 ++++++++++++++++
>> drivers/rpmsg/Kconfig | 10 +
>> drivers/rpmsg/Makefile | 3 +-
>> drivers/rpmsg/rpmsg_cfs.c | 394 ++++++
>> drivers/rpmsg/rpmsg_core.c | 7 +
>> drivers/rpmsg/rpmsg_internal.h | 136 ++
>> drivers/rpmsg/vhost_rpmsg_bus.c | 1151 +++++++++++++++++
>> drivers/rpmsg/virtio_rpmsg_bus.c | 184 ++-
>> drivers/vhost/Kconfig | 1 +
>> drivers/vhost/Makefile | 2 +-
>> drivers/vhost/net.c | 10 +-
>> drivers/vhost/scsi.c | 24 +-
>> drivers/vhost/test.c | 17 +-
>> drivers/vhost/vdpa.c | 2 +-
>> drivers/vhost/vhost.c | 730 ++++++++++-
>> drivers/vhost/vhost_cfs.c | 341 +++++
>> drivers/vhost/vringh.c | 332 +++++
>> drivers/vhost/vsock.c | 20 +-
>> drivers/virtio/Kconfig | 9 +
>> drivers/virtio/Makefile | 1 +
>> drivers/virtio/virtio_pci_common.c | 25 +-
>> drivers/virtio/virtio_pci_epf.c | 670 ++++++++++
>> include/linux/mod_devicetable.h | 6 +
>> include/linux/rpmsg.h | 6 +
>> {drivers/vhost => include/linux}/vhost.h | 132 +-
>> include/linux/virtio.h | 3 +
>> include/linux/virtio_config.h | 42 +
>> include/linux/vringh.h | 46 +
>> samples/rpmsg/rpmsg_client_sample.c | 32 +-
>> tools/virtio/virtio_test.c | 2 +-
>> 39 files changed, 7083 insertions(+), 183 deletions(-)
>> create mode 100644 drivers/ntb/ntb_vhost.c
>> create mode 100644 drivers/ntb/ntb_virtio.c
>> create mode 100644 drivers/ntb/ntb_virtio.h
>> create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>> create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>> create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>> create mode 100644 drivers/vhost/vhost_cfs.c
>> create mode 100644 drivers/virtio/virtio_pci_epf.c
>> rename {drivers/vhost => include/linux}/vhost.h (66%)
>>
>> --
>> 2.17.1
>>
>

2020-07-02 13:38:48

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

Hi Jason,

On 7/2/2020 3:40 PM, Jason Wang wrote:
>
> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>> communication over MMIO. This series enables rpmsg communication between
>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>
>>> 1) Modify vhost to use standard Linux driver model
>>> 2) Add support in vring to access virtqueue over MMIO
>>> 3) Add vhost client driver for rpmsg
>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>     rpmsg communication between two SoCs connected to each other
>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>     between two SoCs connected via NTB
>>> 6) Add configfs to configure the components
>>>
>>> UseCase1 :
>>>
>>>   VHOST RPMSG                     VIRTIO RPMSG
>>>        +                               +
>>>        |                               |
>>>        |                               |
>>>        |                               |
>>>        |                               |
>>> +-----v------+                 +------v-------+
>>> |   Linux    |                 |     Linux    |
>>> |  Endpoint  |                 | Root Complex |
>>> |            <----------------->              |
>>> |            |                 |              |
>>> |    SOC1    |                 |     SOC2     |
>>> +------------+                 +--------------+
>>>
>>> UseCase 2:
>>>
>>>       VHOST RPMSG                                      VIRTIO RPMSG
>>>            +                                                 +
>>>            |                                                 |
>>>            |                                                 |
>>>            |                                                 |
>>>            |                                                 |
>>>     +------v------+                                   +------v------+
>>>     |             |                                   |             |
>>>     |    HOST1    |                                   |    HOST2    |
>>>     |             |                                   |             |
>>>     +------^------+                                   +------^------+
>>>            |                                                 |
>>>            |                                                 |
>>> +---------------------------------------------------------------------+
>>> |  +------v------+                                   +------v------+  |
>>> |  |             |                                   |             |  |
>>> |  |     EP      |                                   |     EP      |  |
>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>> |  |             <----------------------------------->             |  |
>>> |  |             |                                   |             |  |
>>> |  |             |                                   |             |  |
>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>> |  |             |  (Configured using NTB Function)  |             |  |
>>> |  +-------------+                                   +-------------+  |
>>> +---------------------------------------------------------------------+
>>>
>>> Software Layering:
>>>
>>> The high-level SW layering should look something like below. This series
>>> adds support only for RPMSG VHOST, however something similar should be
>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>> device, user) can use any of the vhost client driver.
>>>
>>>
>>>      +----------------+  +-----------+  +------------+  +----------+
>>>      |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>      +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>              |                 |              |              |
>>>              |                 |              |              |
>>>              |                 |              |              |
>>> +-----------v-----------------v--------------v--------------v----------+
>>> |                            VHOST CORE                                |
>>> +--------^---------------^--------------------^------------------^-----+
>>>           |               |                    |                  |
>>>           |               |                    |                  |
>>>           |               |                    |                  |
>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>
>>> This was initially proposed here [1]
>>>
>>> [1] -> https://lore.kernel.org/r/[email protected]
>>
>> I find this very interesting. A huge patchset so will take a bit
>> to review, but I certainly plan to do that. Thanks!
>
>
> Yes, it would be better if there's a git branch for us to have a look.

I've pushed the branch
https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>
> Btw, I'm not sure I get the big picture, but I vaguely feel some of the work is
> duplicated with vDPA (e.g the epf transport or vhost bus).

This is about connecting two different HW systems, both running Linux, and
doesn't necessarily involve virtualization. So there is no guest or host as in
virtualization, but two entirely different systems connected via a PCIe cable,
one taking the role of the guest and the other that of the host. So one system
will provide virtio functionality, reserving memory for the virtqueues, and the
other provides vhost functionality, providing a way to access the virtqueues in
virtio memory. One is the source and the other is the sink, and there is no
intermediate entity. (vhost was probably the intermediate entity in
virtualization?)
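
To make that split concrete, here is a minimal sketch (not code from this
series) of how the backend side could check for new buffers purely through
MMIO accessors once the frontend's vring has been mapped; the backend_vq
structure, the mapping and the alignment value are assumptions made up for
this illustration.

/*
 * Illustrative sketch only: the backend (vhost side) never owns the
 * virtqueue memory, it just maps the frontend's vring over PCIe/NTB and
 * reads it with MMIO accessors.  backend_vq and its fields are invented
 * for this example.
 */
#include <linux/io.h>
#include <linux/types.h>
#include <linux/virtio_ring.h>

struct backend_vq {
        void __iomem *ring;     /* frontend's vring, mapped over the link */
        unsigned int num;       /* queue size agreed with the frontend */
        u16 last_avail_idx;     /* last avail index the backend has seen */
};

/* Return true if the frontend has made new buffers available. */
static bool backend_vq_has_work(struct backend_vq *bvq)
{
        struct vring_avail __iomem *avail;
        struct vring vring;
        u16 avail_idx;

        /* vring_init() only computes pointers, it never dereferences them. */
        vring_init(&vring, bvq->num, (void __force *)bvq->ring,
                   VRING_AVAIL_ALIGN_SIZE /* placeholder alignment */);

        /* Every access to the remote ring has to be an MMIO access. */
        avail = (struct vring_avail __force __iomem *)vring.avail;
        avail_idx = ioread16(&avail->idx);

        return avail_idx != bvq->last_avail_idx;
}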

>
> Have you considered to implement these through vDPA?

IIUC, vDPA only provides an interface to userspace; an in-kernel rpmsg driver
or vhost net driver is not provided.

The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
(usecase 2 above); all the boards run Linux. The middle board provides NTB
functionality, and the boards on either side provide virtio/vhost functionality
and transfer data using rpmsg.

Thanks
Kishon

>
> Thanks
>
>
>>
>>> Kishon Vijay Abraham I (22):
>>>    vhost: Make _feature_ bits a property of vhost device
>>>    vhost: Introduce standard Linux driver model in VHOST
>>>    vhost: Add ops for the VHOST driver to configure VHOST device
>>>    vringh: Add helpers to access vring in MMIO
>>>    vhost: Add MMIO helpers for operations on vhost virtqueue
>>>    vhost: Introduce configfs entry for configuring VHOST
>>>    virtio_pci: Use request_threaded_irq() instead of request_irq()
>>>    rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>>>      reading messages
>>>    rpmsg: Introduce configfs entry for configuring rpmsg
>>>    rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>>>    rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>>>      rpmsg_internal.h
>>>    virtio: Add ops to allocate and free buffer
>>>    rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>>>      virtio_free_buffer()
>>>    rpmsg: Add VHOST based remote processor messaging bus
>>>    samples/rpmsg: Setup delayed work to send message
>>>    samples/rpmsg: Wait for address to be bound to rpdev for sending
>>>      message
>>>    rpmsg.txt: Add Documentation to configure rpmsg using configfs
>>>    virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint
>>>      device
>>>    PCI: endpoint: Add EP function driver to provide VHOST interface
>>>    NTB: Add a new NTB client driver to implement VIRTIO functionality
>>>    NTB: Add a new NTB client driver to implement VHOST functionality
>>>    NTB: Describe the ntb_virtio and ntb_vhost client in the documentation
>>>
>>>   Documentation/driver-api/ntb.rst              |   11 +
>>>   Documentation/rpmsg.txt                       |   56 +
>>>   drivers/ntb/Kconfig                           |   18 +
>>>   drivers/ntb/Makefile                          |    2 +
>>>   drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
>>>   drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
>>>   drivers/ntb/ntb_virtio.h                      |   56 +
>>>   drivers/pci/endpoint/functions/Kconfig        |   11 +
>>>   drivers/pci/endpoint/functions/Makefile       |    1 +
>>>   .../pci/endpoint/functions/pci-epf-vhost.c    | 1144 ++++++++++++++++
>>>   drivers/rpmsg/Kconfig                         |   10 +
>>>   drivers/rpmsg/Makefile                        |    3 +-
>>>   drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
>>>   drivers/rpmsg/rpmsg_core.c                    |    7 +
>>>   drivers/rpmsg/rpmsg_internal.h                |  136 ++
>>>   drivers/rpmsg/vhost_rpmsg_bus.c               | 1151 +++++++++++++++++
>>>   drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
>>>   drivers/vhost/Kconfig                         |    1 +
>>>   drivers/vhost/Makefile                        |    2 +-
>>>   drivers/vhost/net.c                           |   10 +-
>>>   drivers/vhost/scsi.c                          |   24 +-
>>>   drivers/vhost/test.c                          |   17 +-
>>>   drivers/vhost/vdpa.c                          |    2 +-
>>>   drivers/vhost/vhost.c                         |  730 ++++++++++-
>>>   drivers/vhost/vhost_cfs.c                     |  341 +++++
>>>   drivers/vhost/vringh.c                        |  332 +++++
>>>   drivers/vhost/vsock.c                         |   20 +-
>>>   drivers/virtio/Kconfig                        |    9 +
>>>   drivers/virtio/Makefile                       |    1 +
>>>   drivers/virtio/virtio_pci_common.c            |   25 +-
>>>   drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
>>>   include/linux/mod_devicetable.h               |    6 +
>>>   include/linux/rpmsg.h                         |    6 +
>>>   {drivers/vhost => include/linux}/vhost.h      |  132 +-
>>>   include/linux/virtio.h                        |    3 +
>>>   include/linux/virtio_config.h                 |   42 +
>>>   include/linux/vringh.h                        |   46 +
>>>   samples/rpmsg/rpmsg_client_sample.c           |   32 +-
>>>   tools/virtio/virtio_test.c                    |    2 +-
>>>   39 files changed, 7083 insertions(+), 183 deletions(-)
>>>   create mode 100644 drivers/ntb/ntb_vhost.c
>>>   create mode 100644 drivers/ntb/ntb_virtio.c
>>>   create mode 100644 drivers/ntb/ntb_virtio.h
>>>   create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>>>   create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>>>   create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>>>   create mode 100644 drivers/vhost/vhost_cfs.c
>>>   create mode 100644 drivers/virtio/virtio_pci_epf.c
>>>   rename {drivers/vhost => include/linux}/vhost.h (66%)
>>>
>>> -- 
>>> 2.17.1
>>>
>

2020-07-02 17:32:05

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

On Thu, 2 Jul 2020 at 03:51, Michael S. Tsirkin <[email protected]> wrote:
>
> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
> > This series enhances Linux Vhost support to enable SoC-to-SoC
> > communication over MMIO. This series enables rpmsg communication between
> > two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
> >
> > 1) Modify vhost to use standard Linux driver model
> > 2) Add support in vring to access virtqueue over MMIO
> > 3) Add vhost client driver for rpmsg
> > 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
> > rpmsg communication between two SoCs connected to each other
> > 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
> > between two SoCs connected via NTB
> > 6) Add configfs to configure the components
> >
> > UseCase1 :
> >
> > VHOST RPMSG VIRTIO RPMSG
> > + +
> > | |
> > | |
> > | |
> > | |
> > +-----v------+ +------v-------+
> > | Linux | | Linux |
> > | Endpoint | | Root Complex |
> > | <-----------------> |
> > | | | |
> > | SOC1 | | SOC2 |
> > +------------+ +--------------+
> >
> > UseCase 2:
> >
> > VHOST RPMSG VIRTIO RPMSG
> > + +
> > | |
> > | |
> > | |
> > | |
> > +------v------+ +------v------+
> > | | | |
> > | HOST1 | | HOST2 |
> > | | | |
> > +------^------+ +------^------+
> > | |
> > | |
> > +---------------------------------------------------------------------+
> > | +------v------+ +------v------+ |
> > | | | | | |
> > | | EP | | EP | |
> > | | CONTROLLER1 | | CONTROLLER2 | |
> > | | <-----------------------------------> | |
> > | | | | | |
> > | | | | | |
> > | | | SoC With Multiple EP Instances | | |
> > | | | (Configured using NTB Function) | | |
> > | +-------------+ +-------------+ |
> > +---------------------------------------------------------------------+
> >
> > Software Layering:
> >
> > The high-level SW layering should look something like below. This series
> > adds support only for RPMSG VHOST, however something similar should be
> > done for net and scsi. With that any vhost device (PCI, NTB, Platform
> > device, user) can use any of the vhost client driver.
> >
> >
> > +----------------+ +-----------+ +------------+ +----------+
> > | RPMSG VHOST | | NET VHOST | | SCSI VHOST | | X |
> > +-------^--------+ +-----^-----+ +-----^------+ +----^-----+
> > | | | |
> > | | | |
> > | | | |
> > +-----------v-----------------v--------------v--------------v----------+
> > | VHOST CORE |
> > +--------^---------------^--------------------^------------------^-----+
> > | | | |
> > | | | |
> > | | | |
> > +--------v-------+ +----v------+ +----------v----------+ +----v-----+
> > | PCI EPF VHOST | | NTB VHOST | |PLATFORM DEVICE VHOST| | X |
> > +----------------+ +-----------+ +---------------------+ +----------+
> >
> > This was initially proposed here [1]
> >
> > [1] -> https://lore.kernel.org/r/[email protected]
>
>
> I find this very interesting. A huge patchset so will take a bit
> to review, but I certainly plan to do that. Thanks!

Same here - it will take time. This patchset is sizable and sits
behind a few others that are equally big.

>
> >
> > Kishon Vijay Abraham I (22):
> > vhost: Make _feature_ bits a property of vhost device
> > vhost: Introduce standard Linux driver model in VHOST
> > vhost: Add ops for the VHOST driver to configure VHOST device
> > vringh: Add helpers to access vring in MMIO
> > vhost: Add MMIO helpers for operations on vhost virtqueue
> > vhost: Introduce configfs entry for configuring VHOST
> > virtio_pci: Use request_threaded_irq() instead of request_irq()
> > rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
> > reading messages
> > rpmsg: Introduce configfs entry for configuring rpmsg
> > rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
> > rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
> > rpmsg_internal.h
> > virtio: Add ops to allocate and free buffer
> > rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
> > virtio_free_buffer()
> > rpmsg: Add VHOST based remote processor messaging bus
> > samples/rpmsg: Setup delayed work to send message
> > samples/rpmsg: Wait for address to be bound to rpdev for sending
> > message
> > rpmsg.txt: Add Documentation to configure rpmsg using configfs
> > virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint
> > device
> > PCI: endpoint: Add EP function driver to provide VHOST interface
> > NTB: Add a new NTB client driver to implement VIRTIO functionality
> > NTB: Add a new NTB client driver to implement VHOST functionality
> > NTB: Describe the ntb_virtio and ntb_vhost client in the documentation
> >
> > Documentation/driver-api/ntb.rst | 11 +
> > Documentation/rpmsg.txt | 56 +
> > drivers/ntb/Kconfig | 18 +
> > drivers/ntb/Makefile | 2 +
> > drivers/ntb/ntb_vhost.c | 776 +++++++++++
> > drivers/ntb/ntb_virtio.c | 853 ++++++++++++
> > drivers/ntb/ntb_virtio.h | 56 +
> > drivers/pci/endpoint/functions/Kconfig | 11 +
> > drivers/pci/endpoint/functions/Makefile | 1 +
> > .../pci/endpoint/functions/pci-epf-vhost.c | 1144 ++++++++++++++++
> > drivers/rpmsg/Kconfig | 10 +
> > drivers/rpmsg/Makefile | 3 +-
> > drivers/rpmsg/rpmsg_cfs.c | 394 ++++++
> > drivers/rpmsg/rpmsg_core.c | 7 +
> > drivers/rpmsg/rpmsg_internal.h | 136 ++
> > drivers/rpmsg/vhost_rpmsg_bus.c | 1151 +++++++++++++++++
> > drivers/rpmsg/virtio_rpmsg_bus.c | 184 ++-
> > drivers/vhost/Kconfig | 1 +
> > drivers/vhost/Makefile | 2 +-
> > drivers/vhost/net.c | 10 +-
> > drivers/vhost/scsi.c | 24 +-
> > drivers/vhost/test.c | 17 +-
> > drivers/vhost/vdpa.c | 2 +-
> > drivers/vhost/vhost.c | 730 ++++++++++-
> > drivers/vhost/vhost_cfs.c | 341 +++++
> > drivers/vhost/vringh.c | 332 +++++
> > drivers/vhost/vsock.c | 20 +-
> > drivers/virtio/Kconfig | 9 +
> > drivers/virtio/Makefile | 1 +
> > drivers/virtio/virtio_pci_common.c | 25 +-
> > drivers/virtio/virtio_pci_epf.c | 670 ++++++++++
> > include/linux/mod_devicetable.h | 6 +
> > include/linux/rpmsg.h | 6 +
> > {drivers/vhost => include/linux}/vhost.h | 132 +-
> > include/linux/virtio.h | 3 +
> > include/linux/virtio_config.h | 42 +
> > include/linux/vringh.h | 46 +
> > samples/rpmsg/rpmsg_client_sample.c | 32 +-
> > tools/virtio/virtio_test.c | 2 +-
> > 39 files changed, 7083 insertions(+), 183 deletions(-)
> > create mode 100644 drivers/ntb/ntb_vhost.c
> > create mode 100644 drivers/ntb/ntb_virtio.c
> > create mode 100644 drivers/ntb/ntb_virtio.h
> > create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
> > create mode 100644 drivers/rpmsg/rpmsg_cfs.c
> > create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
> > create mode 100644 drivers/vhost/vhost_cfs.c
> > create mode 100644 drivers/virtio/virtio_pci_epf.c
> > rename {drivers/vhost => include/linux}/vhost.h (66%)
> >
> > --
> > 2.17.1
> >
>

2020-07-03 06:18:55

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

+Alan, Haotian

On 7/2/2020 11:01 PM, Mathieu Poirier wrote:
> On Thu, 2 Jul 2020 at 03:51, Michael S. Tsirkin <[email protected]> wrote:
>>
>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>> communication over MMIO. This series enables rpmsg communication between
>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>
>>> 1) Modify vhost to use standard Linux driver model
>>> 2) Add support in vring to access virtqueue over MMIO
>>> 3) Add vhost client driver for rpmsg
>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>> rpmsg communication between two SoCs connected to each other
>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>> between two SoCs connected via NTB
>>> 6) Add configfs to configure the components
>>>
>>> UseCase1 :
>>>
>>> VHOST RPMSG VIRTIO RPMSG
>>> + +
>>> | |
>>> | |
>>> | |
>>> | |
>>> +-----v------+ +------v-------+
>>> | Linux | | Linux |
>>> | Endpoint | | Root Complex |
>>> | <-----------------> |
>>> | | | |
>>> | SOC1 | | SOC2 |
>>> +------------+ +--------------+
>>>
>>> UseCase 2:
>>>
>>> VHOST RPMSG VIRTIO RPMSG
>>> + +
>>> | |
>>> | |
>>> | |
>>> | |
>>> +------v------+ +------v------+
>>> | | | |
>>> | HOST1 | | HOST2 |
>>> | | | |
>>> +------^------+ +------^------+
>>> | |
>>> | |
>>> +---------------------------------------------------------------------+
>>> | +------v------+ +------v------+ |
>>> | | | | | |
>>> | | EP | | EP | |
>>> | | CONTROLLER1 | | CONTROLLER2 | |
>>> | | <-----------------------------------> | |
>>> | | | | | |
>>> | | | | | |
>>> | | | SoC With Multiple EP Instances | | |
>>> | | | (Configured using NTB Function) | | |
>>> | +-------------+ +-------------+ |
>>> +---------------------------------------------------------------------+
>>>
>>> Software Layering:
>>>
>>> The high-level SW layering should look something like below. This series
>>> adds support only for RPMSG VHOST, however something similar should be
>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>> device, user) can use any of the vhost client driver.
>>>
>>>
>>> +----------------+ +-----------+ +------------+ +----------+
>>> | RPMSG VHOST | | NET VHOST | | SCSI VHOST | | X |
>>> +-------^--------+ +-----^-----+ +-----^------+ +----^-----+
>>> | | | |
>>> | | | |
>>> | | | |
>>> +-----------v-----------------v--------------v--------------v----------+
>>> | VHOST CORE |
>>> +--------^---------------^--------------------^------------------^-----+
>>> | | | |
>>> | | | |
>>> | | | |
>>> +--------v-------+ +----v------+ +----------v----------+ +----v-----+
>>> | PCI EPF VHOST | | NTB VHOST | |PLATFORM DEVICE VHOST| | X |
>>> +----------------+ +-----------+ +---------------------+ +----------+
>>>
>>> This was initially proposed here [1]
>>>
>>> [1] -> https://lore.kernel.org/r/[email protected]
>>
>>
>> I find this very interesting. A huge patchset so will take a bit
>> to review, but I certainly plan to do that. Thanks!
>
> Same here - it will take time. This patchset is sizable and sits
> behind a few others that are equally big.
>
>>
>>>
>>> Kishon Vijay Abraham I (22):
>>> vhost: Make _feature_ bits a property of vhost device
>>> vhost: Introduce standard Linux driver model in VHOST
>>> vhost: Add ops for the VHOST driver to configure VHOST device
>>> vringh: Add helpers to access vring in MMIO
>>> vhost: Add MMIO helpers for operations on vhost virtqueue
>>> vhost: Introduce configfs entry for configuring VHOST
>>> virtio_pci: Use request_threaded_irq() instead of request_irq()
>>> rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>>> reading messages
>>> rpmsg: Introduce configfs entry for configuring rpmsg
>>> rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>>> rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>>> rpmsg_internal.h
>>> virtio: Add ops to allocate and free buffer
>>> rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>>> virtio_free_buffer()
>>> rpmsg: Add VHOST based remote processor messaging bus
>>> samples/rpmsg: Setup delayed work to send message
>>> samples/rpmsg: Wait for address to be bound to rpdev for sending
>>> message
>>> rpmsg.txt: Add Documentation to configure rpmsg using configfs
>>> virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint
>>> device
>>> PCI: endpoint: Add EP function driver to provide VHOST interface
>>> NTB: Add a new NTB client driver to implement VIRTIO functionality
>>> NTB: Add a new NTB client driver to implement VHOST functionality
>>> NTB: Describe the ntb_virtio and ntb_vhost client in the documentation
>>>
>>> Documentation/driver-api/ntb.rst | 11 +
>>> Documentation/rpmsg.txt | 56 +
>>> drivers/ntb/Kconfig | 18 +
>>> drivers/ntb/Makefile | 2 +
>>> drivers/ntb/ntb_vhost.c | 776 +++++++++++
>>> drivers/ntb/ntb_virtio.c | 853 ++++++++++++
>>> drivers/ntb/ntb_virtio.h | 56 +
>>> drivers/pci/endpoint/functions/Kconfig | 11 +
>>> drivers/pci/endpoint/functions/Makefile | 1 +
>>> .../pci/endpoint/functions/pci-epf-vhost.c | 1144 ++++++++++++++++
>>> drivers/rpmsg/Kconfig | 10 +
>>> drivers/rpmsg/Makefile | 3 +-
>>> drivers/rpmsg/rpmsg_cfs.c | 394 ++++++
>>> drivers/rpmsg/rpmsg_core.c | 7 +
>>> drivers/rpmsg/rpmsg_internal.h | 136 ++
>>> drivers/rpmsg/vhost_rpmsg_bus.c | 1151 +++++++++++++++++
>>> drivers/rpmsg/virtio_rpmsg_bus.c | 184 ++-
>>> drivers/vhost/Kconfig | 1 +
>>> drivers/vhost/Makefile | 2 +-
>>> drivers/vhost/net.c | 10 +-
>>> drivers/vhost/scsi.c | 24 +-
>>> drivers/vhost/test.c | 17 +-
>>> drivers/vhost/vdpa.c | 2 +-
>>> drivers/vhost/vhost.c | 730 ++++++++++-
>>> drivers/vhost/vhost_cfs.c | 341 +++++
>>> drivers/vhost/vringh.c | 332 +++++
>>> drivers/vhost/vsock.c | 20 +-
>>> drivers/virtio/Kconfig | 9 +
>>> drivers/virtio/Makefile | 1 +
>>> drivers/virtio/virtio_pci_common.c | 25 +-
>>> drivers/virtio/virtio_pci_epf.c | 670 ++++++++++
>>> include/linux/mod_devicetable.h | 6 +
>>> include/linux/rpmsg.h | 6 +
>>> {drivers/vhost => include/linux}/vhost.h | 132 +-
>>> include/linux/virtio.h | 3 +
>>> include/linux/virtio_config.h | 42 +
>>> include/linux/vringh.h | 46 +
>>> samples/rpmsg/rpmsg_client_sample.c | 32 +-
>>> tools/virtio/virtio_test.c | 2 +-
>>> 39 files changed, 7083 insertions(+), 183 deletions(-)
>>> create mode 100644 drivers/ntb/ntb_vhost.c
>>> create mode 100644 drivers/ntb/ntb_virtio.c
>>> create mode 100644 drivers/ntb/ntb_virtio.h
>>> create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>>> create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>>> create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>>> create mode 100644 drivers/vhost/vhost_cfs.c
>>> create mode 100644 drivers/virtio/virtio_pci_epf.c
>>> rename {drivers/vhost => include/linux}/vhost.h (66%)
>>>
>>> --
>>> 2.17.1
>>>
>>

2020-07-03 07:19:11

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication


On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 7/2/2020 3:40 PM, Jason Wang wrote:
>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>> communication over MMIO. This series enables rpmsg communication between
>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>
>>>> 1) Modify vhost to use standard Linux driver model
>>>> 2) Add support in vring to access virtqueue over MMIO
>>>> 3) Add vhost client driver for rpmsg
>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>     rpmsg communication between two SoCs connected to each other
>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>     between two SoCs connected via NTB
>>>> 6) Add configfs to configure the components
>>>>
>>>> UseCase1 :
>>>>
>>>>   VHOST RPMSG                     VIRTIO RPMSG
>>>>        +                               +
>>>>        |                               |
>>>>        |                               |
>>>>        |                               |
>>>>        |                               |
>>>> +-----v------+                 +------v-------+
>>>> |   Linux    |                 |     Linux    |
>>>> |  Endpoint  |                 | Root Complex |
>>>> |            <----------------->              |
>>>> |            |                 |              |
>>>> |    SOC1    |                 |     SOC2     |
>>>> +------------+                 +--------------+
>>>>
>>>> UseCase 2:
>>>>
>>>>       VHOST RPMSG                                      VIRTIO RPMSG
>>>>            +                                                 +
>>>>            |                                                 |
>>>>            |                                                 |
>>>>            |                                                 |
>>>>            |                                                 |
>>>>     +------v------+                                   +------v------+
>>>>     |             |                                   |             |
>>>>     |    HOST1    |                                   |    HOST2    |
>>>>     |             |                                   |             |
>>>>     +------^------+                                   +------^------+
>>>>            |                                                 |
>>>>            |                                                 |
>>>> +---------------------------------------------------------------------+
>>>> |  +------v------+                                   +------v------+  |
>>>> |  |             |                                   |             |  |
>>>> |  |     EP      |                                   |     EP      |  |
>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>> |  |             <----------------------------------->             |  |
>>>> |  |             |                                   |             |  |
>>>> |  |             |                                   |             |  |
>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>> |  +-------------+                                   +-------------+  |
>>>> +---------------------------------------------------------------------+
>>>>
>>>> Software Layering:
>>>>
>>>> The high-level SW layering should look something like below. This series
>>>> adds support only for RPMSG VHOST, however something similar should be
>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>>> device, user) can use any of the vhost client driver.
>>>>
>>>>
>>>>      +----------------+  +-----------+  +------------+  +----------+
>>>>      |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>      +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>              |                 |              |              |
>>>>              |                 |              |              |
>>>>              |                 |              |              |
>>>> +-----------v-----------------v--------------v--------------v----------+
>>>> |                            VHOST CORE                                |
>>>> +--------^---------------^--------------------^------------------^-----+
>>>>           |               |                    |                  |
>>>>           |               |                    |                  |
>>>>           |               |                    |                  |
>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>>
>>>> This was initially proposed here [1]
>>>>
>>>> [1] -> https://lore.kernel.org/r/[email protected]
>>> I find this very interesting. A huge patchset so will take a bit
>>> to review, but I certainly plan to do that. Thanks!
>>
>> Yes, it would be better if there's a git branch for us to have a look.
> I've pushed the branch
> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc


Thanks


>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the work is
>> duplicated with vDPA (e.g the epf transport or vhost bus).
> This is about connecting two different HW systems both running Linux and
> doesn't necessarily involve virtualization.


Right, this is something similar to VOP
(Documentation/misc-devices/mic/mic_overview.rst). The difference is the
hardware, I guess, and VOP uses a userspace application to implement the device.


> So there is no guest or host as in
> virtualization but two entirely different systems connected via PCIe cable, one
> acting as guest and one as host. So one system will provide virtio
> functionality reserving memory for virtqueues and the other provides vhost
> functionality providing a way to access the virtqueues in virtio memory. One is
> source and the other is sink and there is no intermediate entity. (vhost was
> probably intermediate entity in virtualization?)


(Not a native English speaker) but "vhost" could introduce some
confusion for me since it was used for implementing a virtio backend for
userspace drivers. I guess "vringh" could be better.


>
>> Have you considered to implement these through vDPA?
> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg driver
> or vhost net driver is not provided.
>
> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
> (usecase2 above),


I see.


> all the boards run Linux. The middle board provides NTB
> functionality and board on either side provides virtio/vhost functionality and
> transfer data using rpmsg.


So I wonder whether it's worthwhile to add a new bus. Can we use the
existing virtio-bus/drivers? It might work as follows: except for the epf
transport, we can introduce an epf "vhost" transport driver.

It will have virtqueues, but they are only used for the communication between
itself and the upper virtio driver. And it will have vringh queues which
will be probed by virtio epf transport drivers. And it needs to do a
data copy between the virtqueues and the vringh queues.

It works like:

virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
queue/epf>

The advantage is that there's no need for writing new buses and drivers.
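
Very roughly, the copy step could look like the sketch below; this is just an
illustration under the assumption that both rings are reachable with the
existing *_kern vringh helpers, and every name in it is invented for the
example.

/*
 * Illustration of the data copy between the two rings in the proposed
 * epf "vhost" transport: one descriptor is pulled from the ring shared
 * with the upper virtio driver and pushed into the ring exposed to the
 * remote virtio epf transport.  Setup, locking and error paths omitted.
 */
#include <linux/gfp.h>
#include <linux/kernel.h>
#include <linux/vringh.h>

#define FWD_BUF_SIZE    512

static int epf_vhost_forward_one(struct vringh *local, struct vringh *remote)
{
        struct kvec rvec[8], wvec[8];
        struct vringh_kiov riov, wiov;
        char buf[FWD_BUF_SIZE];
        u16 rhead, whead;
        ssize_t len;
        int ret;

        vringh_kiov_init(&riov, rvec, ARRAY_SIZE(rvec));
        vringh_kiov_init(&wiov, wvec, ARRAY_SIZE(wvec));

        /* Buffer made available by the upper virtio driver. */
        ret = vringh_getdesc_kern(local, &riov, NULL, &rhead, GFP_KERNEL);
        if (ret <= 0)
                return ret;

        /* First copy: local ring -> bounce buffer. */
        len = vringh_iov_pull_kern(&riov, buf, sizeof(buf));
        if (len < 0)
                return len;

        /* Descriptor offered to the remote virtio epf transport. */
        ret = vringh_getdesc_kern(remote, NULL, &wiov, &whead, GFP_KERNEL);
        if (ret <= 0)
                return ret;

        /* Second copy: bounce buffer -> remote ring. */
        vringh_iov_push_kern(&wiov, buf, len);

        vringh_complete_kern(local, rhead, 0);
        vringh_complete_kern(remote, whead, len);
        return 1;
}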

Does this make sense?

Thanks


>
> Thanks
> Kishon
>
>> Thanks
>>
>>
>>>> Kishon Vijay Abraham I (22):
>>>>    vhost: Make _feature_ bits a property of vhost device
>>>>    vhost: Introduce standard Linux driver model in VHOST
>>>>    vhost: Add ops for the VHOST driver to configure VHOST device
>>>>    vringh: Add helpers to access vring in MMIO
>>>>    vhost: Add MMIO helpers for operations on vhost virtqueue
>>>>    vhost: Introduce configfs entry for configuring VHOST
>>>>    virtio_pci: Use request_threaded_irq() instead of request_irq()
>>>>    rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>>>>      reading messages
>>>>    rpmsg: Introduce configfs entry for configuring rpmsg
>>>>    rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>>>>    rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>>>>      rpmsg_internal.h
>>>>    virtio: Add ops to allocate and free buffer
>>>>    rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>>>>      virtio_free_buffer()
>>>>    rpmsg: Add VHOST based remote processor messaging bus
>>>>    samples/rpmsg: Setup delayed work to send message
>>>>    samples/rpmsg: Wait for address to be bound to rpdev for sending
>>>>      message
>>>>    rpmsg.txt: Add Documentation to configure rpmsg using configfs
>>>>    virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint
>>>>      device
>>>>    PCI: endpoint: Add EP function driver to provide VHOST interface
>>>>    NTB: Add a new NTB client driver to implement VIRTIO functionality
>>>>    NTB: Add a new NTB client driver to implement VHOST functionality
>>>>    NTB: Describe the ntb_virtio and ntb_vhost client in the documentation
>>>>
>>>>   Documentation/driver-api/ntb.rst              |   11 +
>>>>   Documentation/rpmsg.txt                       |   56 +
>>>>   drivers/ntb/Kconfig                           |   18 +
>>>>   drivers/ntb/Makefile                          |    2 +
>>>>   drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
>>>>   drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
>>>>   drivers/ntb/ntb_virtio.h                      |   56 +
>>>>   drivers/pci/endpoint/functions/Kconfig        |   11 +
>>>>   drivers/pci/endpoint/functions/Makefile       |    1 +
>>>>   .../pci/endpoint/functions/pci-epf-vhost.c    | 1144 ++++++++++++++++
>>>>   drivers/rpmsg/Kconfig                         |   10 +
>>>>   drivers/rpmsg/Makefile                        |    3 +-
>>>>   drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
>>>>   drivers/rpmsg/rpmsg_core.c                    |    7 +
>>>>   drivers/rpmsg/rpmsg_internal.h                |  136 ++
>>>>   drivers/rpmsg/vhost_rpmsg_bus.c               | 1151 +++++++++++++++++
>>>>   drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
>>>>   drivers/vhost/Kconfig                         |    1 +
>>>>   drivers/vhost/Makefile                        |    2 +-
>>>>   drivers/vhost/net.c                           |   10 +-
>>>>   drivers/vhost/scsi.c                          |   24 +-
>>>>   drivers/vhost/test.c                          |   17 +-
>>>>   drivers/vhost/vdpa.c                          |    2 +-
>>>>   drivers/vhost/vhost.c                         |  730 ++++++++++-
>>>>   drivers/vhost/vhost_cfs.c                     |  341 +++++
>>>>   drivers/vhost/vringh.c                        |  332 +++++
>>>>   drivers/vhost/vsock.c                         |   20 +-
>>>>   drivers/virtio/Kconfig                        |    9 +
>>>>   drivers/virtio/Makefile                       |    1 +
>>>>   drivers/virtio/virtio_pci_common.c            |   25 +-
>>>>   drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
>>>>   include/linux/mod_devicetable.h               |    6 +
>>>>   include/linux/rpmsg.h                         |    6 +
>>>>   {drivers/vhost => include/linux}/vhost.h      |  132 +-
>>>>   include/linux/virtio.h                        |    3 +
>>>>   include/linux/virtio_config.h                 |   42 +
>>>>   include/linux/vringh.h                        |   46 +
>>>>   samples/rpmsg/rpmsg_client_sample.c           |   32 +-
>>>>   tools/virtio/virtio_test.c                    |    2 +-
>>>>   39 files changed, 7083 insertions(+), 183 deletions(-)
>>>>   create mode 100644 drivers/ntb/ntb_vhost.c
>>>>   create mode 100644 drivers/ntb/ntb_virtio.c
>>>>   create mode 100644 drivers/ntb/ntb_virtio.h
>>>>   create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>>>>   create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>>>>   create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>>>>   create mode 100644 drivers/vhost/vhost_cfs.c
>>>>   create mode 100644 drivers/virtio/virtio_pci_epf.c
>>>>   rename {drivers/vhost => include/linux}/vhost.h (66%)
>>>>
>>>> --
>>>> 2.17.1
>>>>

2020-07-06 09:34:56

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

Hi Jason,

On 7/3/2020 12:46 PM, Jason Wang wrote:
>
> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
>> Hi Jason,
>>
>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>
>>>>> 1) Modify vhost to use standard Linux driver model
>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>> 3) Add vhost client driver for rpmsg
>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>      rpmsg communication between two SoCs connected to each other
>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>      between two SoCs connected via NTB
>>>>> 6) Add configfs to configure the components
>>>>>
>>>>> UseCase1 :
>>>>>
>>>>>    VHOST RPMSG                     VIRTIO RPMSG
>>>>>         +                               +
>>>>>         |                               |
>>>>>         |                               |
>>>>>         |                               |
>>>>>         |                               |
>>>>> +-----v------+                 +------v-------+
>>>>> |   Linux    |                 |     Linux    |
>>>>> |  Endpoint  |                 | Root Complex |
>>>>> |            <----------------->              |
>>>>> |            |                 |              |
>>>>> |    SOC1    |                 |     SOC2     |
>>>>> +------------+                 +--------------+
>>>>>
>>>>> UseCase 2:
>>>>>
>>>>>        VHOST RPMSG                                      VIRTIO RPMSG
>>>>>             +                                                 +
>>>>>             |                                                 |
>>>>>             |                                                 |
>>>>>             |                                                 |
>>>>>             |                                                 |
>>>>>      +------v------+                                   +------v------+
>>>>>      |             |                                   |             |
>>>>>      |    HOST1    |                                   |    HOST2    |
>>>>>      |             |                                   |             |
>>>>>      +------^------+                                   +------^------+
>>>>>             |                                                 |
>>>>>             |                                                 |
>>>>> +---------------------------------------------------------------------+
>>>>> |  +------v------+                                   +------v------+  |
>>>>> |  |             |                                   |             |  |
>>>>> |  |     EP      |                                   |     EP      |  |
>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>>> |  |             <----------------------------------->             |  |
>>>>> |  |             |                                   |             |  |
>>>>> |  |             |                                   |             |  |
>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>>> |  +-------------+                                   +-------------+  |
>>>>> +---------------------------------------------------------------------+
>>>>>
>>>>> Software Layering:
>>>>>
>>>>> The high-level SW layering should look something like below. This series
>>>>> adds support only for RPMSG VHOST, however something similar should be
>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>>>> device, user) can use any of the vhost client driver.
>>>>>
>>>>>
>>>>>       +----------------+  +-----------+  +------------+  +----------+
>>>>>       |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>>       +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>>               |                 |              |              |
>>>>>               |                 |              |              |
>>>>>               |                 |              |              |
>>>>> +-----------v-----------------v--------------v--------------v----------+
>>>>> |                            VHOST CORE                                |
>>>>> +--------^---------------^--------------------^------------------^-----+
>>>>>            |               |                    |                  |
>>>>>            |               |                    |                  |
>>>>>            |               |                    |                  |
>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>>>
>>>>> This was initially proposed here [1]
>>>>>
>>>>> [1] -> https://lore.kernel.org/r/[email protected]
>>>> I find this very interesting. A huge patchset so will take a bit
>>>> to review, but I certainly plan to do that. Thanks!
>>>
>>> Yes, it would be better if there's a git branch for us to have a look.
>> I've pushed the branch
>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>
>
> Thanks
>
>
>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the work is
>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>> This is about connecting two different HW systems both running Linux and
>> doesn't necessarily involve virtualization.
>
>
> Right, this is something similar to VOP
> (Documentation/misc-devices/mic/mic_overview.rst). The different is the
> hardware I guess and VOP use userspace application to implement the device.

I'd also like to point out that this series tries to enable communication
between two SoCs in a vendor-agnostic way. Since this series addresses two use
cases (PCIe RC<->EP and NTB), for the NTB case it plugs directly into the NTB
framework, and any of the NTB hardware below should be able to use virtio-vhost
communication

#ls drivers/ntb/hw/
amd epf idt intel mscc

And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
function driver, and hence any SoC that supports a configurable PCIe endpoint
can use virtio-vhost communication

# ls drivers/pci/controller/dwc/*ep*
drivers/pci/controller/dwc/pcie-designware-ep.c
drivers/pci/controller/dwc/pcie-uniphier-ep.c
drivers/pci/controller/dwc/pci-layerscape-ep.c

>
>
>>   So there is no guest or host as in
>> virtualization but two entirely different systems connected via PCIe cable, one
>> acting as guest and one as host. So one system will provide virtio
>> functionality reserving memory for virtqueues and the other provides vhost
>> functionality providing a way to access the virtqueues in virtio memory. One is
>> source and the other is sink and there is no intermediate entity. (vhost was
>> probably intermediate entity in virtualization?)
>
>
> (Not a native English speaker) but "vhost" could introduce some confusion for
> me since it was use for implementing virtio backend for userspace drivers. I
> guess "vringh" could be better.

Initially I had named this vringh but later decided to choose vhost instead of
vringh. vhost is still a virtio backend (not necessarily userspace), though it
now resides in an entirely different system. Whatever virtio is for a frontend
system, vhost can be that for a backend system. vring is used for accessing a
virtqueue and can be used in either the frontend or the backend.
>
>
>>
>>> Have you considered to implement these through vDPA?
>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg driver
>> or vhost net driver is not provided.
>>
>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>> (usecase2 above),
>
>
> I see.
>
>
>>   all the boards run Linux. The middle board provides NTB
>> functionality and board on either side provides virtio/vhost functionality and
>> transfer data using rpmsg.
>
>
> So I wonder whether it's worthwhile for a new bus. Can we use the existed
> virtio-bus/drivers? It might work as, except for the epf transport, we can
> introduce a epf "vhost" transport driver.

IMHO we'll need two buses, one for the frontend and the other for the backend,
because the two components can then co-operate/interact with each other to
provide a functionality. Though both will seemingly provide similar callbacks,
they provide symmetrical or complementary functionality and need not be the
same or identical.

Having the same bus can also create sequencing issues.

If you look at virtio_dev_probe() of virtio_bus

device_features = dev->config->get_features(dev);

Now if we use the same bus for both the front-end and the back-end, both will
try to get_features() when there has been no set_features(). Ideally the vhost
device should be initialized first with the set of features it supports. Vhost
and virtio should use "status" and "features" complementarily and not
identically.

The virtio device (or frontend) cannot be initialized before the vhost device
(or backend) gets initialized with data such as features. Similarly, vhost
(backend) cannot access virtqueues or buffers before virtio (frontend) sets
VIRTIO_CONFIG_S_DRIVER_OK, whereas that requirement is not there for virtio, as
the physical memory for the virtqueues is created by virtio (frontend).
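
To illustrate the ordering constraint only (the structure and helpers below are
invented for this example and are not the interfaces added by this series):

/*
 * The backend must have its features set before the frontend's
 * virtio_dev_probe() asks for them, and it must not touch the rings
 * until the frontend has written VIRTIO_CONFIG_S_DRIVER_OK.
 */
#include <linux/types.h>
#include <linux/virtio_config.h>

struct backend_dev {
        u64 features;   /* what the backend supports */
        u8 status;      /* mirror of the status the frontend writes */
        bool ready;     /* true once features are valid */
};

/* Backend side: has to run before the frontend is allowed to probe. */
static void backend_set_features(struct backend_dev *b, u64 features)
{
        b->features = features;
        b->ready = true;
}

/* Answers the frontend's get_features() issued from virtio_dev_probe(). */
static u64 backend_get_features(struct backend_dev *b)
{
        return b->ready ? b->features : 0;
}

/* The backend may only access the virtqueues after DRIVER_OK. */
static bool backend_may_access_vqs(struct backend_dev *b)
{
        return b->status & VIRTIO_CONFIG_S_DRIVER_OK;
}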

>
> It will have virtqueues but only used for the communication between itself and
> uppter virtio driver. And it will have vringh queues which will be probe by
> virtio epf transport drivers. And it needs to do datacopy between virtqueue and
> vringh queues.
>
> It works like:
>
> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh queue/epf>
>
> The advantages is that there's no need for writing new buses and drivers.

I think this will work; however, there is an additional copy between the vringh
queue and the virtqueue, which in some cases adds latency because of forwarding
interrupts between the vhost and virtio drivers, and it requires the vhost
driver to provide features (which means it has to be aware of which virtio
driver will be connected).
virtio drivers (front end) generally access the buffers from their local memory,
but when in the backend they can access them over MMIO (like PCI EPF or NTB) or
userspace.
>
> Does this make sense?

Two copies is, in my opinion, an issue, but let's get others' opinions as well.

Thanks for your suggestions!

Regards
Kishon

>
> Thanks
>
>
>>
>> Thanks
>> Kishon
>>
>>> Thanks
>>>
>>>
>>>>> Kishon Vijay Abraham I (22):
>>>>>     vhost: Make _feature_ bits a property of vhost device
>>>>>     vhost: Introduce standard Linux driver model in VHOST
>>>>>     vhost: Add ops for the VHOST driver to configure VHOST device
>>>>>     vringh: Add helpers to access vring in MMIO
>>>>>     vhost: Add MMIO helpers for operations on vhost virtqueue
>>>>>     vhost: Introduce configfs entry for configuring VHOST
>>>>>     virtio_pci: Use request_threaded_irq() instead of request_irq()
>>>>>     rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>>>>>       reading messages
>>>>>     rpmsg: Introduce configfs entry for configuring rpmsg
>>>>>     rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>>>>>     rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>>>>>       rpmsg_internal.h
>>>>>     virtio: Add ops to allocate and free buffer
>>>>>     rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>>>>>       virtio_free_buffer()
>>>>>     rpmsg: Add VHOST based remote processor messaging bus
>>>>>     samples/rpmsg: Setup delayed work to send message
>>>>>     samples/rpmsg: Wait for address to be bound to rpdev for sending
>>>>>       message
>>>>>     rpmsg.txt: Add Documentation to configure rpmsg using configfs
>>>>>     virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint
>>>>>       device
>>>>>     PCI: endpoint: Add EP function driver to provide VHOST interface
>>>>>     NTB: Add a new NTB client driver to implement VIRTIO functionality
>>>>>     NTB: Add a new NTB client driver to implement VHOST functionality
>>>>>     NTB: Describe the ntb_virtio and ntb_vhost client in the documentation
>>>>>
>>>>>    Documentation/driver-api/ntb.rst              |   11 +
>>>>>    Documentation/rpmsg.txt                       |   56 +
>>>>>    drivers/ntb/Kconfig                           |   18 +
>>>>>    drivers/ntb/Makefile                          |    2 +
>>>>>    drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
>>>>>    drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
>>>>>    drivers/ntb/ntb_virtio.h                      |   56 +
>>>>>    drivers/pci/endpoint/functions/Kconfig        |   11 +
>>>>>    drivers/pci/endpoint/functions/Makefile       |    1 +
>>>>>    .../pci/endpoint/functions/pci-epf-vhost.c    | 1144 ++++++++++++++++
>>>>>    drivers/rpmsg/Kconfig                         |   10 +
>>>>>    drivers/rpmsg/Makefile                        |    3 +-
>>>>>    drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
>>>>>    drivers/rpmsg/rpmsg_core.c                    |    7 +
>>>>>    drivers/rpmsg/rpmsg_internal.h                |  136 ++
>>>>>    drivers/rpmsg/vhost_rpmsg_bus.c               | 1151 +++++++++++++++++
>>>>>    drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
>>>>>    drivers/vhost/Kconfig                         |    1 +
>>>>>    drivers/vhost/Makefile                        |    2 +-
>>>>>    drivers/vhost/net.c                           |   10 +-
>>>>>    drivers/vhost/scsi.c                          |   24 +-
>>>>>    drivers/vhost/test.c                          |   17 +-
>>>>>    drivers/vhost/vdpa.c                          |    2 +-
>>>>>    drivers/vhost/vhost.c                         |  730 ++++++++++-
>>>>>    drivers/vhost/vhost_cfs.c                     |  341 +++++
>>>>>    drivers/vhost/vringh.c                        |  332 +++++
>>>>>    drivers/vhost/vsock.c                         |   20 +-
>>>>>    drivers/virtio/Kconfig                        |    9 +
>>>>>    drivers/virtio/Makefile                       |    1 +
>>>>>    drivers/virtio/virtio_pci_common.c            |   25 +-
>>>>>    drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
>>>>>    include/linux/mod_devicetable.h               |    6 +
>>>>>    include/linux/rpmsg.h                         |    6 +
>>>>>    {drivers/vhost => include/linux}/vhost.h      |  132 +-
>>>>>    include/linux/virtio.h                        |    3 +
>>>>>    include/linux/virtio_config.h                 |   42 +
>>>>>    include/linux/vringh.h                        |   46 +
>>>>>    samples/rpmsg/rpmsg_client_sample.c           |   32 +-
>>>>>    tools/virtio/virtio_test.c                    |    2 +-
>>>>>    39 files changed, 7083 insertions(+), 183 deletions(-)
>>>>>    create mode 100644 drivers/ntb/ntb_vhost.c
>>>>>    create mode 100644 drivers/ntb/ntb_virtio.c
>>>>>    create mode 100644 drivers/ntb/ntb_virtio.h
>>>>>    create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>>>>>    create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>>>>>    create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>>>>>    create mode 100644 drivers/vhost/vhost_cfs.c
>>>>>    create mode 100644 drivers/virtio/virtio_pci_epf.c
>>>>>    rename {drivers/vhost => include/linux}/vhost.h (66%)
>>>>>
>>>>> -- 
>>>>> 2.17.1
>>>>>
>

2020-07-07 09:48:22

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication


On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 7/3/2020 12:46 PM, Jason Wang wrote:
>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
>>> Hi Jason,
>>>
>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>
>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>> 3) Add vhost client driver for rpmsg
>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>      rpmsg communication between two SoCs connected to each other
>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>      between two SoCs connected via NTB
>>>>>> 6) Add configfs to configure the components
>>>>>>
>>>>>> UseCase1 :
>>>>>>
>>>>>>    VHOST RPMSG                     VIRTIO RPMSG
>>>>>>         +                               +
>>>>>>         |                               |
>>>>>>         |                               |
>>>>>>         |                               |
>>>>>>         |                               |
>>>>>> +-----v------+                 +------v-------+
>>>>>> |   Linux    |                 |     Linux    |
>>>>>> |  Endpoint  |                 | Root Complex |
>>>>>> |            <----------------->              |
>>>>>> |            |                 |              |
>>>>>> |    SOC1    |                 |     SOC2     |
>>>>>> +------------+                 +--------------+
>>>>>>
>>>>>> UseCase 2:
>>>>>>
>>>>>>        VHOST RPMSG                                      VIRTIO RPMSG
>>>>>>             +                                                 +
>>>>>>             |                                                 |
>>>>>>             |                                                 |
>>>>>>             |                                                 |
>>>>>>             |                                                 |
>>>>>>      +------v------+                                   +------v------+
>>>>>>      |             |                                   |             |
>>>>>>      |    HOST1    |                                   |    HOST2    |
>>>>>>      |             |                                   |             |
>>>>>>      +------^------+                                   +------^------+
>>>>>>             |                                                 |
>>>>>>             |                                                 |
>>>>>> +---------------------------------------------------------------------+
>>>>>> |  +------v------+                                   +------v------+  |
>>>>>> |  |             |                                   |             |  |
>>>>>> |  |     EP      |                                   |     EP      |  |
>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>>>> |  |             <----------------------------------->             |  |
>>>>>> |  |             |                                   |             |  |
>>>>>> |  |             |                                   |             |  |
>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>>>> |  +-------------+                                   +-------------+  |
>>>>>> +---------------------------------------------------------------------+
>>>>>>
>>>>>> Software Layering:
>>>>>>
>>>>>> The high-level SW layering should look something like below. This series
>>>>>> adds support only for RPMSG VHOST, however something similar should be
>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>>>>> device, user) can use any of the vhost client driver.
>>>>>>
>>>>>>
>>>>>>       +----------------+  +-----------+  +------------+  +----------+
>>>>>>       |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>>>       +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>>>               |                 |              |              |
>>>>>>               |                 |              |              |
>>>>>>               |                 |              |              |
>>>>>> +-----------v-----------------v--------------v--------------v----------+
>>>>>> |                            VHOST CORE                                |
>>>>>> +--------^---------------^--------------------^------------------^-----+
>>>>>>            |               |                    |                  |
>>>>>>            |               |                    |                  |
>>>>>>            |               |                    |                  |
>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>>>>
>>>>>> This was initially proposed here [1]
>>>>>>
>>>>>> [1] -> https://lore.kernel.org/r/[email protected]
>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>> to review, but I certainly plan to do that. Thanks!
>>>> Yes, it would be better if there's a git branch for us to have a look.
>>> I've pushed the branch
>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>
>> Thanks
>>
>>
>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the work is
>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>> This is about connecting two different HW systems both running Linux and
>>> doesn't necessarily involve virtualization.
>>
>> Right, this is something similar to VOP
>> (Documentation/misc-devices/mic/mic_overview.rst). The different is the
>> hardware I guess and VOP use userspace application to implement the device.
> I'd also like to point out that this series tries to enable communication between two
> SoCs in a vendor-agnostic way. Since this series solves for 2 use cases (PCIe
> RC<->EP and NTB), for the NTB case it directly plugs into the NTB framework, and any
> of the NTB HW below should be able to use virtio-vhost communication
>
> #ls drivers/ntb/hw/
> amd epf idt intel mscc
>
> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
> function driver, and hence any SoC that supports a configurable PCIe endpoint can
> use virtio-vhost communication (sketched after the listing below)
>
> # ls drivers/pci/controller/dwc/*ep*
> drivers/pci/controller/dwc/pcie-designware-ep.c
> drivers/pci/controller/dwc/pcie-uniphier-ep.c
> drivers/pci/controller/dwc/pci-layerscape-ep.c
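
As a rough illustration of why the function driver is controller-agnostic, here is the general shape of a driver written against the pci_epf API; the "vhost" function name and the empty probe body are assumptions for this sketch, not code from this series:

#include <linux/module.h>
#include <linux/pci-epf.h>

/* Sketch only: a function driver written against the pci_epf abstraction,
 * so it can be bound to any controller exposing the EPC interface
 * (designware, layerscape, uniphier, ...). */
static const struct pci_epf_device_id epf_vhost_ids[] = {
	{ .name = "pci_epf_vhost" },
	{ }
};

static int epf_vhost_probe(struct pci_epf *epf)
{
	/* Allocate BAR space for the shared vhost/virtio region here,
	 * e.g. with pci_epf_alloc_space(), and set up the rings. */
	return 0;
}

static struct pci_epf_driver epf_vhost_driver = {
	.probe		= epf_vhost_probe,
	.id_table	= epf_vhost_ids,
	.driver.name	= "pci_epf_vhost",
};
module_pci_epf_driver(epf_vhost_driver);
MODULE_LICENSE("GPL v2");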


Thanks for those backgrounds.


>
>>
>>>   So there is no guest or host as in virtualization, but two entirely different
>>> systems connected via a PCIe cable, one acting as guest and one as host. One
>>> system will provide virtio functionality, reserving memory for virtqueues, and
>>> the other provides vhost functionality, providing a way to access the virtqueues
>>> in virtio memory. One is source and the other is sink, and there is no
>>> intermediate entity. (vhost was probably the intermediate entity in
>>> virtualization?)
>>
>> (Not a native English speaker) but "vhost" could introduce some confusion for
>> me since it was used for implementing the virtio backend for userspace drivers. I
>> guess "vringh" could be better.
> Initially I had named this vringh but later decided to choose vhost instead of
> vringh. vhost is still a virtio backend (not necessarily userspace) though it
> now resides in an entirely different system. Whatever virtio is for a frontend
> system, vhost can be that for a backend system. vring is for accessing the
> virtqueue and can be used in either the frontend or the backend.


Ok.


>>
>>>> Have you considered to implement these through vDPA?
>>> IIUC vDPA only provides an interface to userspace; an in-kernel rpmsg driver
>>> or vhost net driver is not provided.
>>>
>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>> (usecase2 above),
>>
>> I see.
>>
>>
>>>   all the boards run Linux. The middle board provides NTB
>>> functionality, and the boards on either side provide virtio/vhost functionality and
>>> transfer data using rpmsg.
>>
>> So I wonder whether it's worthwhile to add a new bus. Can we use the existing
>> virtio-bus/drivers? It might work: except for the epf transport, we can
>> introduce an epf "vhost" transport driver.
> IMHO we'll need two buses, one for the frontend and the other for the backend, because
> the two components can then co-operate/interact with each other to provide the
> functionality. Though both will seemingly provide similar callbacks, they
> provide symmetrical or complementary functionality and need not be the same or
> identical.
>
> Having the same bus can also create sequencing issues.
>
> If you look at virtio_dev_probe() of virtio_bus
>
> device_features = dev->config->get_features(dev);
>
> Now if we use the same bus for both front-end and back-end, both will try to
> get_features when there has been no set_features. Ideally the vhost device should
> be initialized first with the set of features it supports. Vhost and virtio
> should use "status" and "features" complementarily and not identically.
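
To make the ordering concern concrete, a simplified sketch: virtio_dev_probe() below is roughly what drivers/virtio/virtio.c does today, while the vhost-side helper names (to_vhost_dev, vhost_set_features, VHOST_RPMSG_SUPPORTED_FEATURES) are assumptions, not this series' actual API:

/* virtio bus: the device side already knows its features and the
 * driver reads them at probe time. */
static int virtio_dev_probe(struct device *_d)
{
	struct virtio_device *dev = dev_to_virtio(_d);
	u64 device_features = dev->config->get_features(dev);

	dev->features = device_features;	/* heavily simplified */
	return 0;
}

/* A vhost/backend bus needs the opposite order: the vhost client driver
 * must publish the features it supports first, so the remote virtio
 * front-end has something meaningful to get_features() against. */
static int vhost_dev_probe(struct device *_d)
{
	struct vhost_dev *vdev = to_vhost_dev(_d);		/* assumed helper */

	vhost_set_features(vdev, VHOST_RPMSG_SUPPORTED_FEATURES); /* assumed API */
	/* only now may the remote side read features and set DRIVER_OK */
	return 0;
}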


Yes, but there's no need for doing status/features passthrough in epf
vhost drivers.


>
> The virtio device (or frontend) cannot be initialized before the vhost device (or
> backend) gets initialized with data such as features. Similarly, vhost (backend)
> cannot access virtqueues or buffers before virtio (frontend) sets
> VIRTIO_CONFIG_S_DRIVER_OK, whereas that requirement is not there for virtio, as
> the physical memory for the virtqueues is created by virtio (frontend).


The epf vhost driver needs to implement two devices: a vhost (vringh) device
and a virtio device (which is a mediated device). The vhost (vringh) device
does feature negotiation with the virtio device via RC/EP or NTB.
The virtio device does feature negotiation with the local virtio
drivers. If there's a feature mismatch, the epf vhost driver can do
mediation between them.
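
A minimal sketch of that two-device shape; the struct and field names are assumptions, not code from this series:

/* Sketch only: one driver-private object holding both halves. The vringh
 * side gives backend access to the ring exposed by the remote virtio
 * driver (over the PCIe BAR or NTB window), while the mediated
 * virtio_device is registered locally so stock virtio drivers
 * (virtio_rpmsg_bus, virtio_net, ...) can bind to it. */
struct epf_vhost {
	struct vringh		remote_vrh;	/* backend view of remote ring  */
	struct virtio_device	vdev;		/* mediated local virtio device */
	u64			features;	/* set negotiated on both sides */
};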


>
>> It will have virtqueues, but they are only used for the communication between itself
>> and the upper virtio driver. And it will have vringh queues which will be probed by
>> the virtio epf transport driver. And it needs to do data copy between the virtqueues
>> and the vringh queues.
>>
>> It works like:
>>
>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh queue/epf>
>>
>> The advantage is that there's no need for writing new buses and drivers.
> I think this will work; however, there is an additional copy between the vringh queue
> and the virtqueue,


I think not? E.g. in use case 1), if we stick to the virtio bus, we will have:

virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio
ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)

What the epf vhost driver does is read the buffer len and addr from virtio
ring(1) and then DMA to Linux (RC)?
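
A rough sketch of that per-buffer forwarding step; the vringh/virtqueue helpers are existing kernel APIs, but the epf_fwd context and the overall driver structure are assumptions, and DMA mapping and error handling are omitted:

/* Sketch only: move one buffer from the local virtio ring (ring 1) to the
 * remote ring (ring 2), which the endpoint accesses through vringh. */
static void epf_vhost_forward_one(struct epf_fwd *fwd)
{
	struct vringh_kiov wiov;
	unsigned int len;
	u16 head;
	void *buf;

	/* 1) take a filled buffer off the local ring (ring 1) */
	buf = virtqueue_get_buf(fwd->local_vq, &len);
	if (!buf)
		return;

	/* 2) grab a writable descriptor on the remote ring (ring 2) */
	vringh_kiov_init(&wiov, fwd->wiov_kvec, ARRAY_SIZE(fwd->wiov_kvec));
	if (vringh_getdesc_kern(&fwd->remote_vrh, NULL, &wiov, &head,
				GFP_KERNEL) <= 0)
		return;

	/* 3) the extra copy being discussed: the payload crosses from the
	 *    local buffer into the remote ring's buffer (memcpy here, DMA in
	 *    a real driver), then both sides are completed. */
	vringh_iov_push_kern(&wiov, buf, len);
	vringh_complete_kern(&fwd->remote_vrh, head, len);
	/* 4) recycle buf onto the local ring with virtqueue_add_inbuf() */
}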


> in some cases adds latency because of forwarding interrupts
> between the vhost and virtio drivers, and the vhost driver providing features (which
> means it has to be aware of which virtio driver will be connected).
> virtio drivers (front end) generally access the buffers from their local memory,
> but in the backend they can be accessed over MMIO (like PCI EPF or NTB) or from userspace.
>> Does this make sense?
> Two copies in my opinion is an issue, but let's get others' opinions as well.


Sure.


>
> Thanks for your suggestions!


You're welcome.

Thanks


>
> Regards
> Kishon
>
>> Thanks
>>
>>
>>> Thanks
>>> Kishon
>>>
>>>> Thanks
>>>>
>>>>
>>>>>> [...]

2020-07-07 14:47:16

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

Hi Jason,

On 7/7/2020 3:17 PM, Jason Wang wrote:
>
> On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:
>> Hi Jason,
>>
>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>> On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:
>>>> Hi Jason,
>>>>
>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>> On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:
>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>
>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>>       rpmsg communication between two SoCs connected to each other
>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>>       between two SoCs connected via NTB
>>>>>>> 6) Add configfs to configure the components
>>>>>>>
>>>>>>> [...]
>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>> I've pushed the branch
>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>
>>> Thanks
>>>
>>>
>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
>>>>> work is
>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>> This is about connecting two different HW systems both running Linux and
>>>> doesn't necessarily involve virtualization.
>>>
>>> Right, this is something similar to VOP
>>> (Documentation/misc-devices/mic/mic_overview.rst). The different is the
>>> hardware I guess and VOP use userspace application to implement the device.
>> I'd also like to point out, this series tries to have communication between two
>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and any
>> of the HW in NTB below should be able to use a virtio-vhost communication
>>
>> #ls drivers/ntb/hw/
>> amd  epf  idt  intel  mscc
>>
>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
>> function driver and hence any SoC that supports configurable PCIe endpoint can
>> use virtio-vhost communication
>>
>> # ls drivers/pci/controller/dwc/*ep*
>> drivers/pci/controller/dwc/pcie-designware-ep.c
>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>
>
> Thanks for those backgrounds.
>
>
>>
>>>
>>>>    So there is no guest or host as in
>>>> virtualization but two entirely different systems connected via PCIe cable,
>>>> one
>>>> acting as guest and one as host. So one system will provide virtio
>>>> functionality reserving memory for virtqueues and the other provides vhost
>>>> functionality providing a way to access the virtqueues in virtio memory.
>>>> One is
>>>> source and the other is sink and there is no intermediate entity. (vhost was
>>>> probably intermediate entity in virtualization?)
>>>
>>> (Not a native English speaker) but "vhost" could introduce some confusion for
>>> me since it was use for implementing virtio backend for userspace drivers. I
>>> guess "vringh" could be better.
>> Initially I had named this vringh but later decided to choose vhost instead of
>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
>> now resides in an entirely different system. Whatever virtio is for a frontend
>> system, vhost can be that for a backend system. vring can be for accessing
>> virtqueue and can be used either in frontend or backend.
>
>
> Ok.
>
>
>>>
>>>>> Have you considered to implement these through vDPA?
>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
>>>> driver
>>>> or vhost net driver is not provided.
>>>>
>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>>> (usecase2 above),
>>>
>>> I see.
>>>
>>>
>>>>    all the boards run Linux. The middle board provides NTB
>>>> functionality and board on either side provides virtio/vhost functionality and
>>>> transfer data using rpmsg.
>>>
>>> So I wonder whether it's worthwhile for a new bus. Can we use the existed
>>> virtio-bus/drivers? It might work as, except for the epf transport, we can
>>> introduce a epf "vhost" transport driver.
>> IMHO we'll need two buses one for frontend and other for backend because the
>> two components can then co-operate/interact with each other to provide a
>> functionality. Though both will seemingly provide similar callbacks, they are
>> both provide symmetrical or complimentary funcitonality and need not be same or
>> identical.
>>
>> Having the same bus can also create sequencing issues.
>>
>> If you look at virtio_dev_probe() of virtio_bus
>>
>> device_features = dev->config->get_features(dev);
>>
>> Now if we use same bus for both front-end and back-end, both will try to
>> get_features when there has been no set_features. Ideally vhost device should
>> be initialized first with the set of features it supports. Vhost and virtio
>> should use "status" and "features" complimentarily and not identically.
>
>
> Yes, but there's no need for doing status/features passthrough in epf vhost
> drivers.
>
>
>>
>> virtio device (or frontend) cannot be initialized before vhost device (or
>> backend) gets initialized with data such as features. Similarly vhost (backend)
>> cannot access virqueues or buffers before virtio (frontend) sets
>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there for virtio as
>> the physical memory for virtqueues are created by virtio (frontend).
>
>
> epf vhost drivers need to implement two devices: vhost(vringh) device and
> virtio device (which is a mediated device). The vhost(vringh) device is doing
> feature negotiation with the virtio device via RC/EP or NTB. The virtio device
> is doing feature negotiation with local virtio drivers. If there're feature
> mismatch, epf vhost drivers and do mediation between them.

Here epf vhost should be initialized with a set of features for it to negotiate
either as a vhost device or a virtio device, no? Where should the initial feature
set for epf vhost come from?
>
>
>>
>>> It will have virtqueues but only used for the communication between itself and
>>> uppter virtio driver. And it will have vringh queues which will be probe by
>>> virtio epf transport drivers. And it needs to do datacopy between virtqueue and
>>> vringh queues.
>>>
>>> It works like:
>>>
>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
>>> queue/epf>
>>>
>>> The advantages is that there's no need for writing new buses and drivers.
>> I think this will work however there is an addtional copy between vringh queue
>> and virtqueue,
>
>
> I think not? E.g in use case 1), if we stick to virtio bus, we will have:
>
> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2)
> -> virtio pci (RC) <-> virtio rpmsg (RC)

IIUC the epf vhost driver (EP) will access virtio ring(2) using vringh? And virtio
ring(2) is created by virtio pci (RC).
>
> What epf vhost driver did is to read from virtio ring(1) about the buffer len
> and addr and them DMA to Linux(RC)?

Okay. I made an optimization here where vhost-rpmsg, using a helper, writes a
buffer from rpmsg's upper layer directly to remote Linux (RC), as against here
where it has to be first written to virtio ring (1).

Thinking about how this would look for NTB:
virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2) <- virtio
ring(2) -> virtio-rpmsg (HOST2)

Here the NTB(HOST1) will access the virtio ring(2) using vringh?

Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?

I'd like to get clarity on two things in the approach you suggested: one is
features (since epf vhost should ideally be transparent to any virtio driver),
and the other is how certain inputs to the virtio device, such as the number of
buffers, would be determined.

Thanks again for your suggestions!

Regards
Kishon

>
>
>> in some cases adds latency because of forwarding interrupts
>> between vhost and virtio driver, vhost drivers providing features (which means
>> it has to be aware of which virtio driver will be connected).
>> virtio drivers (front end) generally access the buffers from it's local memory
>> but when in backend it can access over MMIO (like PCI EPF or NTB) or userspace.
>>> Does this make sense?
>> Two copies in my opinion is an issue but lets get others opinions as well.
>
>
> Sure.
>
>
>>
>> Thanks for your suggestions!
>
>
> You're welcome.
>
> Thanks
>
>
>>
>> Regards
>> Kishon
>>
>>> Thanks
>>>
>>>
>>>> Thanks
>>>> Kishon
>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>>> [...]
>

2020-07-08 11:23:13

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication


On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 7/7/2020 3:17 PM, Jason Wang wrote:
>> On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:
>>> Hi Jason,
>>>
>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>> On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>> On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>
>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>>>       rpmsg communication between two SoCs connected to each other
>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>>>       between two SoCs connected via NTB
>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>
>>>>>>>> [...]
>>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>>> I've pushed the branch
>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>> Thanks
>>>>
>>>>
>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
>>>>>> work is
>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>>> This is about connecting two different HW systems both running Linux and
>>>>> doesn't necessarily involve virtualization.
>>>> Right, this is something similar to VOP
>>>> (Documentation/misc-devices/mic/mic_overview.rst). The different is the
>>>> hardware I guess and VOP use userspace application to implement the device.
>>> I'd also like to point out, this series tries to have communication between two
>>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and any
>>> of the HW in NTB below should be able to use a virtio-vhost communication
>>>
>>> #ls drivers/ntb/hw/
>>> amd  epf  idt  intel  mscc
>>>
>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
>>> function driver and hence any SoC that supports configurable PCIe endpoint can
>>> use virtio-vhost communication
>>>
>>> # ls drivers/pci/controller/dwc/*ep*
>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>
>> Thanks for those backgrounds.
>>
>>
>>>>>    So there is no guest or host as in
>>>>> virtualization but two entirely different systems connected via PCIe cable,
>>>>> one
>>>>> acting as guest and one as host. So one system will provide virtio
>>>>> functionality reserving memory for virtqueues and the other provides vhost
>>>>> functionality providing a way to access the virtqueues in virtio memory.
>>>>> One is
>>>>> source and the other is sink and there is no intermediate entity. (vhost was
>>>>> probably intermediate entity in virtualization?)
>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
>>>> me since it was use for implementing virtio backend for userspace drivers. I
>>>> guess "vringh" could be better.
>>> Initially I had named this vringh but later decided to choose vhost instead of
>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
>>> now resides in an entirely different system. Whatever virtio is for a frontend
>>> system, vhost can be that for a backend system. vring can be for accessing
>>> virtqueue and can be used either in frontend or backend.
>>
>> Ok.
>>
>>
>>>>>> Have you considered to implement these through vDPA?
>>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
>>>>> driver
>>>>> or vhost net driver is not provided.
>>>>>
>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>>>> (usecase2 above),
>>>> I see.
>>>>
>>>>
>>>>>    all the boards run Linux. The middle board provides NTB
>>>>> functionality and board on either side provides virtio/vhost functionality and
>>>>> transfer data using rpmsg.
>>>> So I wonder whether it's worthwhile for a new bus. Can we use the existed
>>>> virtio-bus/drivers? It might work as, except for the epf transport, we can
>>>> introduce a epf "vhost" transport driver.
>>> IMHO we'll need two buses one for frontend and other for backend because the
>>> two components can then co-operate/interact with each other to provide a
>>> functionality. Though both will seemingly provide similar callbacks, they are
>>> both provide symmetrical or complimentary funcitonality and need not be same or
>>> identical.
>>>
>>> Having the same bus can also create sequencing issues.
>>>
>>> If you look at virtio_dev_probe() of virtio_bus
>>>
>>> device_features = dev->config->get_features(dev);
>>>
>>> Now if we use same bus for both front-end and back-end, both will try to
>>> get_features when there has been no set_features. Ideally vhost device should
>>> be initialized first with the set of features it supports. Vhost and virtio
>>> should use "status" and "features" complimentarily and not identically.
>>
>> Yes, but there's no need for doing status/features passthrough in epf vhost
>> drivers.
>>
>>
>>> virtio device (or frontend) cannot be initialized before vhost device (or
>>> backend) gets initialized with data such as features. Similarly vhost (backend)
>>> cannot access virqueues or buffers before virtio (frontend) sets
>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there for virtio as
>>> the physical memory for virtqueues are created by virtio (frontend).
>>
>> epf vhost drivers need to implement two devices: vhost(vringh) device and
>> virtio device (which is a mediated device). The vhost(vringh) device is doing
>> feature negotiation with the virtio device via RC/EP or NTB. The virtio device
>> is doing feature negotiation with local virtio drivers. If there're feature
>> mismatch, epf vhost drivers and do mediation between them.
> Here epf vhost should be initialized with a set of features for it to negotiate
> either as vhost device or virtio device no? Where should the initial feature
> set for epf vhost come from?


I think it can work as:

1) Have an initial feature set X (hard coded in the code) in epf vhost
2) Use this X for both the virtio device and the vhost (vringh) device
3) The local virtio driver will negotiate with the virtio device with feature set Y
4) The remote virtio driver will negotiate with the vringh device with feature set Z
5) Mediate between feature set Y and feature set Z, since both Y and Z are
subsets of X
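
A minimal sketch of step 5; the helper name and the chosen feature bits are assumptions for the example, not code from this series:

/* Sketch only: X is the hard-coded superset, Y is what the local virtio
 * driver accepted, Z is what the remote virtio driver accepted. Only the
 * three-way intersection can be used end to end; bits accepted on one side
 * but not the other are the ones the epf vhost driver has to mediate. */
static const u64 epf_vhost_features /* "X" */ =
	BIT_ULL(VIRTIO_F_VERSION_1) | BIT_ULL(VIRTIO_RING_F_EVENT_IDX);

static u64 epf_vhost_mediate_features(u64 local_y, u64 remote_z)
{
	u64 common   = epf_vhost_features & local_y & remote_z;
	u64 mismatch = (local_y ^ remote_z) & epf_vhost_features;

	if (mismatch)
		pr_debug("epf-vhost: mediating feature bits %#llx\n",
			 (unsigned long long)mismatch);

	return common;
}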


>>
>>>> It will have virtqueues but only used for the communication between itself and
>>>> uppter virtio driver. And it will have vringh queues which will be probe by
>>>> virtio epf transport drivers. And it needs to do datacopy between virtqueue and
>>>> vringh queues.
>>>>
>>>> It works like:
>>>>
>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
>>>> queue/epf>
>>>>
>>>> The advantages is that there's no need for writing new buses and drivers.
>>> I think this will work however there is an addtional copy between vringh queue
>>> and virtqueue,
>>
>> I think not? E.g in use case 1), if we stick to virtio bus, we will have:
>>
>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2)
>> -> virtio pci (RC) <-> virtio rpmsg (RC)
> IIUC epf vhost driver (EP) will access virtio ring(2) using vringh?


Yes.


> And virtio
> ring(2) is created by virtio pci (RC).


Yes.


>> What epf vhost driver did is to read from virtio ring(1) about the buffer len
>> and addr and them DMA to Linux(RC)?
> okay, I made some optimization here where vhost-rpmsg using a helper writes a
> buffer from rpmsg's upper layer directly to remote Linux (RC) as against here
> were it has to be first written to virtio ring (1).
>
> Thinking how this would look for NTB
> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2) <- virtio
> ring(2) -> virtio-rpmsg (HOST2)
>
> Here the NTB(HOST1) will access the virtio ring(2) using vringh?


Yes, and I think it needs to use vring to access virtio ring (1) as well.


>
> Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?


Yes.


>
> I'd like to get clarity on two things in the approach you suggested, one is
> features (since epf vhost should ideally be transparent to any virtio driver)


We can have an array of pre-defined features indexed by virtio
device id in the code.


> and the other is how certain inputs to virtio device such as number of buffers
> be determined.


We can start from a hard-coded value like 256, or introduce some API
for the user to change the value.
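
For illustration, a minimal sketch of such a table; the struct, the chosen feature bits and the 256 default are examples/assumptions, not values from this series:

/* Sketch only: per-device-id defaults the epf vhost driver could start
 * from; a configfs attribute could later override num_bufs. */
struct epf_vhost_default {
	u32 virtio_id;		/* VIRTIO_ID_* from linux/virtio_ids.h */
	u64 features;		/* the initial feature set "X" */
	u16 num_bufs;		/* default virtqueue depth */
};

static const struct epf_vhost_default epf_vhost_defaults[] = {
	{ VIRTIO_ID_RPMSG, 0 /* e.g. name-service announce */, 256 },
	{ VIRTIO_ID_NET,   BIT_ULL(VIRTIO_NET_F_MAC),          256 },
};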


>
> Thanks again for your suggestions!


You're welcome.

Note that I just want to check whether or not we can reuse the virtio
bus/drivers. It's something similar to what you proposed in Software
Layering, but we just replace "vhost core" with "virtio bus" and move the
vhost core below the epf/ntb/platform transport.

Thanks


>
> Regards
> Kishon
>
>>
>>> in some cases adds latency because of forwarding interrupts
>>> between vhost and virtio driver, vhost drivers providing features (which means
>>> it has to be aware of which virtio driver will be connected).
>>> virtio drivers (front end) generally access the buffers from it's local memory
>>> but when in backend it can access over MMIO (like PCI EPF or NTB) or userspace.
>>>> Does this make sense?
>>> Two copies in my opinion is an issue but lets get others opinions as well.
>>
>> Sure.
>>
>>
>>> Thanks for your suggestions!
>>
>> You're welcome.
>>
>> Thanks
>>
>>
>>> Regards
>>> Kishon
>>>
>>>> Thanks
>>>>
>>>>
>>>>> Thanks
>>>>> Kishon
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>>> [...]

2020-07-08 13:15:25

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

Hi Jason,

On 7/8/2020 4:52 PM, Jason Wang wrote:
>
> On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:
>> Hi Jason,
>>
>> On 7/7/2020 3:17 PM, Jason Wang wrote:
>>> On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:
>>>> Hi Jason,
>>>>
>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>>> On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>>> On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:
>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>>
>>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>>>>        rpmsg communication between two SoCs connected to each other
>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>>>>        between two SoCs connected via NTB
>>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>>
>>>>>>>>> [...]
>>>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>>>> I've pushed the branch
>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>>> Thanks
>>>>>
>>>>>
>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
>>>>>>> work is
>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>>>> This is about connecting two different HW systems both running Linux and
>>>>>> doesn't necessarily involve virtualization.
>>>>> Right, this is something similar to VOP
>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The different is the
>>>>> hardware I guess and VOP use userspace application to implement the device.
>>>> I'd also like to point out, this series tries to have communication between
>>>> two
>>>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and
>>>> any
>>>> of the HW in NTB below should be able to use a virtio-vhost communication
>>>>
>>>> #ls drivers/ntb/hw/
>>>> amd  epf  idt  intel  mscc
>>>>
>>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
>>>> function driver and hence any SoC that supports configurable PCIe endpoint can
>>>> use virtio-vhost communication
>>>>
>>>> # ls drivers/pci/controller/dwc/*ep*
>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>>
>>> Thanks for those backgrounds.
>>>
>>>
>>>>>>     So there is no guest or host as in
>>>>>> virtualization, but two entirely different systems connected via a PCIe cable,
>>>>>> one acting as guest and one as host. So one system will provide virtio
>>>>>> functionality, reserving memory for virtqueues, and the other provides vhost
>>>>>> functionality, providing a way to access the virtqueues in virtio memory.
>>>>>> One is the source and the other is the sink, and there is no intermediate
>>>>>> entity. (vhost was probably the intermediate entity in virtualization?)
>>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
>>>>> me since it was used for implementing a virtio backend for userspace drivers. I
>>>>> guess "vringh" could be better.
>>>> Initially I had named this vringh but later decided to choose vhost instead of
>>>> vringh. vhost is still a virtio backend (not necessarily userspace), though it
>>>> now resides in an entirely different system. Whatever virtio is for a frontend
>>>> system, vhost can be that for a backend system. vring is for accessing the
>>>> virtqueue and can be used either in the frontend or the backend.
>>>
>>> Ok.
>>>
>>>
>>>>>>> Have you considered implementing these through vDPA?
>>>>>> IIUC vDPA only provides an interface to userspace, and an in-kernel rpmsg
>>>>>> driver or vhost net driver is not provided.
>>>>>>
>>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>>>>> (usecase2 above),
>>>>> I see.
>>>>>
>>>>>
>>>>>>     all the boards run Linux. The middle board provides NTB
>>>>>> functionality, and the board on either side provides virtio/vhost
>>>>>> functionality and transfers data using rpmsg.
>>>>> So I wonder whether it's worthwhile to add a new bus. Can we use the existing
>>>>> virtio-bus/drivers? It might work if, apart from the epf transport, we
>>>>> introduce an epf "vhost" transport driver.
>>>> IMHO we'll need two buses, one for the frontend and the other for the backend,
>>>> because the two components can then co-operate/interact with each other to
>>>> provide a functionality. Though both will seemingly provide similar callbacks,
>>>> they provide symmetrical or complementary functionality and need not be the
>>>> same or identical.
>>>>
>>>> Having the same bus can also create sequencing issues.
>>>>
>>>> If you look at virtio_dev_probe() of virtio_bus
>>>>
>>>> device_features = dev->config->get_features(dev);
>>>>
>>>> Now if we use the same bus for both front-end and back-end, both will try to
>>>> get_features when there has been no set_features. Ideally the vhost device should
>>>> be initialized first with the set of features it supports. Vhost and virtio
>>>> should use "status" and "features" complementarily and not identically.
>>>
>>> Yes, but there's no need for doing status/features passthrough in epf vhost
>>> drivers.
>>>
>>>
>>>> The virtio device (or frontend) cannot be initialized before the vhost device
>>>> (or backend) gets initialized with data such as features. Similarly, vhost
>>>> (backend) cannot access virtqueues or buffers before virtio (frontend) sets
>>>> VIRTIO_CONFIG_S_DRIVER_OK, whereas that requirement is not there for virtio, as
>>>> the physical memory for the virtqueues is created by virtio (frontend).
>>>
>>> epf vhost drivers need to implement two devices: a vhost(vringh) device and a
>>> virtio device (which is a mediated device). The vhost(vringh) device is doing
>>> feature negotiation with the virtio device via RC/EP or NTB. The virtio device
>>> is doing feature negotiation with local virtio drivers. If there's a feature
>>> mismatch, epf vhost drivers can do mediation between them.
>> Here epf vhost should be initialized with a set of features for it to negotiate
>> either as a vhost device or a virtio device, no? Where should the initial feature
>> set for epf vhost come from?
>
>
> I think it can work as:
>
> 1) Have an initial feature set X (hard-coded in the code) in epf vhost
> 2) Use this X for both the virtio device and the vhost(vringh) device
> 3) The local virtio driver will negotiate with the virtio device with feature set Y
> 4) The remote virtio driver will negotiate with the vringh device with feature set Z
> 5) Mediate between feature set Y and feature set Z since both Y and Z are subsets of X
>
>

Okay. I'm also wondering if we could have configfs for configuring this. Anyway,
we could find different approaches to configuring this.
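
To make sure I follow step 5), here is a minimal userspace sketch of one possible
interpretation of the mediation (simply operating on the common subset); the names
and feature values are only illustrative, not taken from this series or any
existing API:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t x = 0x7;     /* initial feature set hard-coded in epf vhost */
    uint64_t y = 0x3 & x; /* negotiated with the local virtio driver */
    uint64_t z = 0x5 & x; /* negotiated with the remote virtio driver */

    /* Both Y and Z are subsets of X, so one simple mediation is Y & Z. */
    uint64_t mediated = y & z;

    printf("X=0x%llx Y=0x%llx Z=0x%llx mediated=0x%llx\n",
           (unsigned long long)x, (unsigned long long)y,
           (unsigned long long)z, (unsigned long long)mediated);
    return 0;
}

The real mediation could of course be more involved than a plain intersection, but
the ordering (X first, then Y and Z) stays the same.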
>>>
>>>>> It will have virtqueues, but they are only used for the communication between
>>>>> itself and the upper virtio driver. And it will have vringh queues which will
>>>>> be probed by virtio epf transport drivers. And it needs to do a data copy
>>>>> between the virtqueues and the vringh queues.
>>>>>
>>>>> It works like:
>>>>>
>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
>>>>> queue/epf>
>>>>>
>>>>> The advantage is that there's no need for writing new buses and drivers.
>>>> I think this will work; however, there is an additional copy between the vringh
>>>> queue and the virtqueue,
>>>
>>> I think not? E.g in use case 1), if we stick to virtio bus, we will have:
>>>
>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2)
>>> -> virtio pci (RC) <-> virtio rpmsg (RC)
>> IIUC epf vhost driver (EP) will access virtio ring(2) using vringh?
>
>
> Yes.
>
>
>> And virtio
>> ring(2) is created by virtio pci (RC).
>
>
> Yes.
>
>
>>> What the epf vhost driver does is read from virtio ring(1) the buffer len
>>> and addr and then DMA to Linux (RC)?
>> Okay, I made some optimization here where vhost-rpmsg, using a helper, writes a
>> buffer from rpmsg's upper layer directly to remote Linux (RC), as against here
>> where it has to be first written to virtio ring (1).
>>
>> Thinking how this would look for NTB
>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2)  <- virtio
>> ring(2) -> virtio-rpmsg (HOST2)
>>
>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
>
>
> Yes, and I think it needs to use vring to access virtio ring (1) as well.

NTB(HOST1) and virtio ring(1) will be in the same system, so it doesn't have to
use vring. virtio ring(1) is created by the virtio device that NTB(HOST1) creates.
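
Roughly, the HOST1 side then does something like the toy model below; the ring
layout and the helpers are hypothetical stand-ins (not the kernel virtqueue/vringh
API), and it only illustrates that ring(1) is plain local memory for NTB(HOST1)
while ring(2) is reached across the link, with one copy per message:

#include <stdio.h>

#define SLOT_SIZE 64
#define NUM_SLOTS 4

struct ring {
    char slot[NUM_SLOTS][SLOT_SIZE];
    int head;                        /* next slot to produce into */
};

/* Take one message from the local ring (models getting a buffer from ring(1)). */
static const char *local_ring_pop(struct ring *r)
{
    return r->head ? r->slot[--r->head] : NULL;
}

/* Place a message into the remote ring (models a vringh-style access to ring(2)). */
static void remote_ring_push(struct ring *r, const char *msg)
{
    snprintf(r->slot[r->head++ % NUM_SLOTS], SLOT_SIZE, "%s", msg);
    printf("forwarded: %s\n", msg);  /* a real driver would now notify the peer */
}

int main(void)
{
    struct ring local = { .head = 0 }, remote = { .head = 0 };
    const char *msg;

    /* The local virtio driver queued a message in ring(1)... */
    snprintf(local.slot[local.head++], SLOT_SIZE, "%s", "hello from HOST1");

    /* ...and the NTB-side driver forwards it to ring(2). */
    while ((msg = local_ring_pop(&local)) != NULL)
        remote_ring_push(&remote, msg);

    return 0;
}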
>
>
>>
>> Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?
>
>
> Yes.

Okay, I haven't looked at this, but the backend of virtio_blk should access an
actual storage device, no?
>
>
>>
>> I'd like to get clarity on two things in the approach you suggested, one is
>> features (since epf vhost should ideally be transparent to any virtio driver)
>
>
> We can have an array of pre-defined features indexed by virtio device ID
> in the code.
>
>
>> and the other is how certain inputs to the virtio device, such as the number of
>> buffers, should be determined.
>
>
> We can start with a hard-coded value like 256, or introduce some API for the user
> to change the value.
>
>
>>
>> Thanks again for your suggestions!
>
>
> You're welcome.
>
> Note that I just want to check whether or not we can reuse the virtio
> bus/driver. It's something similar to what you proposed in Software Layering
> but we just replace "vhost core" with "virtio bus" and move the vhost core
> below epf/ntb/platform transport.

Got it. My initial design was based on my understanding of your comments [1].

I'll try to create something based on your proposed design here.
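
For my own reference, if I've understood the suggestion correctly, the layering
would then roughly become something like below (a sketch of my understanding, not
a final design):

        RPMSG / NET / SCSI ... virtio drivers
                        |
                   VIRTIO BUS
                        |
        PCI EPF / NTB / PLATFORM transport drivers
                        |
             vhost core (vringh accessors)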

Regards
Kishon

[1] ->
https://lore.kernel.org/linux-pci/[email protected]/
>
> Thanks
>
>
>>
>> Regards
>> Kishon
>>
>>>
>>>> in some cases added latency because of forwarding interrupts
>>>> between the vhost and virtio drivers, and vhost drivers having to provide
>>>> features (which means they have to be aware of which virtio driver will be
>>>> connected).
>>>> virtio drivers (frontend) generally access the buffers from their local memory,
>>>> but when in the backend they can access them over MMIO (like PCI EPF or NTB) or
>>>> userspace.
>>>>> Does this make sense?
>>>> Two copies, in my opinion, is an issue, but let's get others' opinions as well.
>>>
>>> Sure.
>>>
>>>
>>>> Thanks for your suggestions!
>>>
>>> You're welcome.
>>>
>>> Thanks
>>>
>>>
>>>> Regards
>>>> Kishon
>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> Thanks
>>>>>> Kishon
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>>> Kishon Vijay Abraham I (22):
>>>>>>>>>       vhost: Make _feature_ bits a property of vhost device
>>>>>>>>>       vhost: Introduce standard Linux driver model in VHOST
>>>>>>>>>       vhost: Add ops for the VHOST driver to configure VHOST device
>>>>>>>>>       vringh: Add helpers to access vring in MMIO
>>>>>>>>>       vhost: Add MMIO helpers for operations on vhost virtqueue
>>>>>>>>>       vhost: Introduce configfs entry for configuring VHOST
>>>>>>>>>       virtio_pci: Use request_threaded_irq() instead of request_irq()
>>>>>>>>>       rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>>>>>>>>>         reading messages
>>>>>>>>>       rpmsg: Introduce configfs entry for configuring rpmsg
>>>>>>>>>       rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>>>>>>>>>       rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>>>>>>>>>         rpmsg_internal.h
>>>>>>>>>       virtio: Add ops to allocate and free buffer
>>>>>>>>>       rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>>>>>>>>>         virtio_free_buffer()
>>>>>>>>>       rpmsg: Add VHOST based remote processor messaging bus
>>>>>>>>>       samples/rpmsg: Setup delayed work to send message
>>>>>>>>>       samples/rpmsg: Wait for address to be bound to rpdev for sending
>>>>>>>>>         message
>>>>>>>>>       rpmsg.txt: Add Documentation to configure rpmsg using configfs
>>>>>>>>>       virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe
>>>>>>>>> Endpoint
>>>>>>>>>         device
>>>>>>>>>       PCI: endpoint: Add EP function driver to provide VHOST interface
>>>>>>>>>       NTB: Add a new NTB client driver to implement VIRTIO functionality
>>>>>>>>>       NTB: Add a new NTB client driver to implement VHOST functionality
>>>>>>>>>       NTB: Describe the ntb_virtio and ntb_vhost client in the
>>>>>>>>> documentation
>>>>>>>>>
>>>>>>>>>      Documentation/driver-api/ntb.rst              |   11 +
>>>>>>>>>      Documentation/rpmsg.txt                       |   56 +
>>>>>>>>>      drivers/ntb/Kconfig                           |   18 +
>>>>>>>>>      drivers/ntb/Makefile                          |    2 +
>>>>>>>>>      drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
>>>>>>>>>      drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
>>>>>>>>>      drivers/ntb/ntb_virtio.h                      |   56 +
>>>>>>>>>      drivers/pci/endpoint/functions/Kconfig        |   11 +
>>>>>>>>>      drivers/pci/endpoint/functions/Makefile       |    1 +
>>>>>>>>>      .../pci/endpoint/functions/pci-epf-vhost.c    | 1144
>>>>>>>>> ++++++++++++++++
>>>>>>>>>      drivers/rpmsg/Kconfig                         |   10 +
>>>>>>>>>      drivers/rpmsg/Makefile                        |    3 +-
>>>>>>>>>      drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
>>>>>>>>>      drivers/rpmsg/rpmsg_core.c                    |    7 +
>>>>>>>>>      drivers/rpmsg/rpmsg_internal.h                |  136 ++
>>>>>>>>>      drivers/rpmsg/vhost_rpmsg_bus.c               | 1151
>>>>>>>>> +++++++++++++++++
>>>>>>>>>      drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
>>>>>>>>>      drivers/vhost/Kconfig                         |    1 +
>>>>>>>>>      drivers/vhost/Makefile                        |    2 +-
>>>>>>>>>      drivers/vhost/net.c                           |   10 +-
>>>>>>>>>      drivers/vhost/scsi.c                          |   24 +-
>>>>>>>>>      drivers/vhost/test.c                          |   17 +-
>>>>>>>>>      drivers/vhost/vdpa.c                          |    2 +-
>>>>>>>>>      drivers/vhost/vhost.c                         |  730 ++++++++++-
>>>>>>>>>      drivers/vhost/vhost_cfs.c                     |  341 +++++
>>>>>>>>>      drivers/vhost/vringh.c                        |  332 +++++
>>>>>>>>>      drivers/vhost/vsock.c                         |   20 +-
>>>>>>>>>      drivers/virtio/Kconfig                        |    9 +
>>>>>>>>>      drivers/virtio/Makefile                       |    1 +
>>>>>>>>>      drivers/virtio/virtio_pci_common.c            |   25 +-
>>>>>>>>>      drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
>>>>>>>>>      include/linux/mod_devicetable.h               |    6 +
>>>>>>>>>      include/linux/rpmsg.h                         |    6 +
>>>>>>>>>      {drivers/vhost => include/linux}/vhost.h      |  132 +-
>>>>>>>>>      include/linux/virtio.h                        |    3 +
>>>>>>>>>      include/linux/virtio_config.h                 |   42 +
>>>>>>>>>      include/linux/vringh.h                        |   46 +
>>>>>>>>>      samples/rpmsg/rpmsg_client_sample.c           |   32 +-
>>>>>>>>>      tools/virtio/virtio_test.c                    |    2 +-
>>>>>>>>>      39 files changed, 7083 insertions(+), 183 deletions(-)
>>>>>>>>>      create mode 100644 drivers/ntb/ntb_vhost.c
>>>>>>>>>      create mode 100644 drivers/ntb/ntb_virtio.c
>>>>>>>>>      create mode 100644 drivers/ntb/ntb_virtio.h
>>>>>>>>>      create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>>>>>>>>>      create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>>>>>>>>>      create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>>>>>>>>>      create mode 100644 drivers/vhost/vhost_cfs.c
>>>>>>>>>      create mode 100644 drivers/virtio/virtio_pci_epf.c
>>>>>>>>>      rename {drivers/vhost => include/linux}/vhost.h (66%)
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> 2.17.1
>>>>>>>>>
>

2020-07-09 06:28:40

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication


On 2020/7/8 9:13 PM, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 7/8/2020 4:52 PM, Jason Wang wrote:
>> On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
>>> Hi Jason,
>>>
>>> On 7/7/2020 3:17 PM, Jason Wang wrote:
>>>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>>>
>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>>>>>        rpmsg communication between two SoCs connected to each other
>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>>>>>        between two SoCs connected via NTB
>>>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>>>
>>>>>>>>>> UseCase1 :
>>>>>>>>>>
>>>>>>>>>>      VHOST RPMSG                     VIRTIO RPMSG
>>>>>>>>>>           +                               +
>>>>>>>>>>           |                               |
>>>>>>>>>>           |                               |
>>>>>>>>>>           |                               |
>>>>>>>>>>           |                               |
>>>>>>>>>> +-----v------+                 +------v-------+
>>>>>>>>>> |   Linux    |                 |     Linux    |
>>>>>>>>>> |  Endpoint  |                 | Root Complex |
>>>>>>>>>> |            <----------------->              |
>>>>>>>>>> |            |                 |              |
>>>>>>>>>> |    SOC1    |                 |     SOC2     |
>>>>>>>>>> +------------+                 +--------------+
>>>>>>>>>>
>>>>>>>>>> UseCase 2:
>>>>>>>>>>
>>>>>>>>>>          VHOST RPMSG                                      VIRTIO RPMSG
>>>>>>>>>>               +                                                 +
>>>>>>>>>>               |                                                 |
>>>>>>>>>>               |                                                 |
>>>>>>>>>>               |                                                 |
>>>>>>>>>>               |                                                 |
>>>>>>>>>>        +------v------+                                   +------v------+
>>>>>>>>>>        |             |                                   |             |
>>>>>>>>>>        |    HOST1    |                                   |    HOST2    |
>>>>>>>>>>        |             |                                   |             |
>>>>>>>>>>        +------^------+                                   +------^------+
>>>>>>>>>>               |                                                 |
>>>>>>>>>>               |                                                 |
>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>> |  +------v------+                                   +------v------+  |
>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>> |  |     EP      |                                   |     EP      |  |
>>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>>>>>>>> |  |             <----------------------------------->             |  |
>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>>>>>>>> |  +-------------+                                   +-------------+  |
>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>>
>>>>>>>>>> Software Layering:
>>>>>>>>>>
>>>>>>>>>> The high-level SW layering should look something like below. This series
>>>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
>>>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>>>>>>>>> device, user) can use any of the vhost client driver.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>         +----------------+  +-----------+  +------------+  +----------+
>>>>>>>>>>         |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>>>>>>>         +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>>>>>>>                 |                 |              |              |
>>>>>>>>>>                 |                 |              |              |
>>>>>>>>>>                 |                 |              |              |
>>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
>>>>>>>>>> |                            VHOST CORE                                |
>>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
>>>>>>>>>>              |               |                    |                  |
>>>>>>>>>>              |               |                    |                  |
>>>>>>>>>>              |               |                    |                  |
>>>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>>>>>>>>
>>>>>>>>>> This was initially proposed here [1]
>>>>>>>>>>
>>>>>>>>>> [1] ->
>>>>>>>>>> https://lore.kernel.org/r/[email protected]
>>>>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>>>>> I've pushed the branch
>>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
>>>>>>>> work is
>>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>>>>> This is about connecting two different HW systems both running Linux and
>>>>>>> doesn't necessarily involve virtualization.
>>>>>> Right, this is something similar to VOP
>>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The different is the
>>>>>> hardware I guess and VOP use userspace application to implement the device.
>>>>> I'd also like to point out, this series tries to have communication between
>>>>> two
>>>>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
>>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and
>>>>> any
>>>>> of the HW in NTB below should be able to use a virtio-vhost communication
>>>>>
>>>>> #ls drivers/ntb/hw/
>>>>> amd  epf  idt  intel  mscc
>>>>>
>>>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
>>>>> function driver and hence any SoC that supports configurable PCIe endpoint can
>>>>> use virtio-vhost communication
>>>>>
>>>>> # ls drivers/pci/controller/dwc/*ep*
>>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>>> Thanks for those backgrounds.
>>>>
>>>>
>>>>>>>     So there is no guest or host as in
>>>>>>> virtualization but two entirely different systems connected via PCIe cable,
>>>>>>> one
>>>>>>> acting as guest and one as host. So one system will provide virtio
>>>>>>> functionality reserving memory for virtqueues and the other provides vhost
>>>>>>> functionality providing a way to access the virtqueues in virtio memory.
>>>>>>> One is
>>>>>>> source and the other is sink and there is no intermediate entity. (vhost was
>>>>>>> probably intermediate entity in virtualization?)
>>>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
>>>>>> me since it was use for implementing virtio backend for userspace drivers. I
>>>>>> guess "vringh" could be better.
>>>>> Initially I had named this vringh but later decided to choose vhost instead of
>>>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
>>>>> now resides in an entirely different system. Whatever virtio is for a frontend
>>>>> system, vhost can be that for a backend system. vring can be for accessing
>>>>> virtqueue and can be used either in frontend or backend.
>>>> Ok.
>>>>
>>>>
>>>>>>>> Have you considered to implement these through vDPA?
>>>>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
>>>>>>> driver
>>>>>>> or vhost net driver is not provided.
>>>>>>>
>>>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>>>>>> (usecase2 above),
>>>>>> I see.
>>>>>>
>>>>>>
>>>>>>>     all the boards run Linux. The middle board provides NTB
>>>>>>> functionality and board on either side provides virtio/vhost
>>>>>>> functionality and
>>>>>>> transfer data using rpmsg.
>>>>>> So I wonder whether it's worthwhile for a new bus. Can we use the existed
>>>>>> virtio-bus/drivers? It might work as, except for the epf transport, we can
>>>>>> introduce a epf "vhost" transport driver.
>>>>> IMHO we'll need two buses one for frontend and other for backend because the
>>>>> two components can then co-operate/interact with each other to provide a
>>>>> functionality. Though both will seemingly provide similar callbacks, they are
>>>>> both provide symmetrical or complimentary funcitonality and need not be
>>>>> same or
>>>>> identical.
>>>>>
>>>>> Having the same bus can also create sequencing issues.
>>>>>
>>>>> If you look at virtio_dev_probe() of virtio_bus
>>>>>
>>>>> device_features = dev->config->get_features(dev);
>>>>>
>>>>> Now if we use same bus for both front-end and back-end, both will try to
>>>>> get_features when there has been no set_features. Ideally vhost device should
>>>>> be initialized first with the set of features it supports. Vhost and virtio
>>>>> should use "status" and "features" complimentarily and not identically.
>>>> Yes, but there's no need for doing status/features passthrough in epf vhost
>>>> drivers.b
>>>>
>>>>
>>>>> virtio device (or frontend) cannot be initialized before vhost device (or
>>>>> backend) gets initialized with data such as features. Similarly vhost
>>>>> (backend)
>>>>> cannot access virqueues or buffers before virtio (frontend) sets
>>>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there for virtio as
>>>>> the physical memory for virtqueues are created by virtio (frontend).
>>>> epf vhost drivers need to implement two devices: vhost(vringh) device and
>>>> virtio device (which is a mediated device). The vhost(vringh) device is doing
>>>> feature negotiation with the virtio device via RC/EP or NTB. The virtio device
>>>> is doing feature negotiation with local virtio drivers. If there're feature
>>>> mismatch, epf vhost drivers and do mediation between them.
>>> Here epf vhost should be initialized with a set of features for it to negotiate
>>> either as vhost device or virtio device no? Where should the initial feature
>>> set for epf vhost come from?
>>
>> I think it can work as:
>>
>> 1) Having an initial features (hard coded in the code) set X in epf vhost
>> 2) Using this X for both virtio device and vhost(vringh) device
>> 3) local virtio driver will negotiate with virtio device with feature set Y
>> 4) remote virtio driver will negotiate with vringh device with feature set Z
>> 5) mediate between feature Y and feature Z since both Y and Z are a subset of X
>>
>>
> okay. I'm also thinking if we could have configfs for configuring this. Anyways
> we could find different approaches of configuring this.


Yes, and I think some management API is needed even in the design of
your "Software Layering". In that figure, rpmsg vhost needs some pre-set
or hard-coded features.
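
Something as simple as the sketch below could work as a starting point; the table
contents are made-up placeholders (only the device IDs are the standard virtio
ones), not values taken from this series:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define VIRTIO_ID_NET    1   /* standard virtio device IDs */
#define VIRTIO_ID_RPMSG  7
#define VIRTIO_ID_SCSI   8

struct preset_features {
    unsigned int device_id;
    uint64_t features;           /* feature set "X" the device starts from */
};

static const struct preset_features preset_table[] = {
    { VIRTIO_ID_RPMSG, 0x0 },    /* placeholder bits, not real feature values */
    { VIRTIO_ID_NET,   0x3 },    /* placeholder */
    { VIRTIO_ID_SCSI,  0x1 },    /* placeholder */
};

static uint64_t preset_features_get(unsigned int device_id)
{
    for (size_t i = 0; i < sizeof(preset_table) / sizeof(preset_table[0]); i++)
        if (preset_table[i].device_id == device_id)
            return preset_table[i].features;
    return 0;
}

int main(void)
{
    printf("rpmsg preset features: 0x%llx\n",
           (unsigned long long)preset_features_get(VIRTIO_ID_RPMSG));
    return 0;
}

A configfs knob, as you mention above, could then override these defaults per
device.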


>>>>>> It will have virtqueues but only used for the communication between itself
>>>>>> and
>>>>>> uppter virtio driver. And it will have vringh queues which will be probe by
>>>>>> virtio epf transport drivers. And it needs to do datacopy between
>>>>>> virtqueue and
>>>>>> vringh queues.
>>>>>>
>>>>>> It works like:
>>>>>>
>>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
>>>>>> queue/epf>
>>>>>>
>>>>>> The advantages is that there's no need for writing new buses and drivers.
>>>>> I think this will work however there is an addtional copy between vringh queue
>>>>> and virtqueue,
>>>> I think not? E.g in use case 1), if we stick to virtio bus, we will have:
>>>>
>>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2)
>>>> -> virtio pci (RC) <-> virtio rpmsg (RC)
>>> IIUC epf vhost driver (EP) will access virtio ring(2) using vringh?
>>
>> Yes.
>>
>>
>>> And virtio
>>> ring(2) is created by virtio pci (RC).
>>
>> Yes.
>>
>>
>>>> What epf vhost driver did is to read from virtio ring(1) about the buffer len
>>>> and addr and them DMA to Linux(RC)?
>>> okay, I made some optimization here where vhost-rpmsg using a helper writes a
>>> buffer from rpmsg's upper layer directly to remote Linux (RC) as against here
>>> were it has to be first written to virtio ring (1).
>>>
>>> Thinking how this would look for NTB
>>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2)  <- virtio
>>> ring(2) -> virtio-rpmsg (HOST2)
>>>
>>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
>>
>> Yes, I think so it needs to use vring to access virtio ring (1) as well.
> NTB(HOST1) and virtio ring(1) will be in the same system. So it doesn't have to
> use vring. virtio ring(1) is by the virtio device the NTB(HOST1) creates.


Right.


>>
>>> Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?
>>
>> Yes.
> okay, I haven't looked at this but the backend of virtio_blk should access an
> actual storage device no?


Good point, for a non-peer device like storage. There's probably no need
for it to be registered on the virtio bus, and it might be better to
behave as you proposed.

Just to make sure I understand the design, how is VHOST SCSI expected to
work in your proposal? Does it have a device or file as a backend?


>>
>>> I'd like to get clarity on two things in the approach you suggested, one is
>>> features (since epf vhost should ideally be transparent to any virtio driver)
>>
>> We can have have an array of pre-defined features indexed by virtio device id
>> in the code.
>>
>>
>>> and the other is how certain inputs to virtio device such as number of buffers
>>> be determined.
>>
>> We can start from hard coded the value like 256, or introduce some API for user
>> to change the value.
>>
>>
>>> Thanks again for your suggestions!
>>
>> You're welcome.
>>
>> Note that I just want to check whether or not we can reuse the virtio
>> bus/driver. It's something similar to what you proposed in Software Layering
>> but we just replace "vhost core" with "virtio bus" and move the vhost core
>> below epf/ntb/platform transport.
> Got it. My initial design was based on my understanding of your comments [1].


Yes, but that's just for a networking device. If we want something more
generic, it may require more thought (bus etc).


>
> I'll try to create something based on your proposed design here.


Sure, but for coding, we'd better wait for others' opinions here.

Thanks


>
> Regards
> Kishon
>
> [1] ->
> https://lore.kernel.org/linux-pci/[email protected]/
>> Thanks
>>
>>
>>> Regards
>>> Kishon
>>>
>>>>> in some cases adds latency because of forwarding interrupts
>>>>> between vhost and virtio driver, vhost drivers providing features (which means
>>>>> it has to be aware of which virtio driver will be connected).
>>>>> virtio drivers (front end) generally access the buffers from it's local memory
>>>>> but when in backend it can access over MMIO (like PCI EPF or NTB) or
>>>>> userspace.
>>>>>> Does this make sense?
>>>>> Two copies in my opinion is an issue but lets get others opinions as well.
>>>> Sure.
>>>>
>>>>
>>>>> Thanks for your suggestions!
>>>> You're welcome.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>> Regards
>>>>> Kishon
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> Thanks
>>>>>>> Kishon
>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Kishon Vijay Abraham I (22):
>>>>>>>>>>       vhost: Make _feature_ bits a property of vhost device
>>>>>>>>>>       vhost: Introduce standard Linux driver model in VHOST
>>>>>>>>>>       vhost: Add ops for the VHOST driver to configure VHOST device
>>>>>>>>>>       vringh: Add helpers to access vring in MMIO
>>>>>>>>>>       vhost: Add MMIO helpers for operations on vhost virtqueue
>>>>>>>>>>       vhost: Introduce configfs entry for configuring VHOST
>>>>>>>>>>       virtio_pci: Use request_threaded_irq() instead of request_irq()
>>>>>>>>>>       rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>>>>>>>>>>         reading messages
>>>>>>>>>>       rpmsg: Introduce configfs entry for configuring rpmsg
>>>>>>>>>>       rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>>>>>>>>>>       rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>>>>>>>>>>         rpmsg_internal.h
>>>>>>>>>>       virtio: Add ops to allocate and free buffer
>>>>>>>>>>       rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>>>>>>>>>>         virtio_free_buffer()
>>>>>>>>>>       rpmsg: Add VHOST based remote processor messaging bus
>>>>>>>>>>       samples/rpmsg: Setup delayed work to send message
>>>>>>>>>>       samples/rpmsg: Wait for address to be bound to rpdev for sending
>>>>>>>>>>         message
>>>>>>>>>>       rpmsg.txt: Add Documentation to configure rpmsg using configfs
>>>>>>>>>>       virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe
>>>>>>>>>> Endpoint
>>>>>>>>>>         device
>>>>>>>>>>       PCI: endpoint: Add EP function driver to provide VHOST interface
>>>>>>>>>>       NTB: Add a new NTB client driver to implement VIRTIO functionality
>>>>>>>>>>       NTB: Add a new NTB client driver to implement VHOST functionality
>>>>>>>>>>       NTB: Describe the ntb_virtio and ntb_vhost client in the
>>>>>>>>>> documentation
>>>>>>>>>>
>>>>>>>>>>      Documentation/driver-api/ntb.rst              |   11 +
>>>>>>>>>>      Documentation/rpmsg.txt                       |   56 +
>>>>>>>>>>      drivers/ntb/Kconfig                           |   18 +
>>>>>>>>>>      drivers/ntb/Makefile                          |    2 +
>>>>>>>>>>      drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
>>>>>>>>>>      drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
>>>>>>>>>>      drivers/ntb/ntb_virtio.h                      |   56 +
>>>>>>>>>>      drivers/pci/endpoint/functions/Kconfig        |   11 +
>>>>>>>>>>      drivers/pci/endpoint/functions/Makefile       |    1 +
>>>>>>>>>>      .../pci/endpoint/functions/pci-epf-vhost.c    | 1144
>>>>>>>>>> ++++++++++++++++
>>>>>>>>>>      drivers/rpmsg/Kconfig                         |   10 +
>>>>>>>>>>      drivers/rpmsg/Makefile                        |    3 +-
>>>>>>>>>>      drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
>>>>>>>>>>      drivers/rpmsg/rpmsg_core.c                    |    7 +
>>>>>>>>>>      drivers/rpmsg/rpmsg_internal.h                |  136 ++
>>>>>>>>>>      drivers/rpmsg/vhost_rpmsg_bus.c               | 1151
>>>>>>>>>> +++++++++++++++++
>>>>>>>>>>      drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
>>>>>>>>>>      drivers/vhost/Kconfig                         |    1 +
>>>>>>>>>>      drivers/vhost/Makefile                        |    2 +-
>>>>>>>>>>      drivers/vhost/net.c                           |   10 +-
>>>>>>>>>>      drivers/vhost/scsi.c                          |   24 +-
>>>>>>>>>>      drivers/vhost/test.c                          |   17 +-
>>>>>>>>>>      drivers/vhost/vdpa.c                          |    2 +-
>>>>>>>>>>      drivers/vhost/vhost.c                         |  730 ++++++++++-
>>>>>>>>>>      drivers/vhost/vhost_cfs.c                     |  341 +++++
>>>>>>>>>>      drivers/vhost/vringh.c                        |  332 +++++
>>>>>>>>>>      drivers/vhost/vsock.c                         |   20 +-
>>>>>>>>>>      drivers/virtio/Kconfig                        |    9 +
>>>>>>>>>>      drivers/virtio/Makefile                       |    1 +
>>>>>>>>>>      drivers/virtio/virtio_pci_common.c            |   25 +-
>>>>>>>>>>      drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
>>>>>>>>>>      include/linux/mod_devicetable.h               |    6 +
>>>>>>>>>>      include/linux/rpmsg.h                         |    6 +
>>>>>>>>>>      {drivers/vhost => include/linux}/vhost.h      |  132 +-
>>>>>>>>>>      include/linux/virtio.h                        |    3 +
>>>>>>>>>>      include/linux/virtio_config.h                 |   42 +
>>>>>>>>>>      include/linux/vringh.h                        |   46 +
>>>>>>>>>>      samples/rpmsg/rpmsg_client_sample.c           |   32 +-
>>>>>>>>>>      tools/virtio/virtio_test.c                    |    2 +-
>>>>>>>>>>      39 files changed, 7083 insertions(+), 183 deletions(-)
>>>>>>>>>>      create mode 100644 drivers/ntb/ntb_vhost.c
>>>>>>>>>>      create mode 100644 drivers/ntb/ntb_virtio.c
>>>>>>>>>>      create mode 100644 drivers/ntb/ntb_virtio.h
>>>>>>>>>>      create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>>>>>>>>>>      create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>>>>>>>>>>      create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>>>>>>>>>>      create mode 100644 drivers/vhost/vhost_cfs.c
>>>>>>>>>>      create mode 100644 drivers/virtio/virtio_pci_epf.c
>>>>>>>>>>      rename {drivers/vhost => include/linux}/vhost.h (66%)
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> 2.17.1
>>>>>>>>>>

2020-07-15 17:15:53

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

Hey Kishon,

On Wed, Jul 08, 2020 at 06:43:45PM +0530, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 7/8/2020 4:52 PM, Jason Wang wrote:
> >
> >> On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
> >> Hi Jason,
> >>
> >> On 7/7/2020 3:17 PM, Jason Wang wrote:
> >>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
> >>>> Hi Jason,
> >>>>
> >>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
> >>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
> >>>>>> Hi Jason,
> >>>>>>
> >>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
> >>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
> >>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
> >>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
> >>>>>>>>> communication over MMIO. This series enables rpmsg communication between
> >>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
> >>>>>>>>>
> >>>>>>>>> 1) Modify vhost to use standard Linux driver model
> >>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
> >>>>>>>>> 3) Add vhost client driver for rpmsg
> >>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
> >>>>>>>>>        rpmsg communication between two SoCs connected to each other
> >>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
> >>>>>>>>>        between two SoCs connected via NTB
> >>>>>>>>> 6) Add configfs to configure the components
> >>>>>>>>>
> >>>>>>>>> UseCase1 :
> >>>>>>>>>
> >>>>>>>>>      VHOST RPMSG                     VIRTIO RPMSG
> >>>>>>>>>           +                               +
> >>>>>>>>>           |                               |
> >>>>>>>>>           |                               |
> >>>>>>>>>           |                               |
> >>>>>>>>>           |                               |
> >>>>>>>>> +-----v------+                 +------v-------+
> >>>>>>>>> |   Linux    |                 |     Linux    |
> >>>>>>>>> |  Endpoint  |                 | Root Complex |
> >>>>>>>>> |            <----------------->              |
> >>>>>>>>> |            |                 |              |
> >>>>>>>>> |    SOC1    |                 |     SOC2     |
> >>>>>>>>> +------------+                 +--------------+
> >>>>>>>>>
> >>>>>>>>> UseCase 2:
> >>>>>>>>>
> >>>>>>>>>          VHOST RPMSG                                      VIRTIO RPMSG
> >>>>>>>>>               +                                                 +
> >>>>>>>>>               |                                                 |
> >>>>>>>>>               |                                                 |
> >>>>>>>>>               |                                                 |
> >>>>>>>>>               |                                                 |
> >>>>>>>>>        +------v------+                                   +------v------+
> >>>>>>>>>        |             |                                   |             |
> >>>>>>>>>        |    HOST1    |                                   |    HOST2    |
> >>>>>>>>>        |             |                                   |             |
> >>>>>>>>>        +------^------+                                   +------^------+
> >>>>>>>>>               |                                                 |
> >>>>>>>>>               |                                                 |
> >>>>>>>>> +---------------------------------------------------------------------+
> >>>>>>>>> |  +------v------+                                   +------v------+  |
> >>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>> |  |     EP      |                                   |     EP      |  |
> >>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
> >>>>>>>>> |  |             <----------------------------------->             |  |
> >>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
> >>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
> >>>>>>>>> |  +-------------+                                   +-------------+  |
> >>>>>>>>> +---------------------------------------------------------------------+
> >>>>>>>>>
> >>>>>>>>> Software Layering:
> >>>>>>>>>
> >>>>>>>>> The high-level SW layering should look something like below. This series
> >>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
> >>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
> >>>>>>>>> device, user) can use any of the vhost client driver.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>         +----------------+  +-----------+  +------------+  +----------+
> >>>>>>>>>         |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
> >>>>>>>>>         +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
> >>>>>>>>>                 |                 |              |              |
> >>>>>>>>>                 |                 |              |              |
> >>>>>>>>>                 |                 |              |              |
> >>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
> >>>>>>>>> |                            VHOST CORE                                |
> >>>>>>>>> +--------^---------------^--------------------^------------------^-----+
> >>>>>>>>>              |               |                    |                  |
> >>>>>>>>>              |               |                    |                  |
> >>>>>>>>>              |               |                    |                  |
> >>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
> >>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
> >>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
> >>>>>>>>>
> >>>>>>>>> This was initially proposed here [1]
> >>>>>>>>>
> >>>>>>>>> [1] ->
> >>>>>>>>> https://lore.kernel.org/r/[email protected]
> >>>>>>>> I find this very interesting. A huge patchset so will take a bit
> >>>>>>>> to review, but I certainly plan to do that. Thanks!
> >>>>>>> Yes, it would be better if there's a git branch for us to have a look.
> >>>>>> I've pushed the branch
> >>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
> >>>>> Thanks
> >>>>>
> >>>>>
> >>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
> >>>>>>> work is
> >>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
> >>>>>> This is about connecting two different HW systems both running Linux and
> >>>>>> doesn't necessarily involve virtualization.
> >>>>> Right, this is something similar to VOP
> >>>>> (Documentation/misc-devices/mic/mic_overview.rst). The different is the
> >>>>> hardware I guess and VOP use userspace application to implement the device.
> >>>> I'd also like to point out, this series tries to have communication between
> >>>> two
> >>>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
> >>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and
> >>>> any
> >>>> of the HW in NTB below should be able to use a virtio-vhost communication
> >>>>
> >>>> #ls drivers/ntb/hw/
> >>>> amd  epf  idt  intel  mscc
> >>>>
> >>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
> >>>> function driver and hence any SoC that supports configurable PCIe endpoint can
> >>>> use virtio-vhost communication
> >>>>
> >>>> # ls drivers/pci/controller/dwc/*ep*
> >>>> drivers/pci/controller/dwc/pcie-designware-ep.c
> >>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
> >>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
> >>>
> >>> Thanks for those backgrounds.
> >>>
> >>>
> >>>>>>     So there is no guest or host as in
> >>>>>> virtualization but two entirely different systems connected via PCIe cable,
> >>>>>> one
> >>>>>> acting as guest and one as host. So one system will provide virtio
> >>>>>> functionality reserving memory for virtqueues and the other provides vhost
> >>>>>> functionality providing a way to access the virtqueues in virtio memory.
> >>>>>> One is
> >>>>>> source and the other is sink and there is no intermediate entity. (vhost was
> >>>>>> probably intermediate entity in virtualization?)
> >>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
> >>>>> me since it was use for implementing virtio backend for userspace drivers. I
> >>>>> guess "vringh" could be better.
> >>>> Initially I had named this vringh but later decided to choose vhost instead of
> >>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
> >>>> now resides in an entirely different system. Whatever virtio is for a frontend
> >>>> system, vhost can be that for a backend system. vring can be for accessing
> >>>> virtqueue and can be used either in frontend or backend.
> >>>
> >>> Ok.
> >>>
> >>>
> >>>>>>> Have you considered to implement these through vDPA?
> >>>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
> >>>>>> driver
> >>>>>> or vhost net driver is not provided.
> >>>>>>
> >>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
> >>>>>> (usecase2 above),
> >>>>> I see.
> >>>>>
> >>>>>
> >>>>>>     all the boards run Linux. The middle board provides NTB
> >>>>>> functionality and board on either side provides virtio/vhost
> >>>>>> functionality and
> >>>>>> transfer data using rpmsg.
> >>>>> So I wonder whether it's worthwhile for a new bus. Can we use the existed
> >>>>> virtio-bus/drivers? It might work as, except for the epf transport, we can
> >>>>> introduce a epf "vhost" transport driver.
> >>>> IMHO we'll need two buses one for frontend and other for backend because the
> >>>> two components can then co-operate/interact with each other to provide a
> >>>> functionality. Though both will seemingly provide similar callbacks, they are
> >>>> both provide symmetrical or complimentary funcitonality and need not be
> >>>> same or
> >>>> identical.
> >>>>
> >>>> Having the same bus can also create sequencing issues.
> >>>>
> >>>> If you look at virtio_dev_probe() of virtio_bus
> >>>>
> >>>> device_features = dev->config->get_features(dev);
> >>>>
> >>>> Now if we use same bus for both front-end and back-end, both will try to
> >>>> get_features when there has been no set_features. Ideally vhost device should
> >>>> be initialized first with the set of features it supports. Vhost and virtio
> >>>> should use "status" and "features" complimentarily and not identically.
> >>>
> >>> Yes, but there's no need for doing status/features passthrough in epf vhost
> >>> drivers.b
> >>>
> >>>
> >>>> virtio device (or frontend) cannot be initialized before vhost device (or
> >>>> backend) gets initialized with data such as features. Similarly vhost
> >>>> (backend)
> >>>> cannot access virqueues or buffers before virtio (frontend) sets
> >>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there for virtio as
> >>>> the physical memory for virtqueues are created by virtio (frontend).
> >>>
> >>> epf vhost drivers need to implement two devices: vhost(vringh) device and
> >>> virtio device (which is a mediated device). The vhost(vringh) device is doing
> >>> feature negotiation with the virtio device via RC/EP or NTB. The virtio device
> >>> is doing feature negotiation with local virtio drivers. If there're feature
> >>> mismatch, epf vhost drivers and do mediation between them.
> >> Here epf vhost should be initialized with a set of features for it to negotiate
> >> either as vhost device or virtio device no? Where should the initial feature
> >> set for epf vhost come from?
> >
> >
> > I think it can work as:
> >
> > 1) Having an initial features (hard coded in the code) set X in epf vhost
> > 2) Using this X for both virtio device and vhost(vringh) device
> > 3) local virtio driver will negotiate with virtio device with feature set Y
> > 4) remote virtio driver will negotiate with vringh device with feature set Z
> > 5) mediate between feature Y and feature Z since both Y and Z are a subset of X
> >
> >
>
> okay. I'm also thinking if we could have configfs for configuring this. Anyways
> we could find different approaches of configuring this.
> >>>
> >>>>> It will have virtqueues but only used for the communication between itself
> >>>>> and
> >>>>> uppter virtio driver. And it will have vringh queues which will be probe by
> >>>>> virtio epf transport drivers. And it needs to do datacopy between
> >>>>> virtqueue and
> >>>>> vringh queues.
> >>>>>
> >>>>> It works like:
> >>>>>
> >>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
> >>>>> queue/epf>
> >>>>>
> >>>>> The advantages is that there's no need for writing new buses and drivers.
> >>>> I think this will work however there is an addtional copy between vringh queue
> >>>> and virtqueue,
> >>>
> >>> I think not? E.g in use case 1), if we stick to virtio bus, we will have:
> >>>
> >>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2)
> >>> -> virtio pci (RC) <-> virtio rpmsg (RC)
> >> IIUC epf vhost driver (EP) will access virtio ring(2) using vringh?
> >
> >
> > Yes.
> >
> >
> >> And virtio
> >> ring(2) is created by virtio pci (RC).
> >
> >
> > Yes.
> >
> >
> >>> What epf vhost driver did is to read from virtio ring(1) about the buffer len
> >>> and addr and them DMA to Linux(RC)?
> >> okay, I made some optimization here where vhost-rpmsg using a helper writes a
> >> buffer from rpmsg's upper layer directly to remote Linux (RC) as against here
> >> were it has to be first written to virtio ring (1).
> >>
> >> Thinking how this would look for NTB
> >> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2)  <- virtio
> >> ring(2) -> virtio-rpmsg (HOST2)
> >>
> >> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
> >
> >
> > Yes, I think so it needs to use vring to access virtio ring (1) as well.
>
> NTB(HOST1) and virtio ring(1) will be in the same system. So it doesn't have to
> use vring. virtio ring(1) is by the virtio device the NTB(HOST1) creates.
> >
> >
> >>
> >> Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?
> >
> >
> > Yes.
>
> okay, I haven't looked at this but the backend of virtio_blk should access an
> actual storage device no?
> >
> >
> >>
> >> I'd like to get clarity on two things in the approach you suggested, one is
> >> features (since epf vhost should ideally be transparent to any virtio driver)
> >
> >
> > We can have have an array of pre-defined features indexed by virtio device id
> > in the code.
> >
> >
> >> and the other is how certain inputs to virtio device such as number of buffers
> >> be determined.
> >
> >
> > We can start from hard coded the value like 256, or introduce some API for user
> > to change the value.
> >
> >
> >>
> >> Thanks again for your suggestions!
> >
> >
> > You're welcome.
> >
> > Note that I just want to check whether or not we can reuse the virtio
> > bus/driver. It's something similar to what you proposed in Software Layering
> > but we just replace "vhost core" with "virtio bus" and move the vhost core
> > below epf/ntb/platform transport.
>
> Got it. My initial design was based on my understanding of your comments [1].
>
> I'll try to create something based on your proposed design here.

Based on the above conversation, it seems like I should wait for another revision
of this set before reviewing the RPMSG part. Please confirm that my
understanding is correct.

Thanks,
Mathieu

>
> Regards
> Kishon
>
> [1] ->
> https://lore.kernel.org/linux-pci/[email protected]/
> >
> > Thanks
> >
> >
> >>
> >> Regards
> >> Kishon
> >>
> >>>
> >>>> in some cases adds latency because of forwarding interrupts
> >>>> between vhost and virtio driver, vhost drivers providing features (which means
> >>>> it has to be aware of which virtio driver will be connected).
> >>>> virtio drivers (front end) generally access the buffers from it's local memory
> >>>> but when in backend it can access over MMIO (like PCI EPF or NTB) or
> >>>> userspace.
> >>>>> Does this make sense?
> >>>> Two copies in my opinion is an issue but lets get others opinions as well.
> >>>
> >>> Sure.
> >>>
> >>>
> >>>> Thanks for your suggestions!
> >>>
> >>> You're welcome.
> >>>
> >>> Thanks
> >>>
> >>>
> >>>> Regards
> >>>> Kishon
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>
> >>>>>> Thanks
> >>>>>> Kishon
> >>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>>
> >>>>>>>>> Kishon Vijay Abraham I (22):
> >>>>>>>>>       vhost: Make _feature_ bits a property of vhost device
> >>>>>>>>>       vhost: Introduce standard Linux driver model in VHOST
> >>>>>>>>>       vhost: Add ops for the VHOST driver to configure VHOST device
> >>>>>>>>>       vringh: Add helpers to access vring in MMIO
> >>>>>>>>>       vhost: Add MMIO helpers for operations on vhost virtqueue
> >>>>>>>>>       vhost: Introduce configfs entry for configuring VHOST
> >>>>>>>>>       virtio_pci: Use request_threaded_irq() instead of request_irq()
> >>>>>>>>>       rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
> >>>>>>>>>         reading messages
> >>>>>>>>>       rpmsg: Introduce configfs entry for configuring rpmsg
> >>>>>>>>>       rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
> >>>>>>>>>       rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
> >>>>>>>>>         rpmsg_internal.h
> >>>>>>>>>       virtio: Add ops to allocate and free buffer
> >>>>>>>>>       rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
> >>>>>>>>>         virtio_free_buffer()
> >>>>>>>>>       rpmsg: Add VHOST based remote processor messaging bus
> >>>>>>>>>       samples/rpmsg: Setup delayed work to send message
> >>>>>>>>>       samples/rpmsg: Wait for address to be bound to rpdev for sending
> >>>>>>>>>         message
> >>>>>>>>>       rpmsg.txt: Add Documentation to configure rpmsg using configfs
> >>>>>>>>>       virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe
> >>>>>>>>> Endpoint
> >>>>>>>>>         device
> >>>>>>>>>       PCI: endpoint: Add EP function driver to provide VHOST interface
> >>>>>>>>>       NTB: Add a new NTB client driver to implement VIRTIO functionality
> >>>>>>>>>       NTB: Add a new NTB client driver to implement VHOST functionality
> >>>>>>>>>       NTB: Describe the ntb_virtio and ntb_vhost client in the
> >>>>>>>>> documentation
> >>>>>>>>>
> >>>>>>>>>      Documentation/driver-api/ntb.rst              |   11 +
> >>>>>>>>>      Documentation/rpmsg.txt                       |   56 +
> >>>>>>>>>      drivers/ntb/Kconfig                           |   18 +
> >>>>>>>>>      drivers/ntb/Makefile                          |    2 +
> >>>>>>>>>      drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
> >>>>>>>>>      drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
> >>>>>>>>>      drivers/ntb/ntb_virtio.h                      |   56 +
> >>>>>>>>>      drivers/pci/endpoint/functions/Kconfig        |   11 +
> >>>>>>>>>      drivers/pci/endpoint/functions/Makefile       |    1 +
> >>>>>>>>>      .../pci/endpoint/functions/pci-epf-vhost.c    | 1144
> >>>>>>>>> ++++++++++++++++
> >>>>>>>>>      drivers/rpmsg/Kconfig                         |   10 +
> >>>>>>>>>      drivers/rpmsg/Makefile                        |    3 +-
> >>>>>>>>>      drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
> >>>>>>>>>      drivers/rpmsg/rpmsg_core.c                    |    7 +
> >>>>>>>>>      drivers/rpmsg/rpmsg_internal.h                |  136 ++
> >>>>>>>>>      drivers/rpmsg/vhost_rpmsg_bus.c               | 1151
> >>>>>>>>> +++++++++++++++++
> >>>>>>>>>      drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
> >>>>>>>>>      drivers/vhost/Kconfig                         |    1 +
> >>>>>>>>>      drivers/vhost/Makefile                        |    2 +-
> >>>>>>>>>      drivers/vhost/net.c                           |   10 +-
> >>>>>>>>>      drivers/vhost/scsi.c                          |   24 +-
> >>>>>>>>>      drivers/vhost/test.c                          |   17 +-
> >>>>>>>>>      drivers/vhost/vdpa.c                          |    2 +-
> >>>>>>>>>      drivers/vhost/vhost.c                         |  730 ++++++++++-
> >>>>>>>>>      drivers/vhost/vhost_cfs.c                     |  341 +++++
> >>>>>>>>>      drivers/vhost/vringh.c                        |  332 +++++
> >>>>>>>>>      drivers/vhost/vsock.c                         |   20 +-
> >>>>>>>>>      drivers/virtio/Kconfig                        |    9 +
> >>>>>>>>>      drivers/virtio/Makefile                       |    1 +
> >>>>>>>>>      drivers/virtio/virtio_pci_common.c            |   25 +-
> >>>>>>>>>      drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
> >>>>>>>>>      include/linux/mod_devicetable.h               |    6 +
> >>>>>>>>>      include/linux/rpmsg.h                         |    6 +
> >>>>>>>>>      {drivers/vhost => include/linux}/vhost.h      |  132 +-
> >>>>>>>>>      include/linux/virtio.h                        |    3 +
> >>>>>>>>>      include/linux/virtio_config.h                 |   42 +
> >>>>>>>>>      include/linux/vringh.h                        |   46 +
> >>>>>>>>>      samples/rpmsg/rpmsg_client_sample.c           |   32 +-
> >>>>>>>>>      tools/virtio/virtio_test.c                    |    2 +-
> >>>>>>>>>      39 files changed, 7083 insertions(+), 183 deletions(-)
> >>>>>>>>>      create mode 100644 drivers/ntb/ntb_vhost.c
> >>>>>>>>>      create mode 100644 drivers/ntb/ntb_virtio.c
> >>>>>>>>>      create mode 100644 drivers/ntb/ntb_virtio.h
> >>>>>>>>>      create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
> >>>>>>>>>      create mode 100644 drivers/rpmsg/rpmsg_cfs.c
> >>>>>>>>>      create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
> >>>>>>>>>      create mode 100644 drivers/vhost/vhost_cfs.c
> >>>>>>>>>      create mode 100644 drivers/virtio/virtio_pci_epf.c
> >>>>>>>>>      rename {drivers/vhost => include/linux}/vhost.h (66%)
> >>>>>>>>>
> >>>>>>>>> -- 
> >>>>>>>>> 2.17.1
> >>>>>>>>>
> >

2020-08-28 10:39:00

by Cornelia Huck

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

On Thu, 9 Jul 2020 14:26:53 +0800
Jason Wang <[email protected]> wrote:

[Let me note right at the beginning that I first noted this while
listening to Kishon's talk at LPC on Wednesday. I might be very
confused about the background here, so let me apologize beforehand for
any confusion I might spread.]

> On 2020/7/8 下午9:13, Kishon Vijay Abraham I wrote:
> > Hi Jason,
> >
> > On 7/8/2020 4:52 PM, Jason Wang wrote:
> >> On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:
> >>> Hi Jason,
> >>>
> >>> On 7/7/2020 3:17 PM, Jason Wang wrote:
> >>>> On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:
> >>>>> Hi Jason,
> >>>>>
> >>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
> >>>>>> On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:
> >>>>>>> Hi Jason,
> >>>>>>>
> >>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
> >>>>>>>> On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:
> >>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
> >>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
> >>>>>>>>>> communication over MMIO. This series enables rpmsg communication between
> >>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
> >>>>>>>>>>
> >>>>>>>>>> 1) Modify vhost to use standard Linux driver model
> >>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
> >>>>>>>>>> 3) Add vhost client driver for rpmsg
> >>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
> >>>>>>>>>>        rpmsg communication between two SoCs connected to each other
> >>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
> >>>>>>>>>>        between two SoCs connected via NTB
> >>>>>>>>>> 6) Add configfs to configure the components
> >>>>>>>>>>
> >>>>>>>>>> UseCase1 :
> >>>>>>>>>>
> >>>>>>>>>>      VHOST RPMSG                     VIRTIO RPMSG
> >>>>>>>>>>           +                               +
> >>>>>>>>>>           |                               |
> >>>>>>>>>>           |                               |
> >>>>>>>>>>           |                               |
> >>>>>>>>>>           |                               |
> >>>>>>>>>> +-----v------+                 +------v-------+
> >>>>>>>>>> |   Linux    |                 |     Linux    |
> >>>>>>>>>> |  Endpoint  |                 | Root Complex |
> >>>>>>>>>> |            <----------------->              |
> >>>>>>>>>> |            |                 |              |
> >>>>>>>>>> |    SOC1    |                 |     SOC2     |
> >>>>>>>>>> +------------+                 +--------------+
> >>>>>>>>>>
> >>>>>>>>>> UseCase 2:
> >>>>>>>>>>
> >>>>>>>>>>          VHOST RPMSG                                      VIRTIO RPMSG
> >>>>>>>>>>               +                                                 +
> >>>>>>>>>>               |                                                 |
> >>>>>>>>>>               |                                                 |
> >>>>>>>>>>               |                                                 |
> >>>>>>>>>>               |                                                 |
> >>>>>>>>>>        +------v------+                                   +------v------+
> >>>>>>>>>>        |             |                                   |             |
> >>>>>>>>>>        |    HOST1    |                                   |    HOST2    |
> >>>>>>>>>>        |             |                                   |             |
> >>>>>>>>>>        +------^------+                                   +------^------+
> >>>>>>>>>>               |                                                 |
> >>>>>>>>>>               |                                                 |
> >>>>>>>>>> +---------------------------------------------------------------------+
> >>>>>>>>>> |  +------v------+                                   +------v------+  |
> >>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>> |  |     EP      |                                   |     EP      |  |
> >>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
> >>>>>>>>>> |  |             <----------------------------------->             |  |
> >>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
> >>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
> >>>>>>>>>> |  +-------------+                                   +-------------+  |
> >>>>>>>>>> +---------------------------------------------------------------------+

First of all, to clarify the terminology:
Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just
virtqueues + the existing vhost interfaces?

> >>>>>>>>>>
> >>>>>>>>>> Software Layering:
> >>>>>>>>>>
> >>>>>>>>>> The high-level SW layering should look something like below. This series
> >>>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
> >>>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
> >>>>>>>>>> device, user) can use any of the vhost client driver.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>         +----------------+  +-----------+  +------------+  +----------+
> >>>>>>>>>>         |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
> >>>>>>>>>>         +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
> >>>>>>>>>>                 |                 |              |              |
> >>>>>>>>>>                 |                 |              |              |
> >>>>>>>>>>                 |                 |              |              |
> >>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
> >>>>>>>>>> |                            VHOST CORE                                |
> >>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
> >>>>>>>>>>              |               |                    |                  |
> >>>>>>>>>>              |               |                    |                  |
> >>>>>>>>>>              |               |                    |                  |
> >>>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
> >>>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
> >>>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+

So, the upper half is basically various functionality types, e.g. a net
device. What is the lower half, a hardware interface? Would it be
equivalent to e.g. a normal PCI device?

> >>>>>>>>>>
> >>>>>>>>>> This was initially proposed here [1]
> >>>>>>>>>>
> >>>>>>>>>> [1] ->
> >>>>>>>>>> https://lore.kernel.org/r/[email protected]
> >>>>>>>>> I find this very interesting. A huge patchset so will take a bit
> >>>>>>>>> to review, but I certainly plan to do that. Thanks!
> >>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
> >>>>>>> I've pushed the branch
> >>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
> >>>>>> Thanks
> >>>>>>
> >>>>>>
> >>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
> >>>>>>>> work is
> >>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
> >>>>>>> This is about connecting two different HW systems both running Linux and
> >>>>>>> doesn't necessarily involve virtualization.
> >>>>>> Right, this is something similar to VOP
> >>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The different is the
> >>>>>> hardware I guess and VOP use userspace application to implement the device.
> >>>>> I'd also like to point out, this series tries to have communication between
> >>>>> two
> >>>>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
> >>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and
> >>>>> any
> >>>>> of the HW in NTB below should be able to use a virtio-vhost communication
> >>>>>
> >>>>> #ls drivers/ntb/hw/
> >>>>> amd  epf  idt  intel  mscc
> >>>>>
> >>>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
> >>>>> function driver and hence any SoC that supports configurable PCIe endpoint can
> >>>>> use virtio-vhost communication
> >>>>>
> >>>>> # ls drivers/pci/controller/dwc/*ep*
> >>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
> >>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
> >>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
> >>>> Thanks for those backgrounds.
> >>>>
> >>>>
> >>>>>>>     So there is no guest or host as in
> >>>>>>> virtualization but two entirely different systems connected via PCIe cable,
> >>>>>>> one
> >>>>>>> acting as guest and one as host. So one system will provide virtio
> >>>>>>> functionality reserving memory for virtqueues and the other provides vhost
> >>>>>>> functionality providing a way to access the virtqueues in virtio memory.
> >>>>>>> One is
> >>>>>>> source and the other is sink and there is no intermediate entity. (vhost was
> >>>>>>> probably intermediate entity in virtualization?)
> >>>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
> >>>>>> me since it was use for implementing virtio backend for userspace drivers. I
> >>>>>> guess "vringh" could be better.
> >>>>> Initially I had named this vringh but later decided to choose vhost instead of
> >>>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
> >>>>> now resides in an entirely different system. Whatever virtio is for a frontend
> >>>>> system, vhost can be that for a backend system. vring can be for accessing
> >>>>> virtqueue and can be used either in frontend or backend.

I guess that clears up at least some of my questions from above...

> >>>> Ok.
> >>>>
> >>>>
> >>>>>>>> Have you considered to implement these through vDPA?
> >>>>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
> >>>>>>> driver
> >>>>>>> or vhost net driver is not provided.
> >>>>>>>
> >>>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
> >>>>>>> (usecase2 above),
> >>>>>> I see.
> >>>>>>
> >>>>>>
> >>>>>>>     all the boards run Linux. The middle board provides NTB
> >>>>>>> functionality and board on either side provides virtio/vhost
> >>>>>>> functionality and
> >>>>>>> transfer data using rpmsg.

This setup looks really interesting (sometimes it's really hard to
imagine this in the abstract).

> >>>>>> So I wonder whether it's worthwhile for a new bus. Can we use
> >>>>>> the existed virtio-bus/drivers? It might work as, except for
> >>>>>> the epf transport, we can introduce a epf "vhost" transport
> >>>>>> driver.
> >>>>> IMHO we'll need two buses one for frontend and other for
> >>>>> backend because the two components can then co-operate/interact
> >>>>> with each other to provide a functionality. Though both will
> >>>>> seemingly provide similar callbacks, they are both provide
> >>>>> symmetrical or complimentary funcitonality and need not be same
> >>>>> or identical.
> >>>>>
> >>>>> Having the same bus can also create sequencing issues.
> >>>>>
> >>>>> If you look at virtio_dev_probe() of virtio_bus
> >>>>>
> >>>>> device_features = dev->config->get_features(dev);
> >>>>>
> >>>>> Now if we use same bus for both front-end and back-end, both
> >>>>> will try to get_features when there has been no set_features.
> >>>>> Ideally vhost device should be initialized first with the set
> >>>>> of features it supports. Vhost and virtio should use "status"
> >>>>> and "features" complimentarily and not identically.
> >>>> Yes, but there's no need for doing status/features passthrough
> >>>> in epf vhost drivers.b
> >>>>
> >>>>
> >>>>> virtio device (or frontend) cannot be initialized before vhost
> >>>>> device (or backend) gets initialized with data such as
> >>>>> features. Similarly vhost (backend)
> >>>>> cannot access virqueues or buffers before virtio (frontend) sets
> >>>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there
> >>>>> for virtio as the physical memory for virtqueues are created by
> >>>>> virtio (frontend).
> >>>> epf vhost drivers need to implement two devices: vhost(vringh)
> >>>> device and virtio device (which is a mediated device). The
> >>>> vhost(vringh) device is doing feature negotiation with the
> >>>> virtio device via RC/EP or NTB. The virtio device is doing
> >>>> feature negotiation with local virtio drivers. If there're
> >>>> feature mismatch, epf vhost drivers and do mediation between
> >>>> them.
> >>> Here epf vhost should be initialized with a set of features for
> >>> it to negotiate either as vhost device or virtio device no? Where
> >>> should the initial feature set for epf vhost come from?
> >>
> >> I think it can work as:
> >>
> >> 1) Having an initial features (hard coded in the code) set X in
> >> epf vhost 2) Using this X for both virtio device and vhost(vringh)
> >> device 3) local virtio driver will negotiate with virtio device
> >> with feature set Y 4) remote virtio driver will negotiate with
> >> vringh device with feature set Z 5) mediate between feature Y and
> >> feature Z since both Y and Z are a subset of X
> >>
> >>
> > okay. I'm also thinking if we could have configfs for configuring
> > this. Anyways we could find different approaches of configuring
> > this.
>
>
> Yes, and I think some management API is needed even in the design of
> your "Software Layering". In that figure, rpmsg vhost need some
> pre-set or hard-coded features.

When I saw the plumbers talk, my first idea was "this needs to be a new
transport". You have some hard-coded or pre-configured features, and
then features are negotiated via a transport-specific means in the
usual way. There's basically an extra/extended layer for this (and
status, and whatever).

Does that make any sense?
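
To make the "new transport" idea a bit more concrete, here is a minimal
sketch of what such a transport's config ops could look like, assuming
the pre-configured feature mask simply comes from configfs or is
hard-coded per function type. The names (xport_*, preset_features) are
made up for illustration and are not taken from this series; only the
virtio_config_ops structure and vring_transport_features() are the
existing kernel interfaces.

  #include <linux/kernel.h>
  #include <linux/virtio.h>
  #include <linux/virtio_config.h>
  #include <linux/virtio_ring.h>

  struct xport_vdev {
          struct virtio_device vdev;
          u64 preset_features;            /* pre-set/hard-coded mask */
  };

  #define to_xport_vdev(v) container_of(v, struct xport_vdev, vdev)

  /* Advertise only what the backend was configured with. */
  static u64 xport_get_features(struct virtio_device *vdev)
  {
          return to_xport_vdev(vdev)->preset_features;
  }

  /* Publish the accepted subset to the peer, e.g. through a shared
   * MMIO/config window (the transport-specific part is omitted). */
  static int xport_finalize_features(struct virtio_device *vdev)
  {
          vring_transport_features(vdev);
          return 0;
  }

  static const struct virtio_config_ops xport_config_ops = {
          .get_features      = xport_get_features,
          .finalize_features = xport_finalize_features,
          /* .get/.set/.get_status/.set_status/.find_vqs/... omitted */
  };
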

>
>
> >>>>>> It will have virtqueues but only used for the communication
> >>>>>> between itself and
> >>>>>> uppter virtio driver. And it will have vringh queues which
> >>>>>> will be probe by virtio epf transport drivers. And it needs to
> >>>>>> do datacopy between virtqueue and
> >>>>>> vringh queues.
> >>>>>>
> >>>>>> It works like:
> >>>>>>
> >>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <-
> >>>>>> vringh queue/epf>
> >>>>>>
> >>>>>> The advantages is that there's no need for writing new buses
> >>>>>> and drivers.
> >>>>> I think this will work however there is an addtional copy
> >>>>> between vringh queue and virtqueue,
> >>>> I think not? E.g in use case 1), if we stick to virtio bus, we
> >>>> will have:
> >>>>
> >>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <-
> >>>> virtio ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)
> >>> IIUC epf vhost driver (EP) will access virtio ring(2) using
> >>> vringh?
> >>
> >> Yes.
> >>
> >>
> >>> And virtio
> >>> ring(2) is created by virtio pci (RC).
> >>
> >> Yes.
> >>
> >>
> >>>> What epf vhost driver did is to read from virtio ring(1) about
> >>>> the buffer len and addr and them DMA to Linux(RC)?
> >>> okay, I made some optimization here where vhost-rpmsg using a
> >>> helper writes a buffer from rpmsg's upper layer directly to
> >>> remote Linux (RC) as against here were it has to be first written
> >>> to virtio ring (1).
> >>>
> >>> Thinking how this would look for NTB
> >>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <->
> >>> NTB(HOST2)  <- virtio ring(2) -> virtio-rpmsg (HOST2)
> >>>
> >>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
> >>
> >> Yes, I think so it needs to use vring to access virtio ring (1) as
> >> well.
> > NTB(HOST1) and virtio ring(1) will be in the same system. So it
> > doesn't have to use vring. virtio ring(1) is by the virtio device
> > the NTB(HOST1) creates.
>
>
> Right.
>
>
> >>
> >>> Do you also think this will work seamlessly with virtio_net.c,
> >>> virtio_blk.c?
> >>
> >> Yes.
> > okay, I haven't looked at this but the backend of virtio_blk should
> > access an actual storage device no?
>
>
> Good point, for non-peer device like storage. There's probably no
> need for it to be registered on the virtio bus and it might be better
> to behave as you proposed.

I might be missing something; but if you expose something as a block
device, it should have something it can access with block reads/writes,
shouldn't it? Of course, that can be a variety of things.

>
> Just to make sure I understand the design, how is VHOST SCSI expected
> to work in your proposal, does it have a device for file as a backend?
>
>
> >>
> >>> I'd like to get clarity on two things in the approach you
> >>> suggested, one is features (since epf vhost should ideally be
> >>> transparent to any virtio driver)
> >>
> >> We can have have an array of pre-defined features indexed by
> >> virtio device id in the code.
> >>
> >>
> >>> and the other is how certain inputs to virtio device such as
> >>> number of buffers be determined.
> >>
> >> We can start from hard coded the value like 256, or introduce some
> >> API for user to change the value.
> >>
> >>
> >>> Thanks again for your suggestions!
> >>
> >> You're welcome.
> >>
> >> Note that I just want to check whether or not we can reuse the
> >> virtio bus/driver. It's something similar to what you proposed in
> >> Software Layering but we just replace "vhost core" with "virtio
> >> bus" and move the vhost core below epf/ntb/platform transport.
> > Got it. My initial design was based on my understanding of your
> > comments [1].
>
>
> Yes, but that's just for a networking device. If we want something
> more generic, it may require more thought (bus etc).

I believe that we indeed need something bus-like to be able to support
a variety of devices.

>
>
> >
> > I'll try to create something based on your proposed design here.
>
>
> Sure, but for coding, we'd better wait for other's opinion here.

Please tell me if my thoughts above make any sense... I have just
started looking at that, so I might be completely off.

2020-09-01 04:45:21

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

Hi Mathieu,

On 15/07/20 10:45 pm, Mathieu Poirier wrote:
> Hey Kishon,
>
> On Wed, Jul 08, 2020 at 06:43:45PM +0530, Kishon Vijay Abraham I wrote:
>> Hi Jason,
>>
>> On 7/8/2020 4:52 PM, Jason Wang wrote:
>>>
>>> On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:
>>>> Hi Jason,
>>>>
>>>> On 7/7/2020 3:17 PM, Jason Wang wrote:
>>>>> On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>>>>> On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:
>>>>>>>> Hi Jason,
>>>>>>>>
>>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>>>>> On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:
>>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>>>>
>>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>>>>>>        rpmsg communication between two SoCs connected to each other
>>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>>>>>>        between two SoCs connected via NTB
>>>>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>>>>
>>>>>>>>>>> UseCase1 :
>>>>>>>>>>>
>>>>>>>>>>>      VHOST RPMSG                     VIRTIO RPMSG
>>>>>>>>>>>           +                               +
>>>>>>>>>>>           |                               |
>>>>>>>>>>>           |                               |
>>>>>>>>>>>           |                               |
>>>>>>>>>>>           |                               |
>>>>>>>>>>> +-----v------+                 +------v-------+
>>>>>>>>>>> |   Linux    |                 |     Linux    |
>>>>>>>>>>> |  Endpoint  |                 | Root Complex |
>>>>>>>>>>> |            <----------------->              |
>>>>>>>>>>> |            |                 |              |
>>>>>>>>>>> |    SOC1    |                 |     SOC2     |
>>>>>>>>>>> +------------+                 +--------------+
>>>>>>>>>>>
>>>>>>>>>>> UseCase 2:
>>>>>>>>>>>
>>>>>>>>>>>          VHOST RPMSG                                      VIRTIO RPMSG
>>>>>>>>>>>               +                                                 +
>>>>>>>>>>>               |                                                 |
>>>>>>>>>>>               |                                                 |
>>>>>>>>>>>               |                                                 |
>>>>>>>>>>>               |                                                 |
>>>>>>>>>>>        +------v------+                                   +------v------+
>>>>>>>>>>>        |             |                                   |             |
>>>>>>>>>>>        |    HOST1    |                                   |    HOST2    |
>>>>>>>>>>>        |             |                                   |             |
>>>>>>>>>>>        +------^------+                                   +------^------+
>>>>>>>>>>>               |                                                 |
>>>>>>>>>>>               |                                                 |
>>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>>> |  +------v------+                                   +------v------+  |
>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>> |  |     EP      |                                   |     EP      |  |
>>>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>>>>>>>>> |  |             <----------------------------------->             |  |
>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>>>>>>>>> |  +-------------+                                   +-------------+  |
>>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>>>
>>>>>>>>>>> Software Layering:
>>>>>>>>>>>
>>>>>>>>>>> The high-level SW layering should look something like below. This series
>>>>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
>>>>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>>>>>>>>>> device, user) can use any of the vhost client driver.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>         +----------------+  +-----------+  +------------+  +----------+
>>>>>>>>>>>         |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>>>>>>>>         +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>>>>>>>>                 |                 |              |              |
>>>>>>>>>>>                 |                 |              |              |
>>>>>>>>>>>                 |                 |              |              |
>>>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
>>>>>>>>>>> |                            VHOST CORE                                |
>>>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
>>>>>>>>>>>              |               |                    |                  |
>>>>>>>>>>>              |               |                    |                  |
>>>>>>>>>>>              |               |                    |                  |
>>>>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>>>>>>>>>
>>>>>>>>>>> This was initially proposed here [1]
>>>>>>>>>>>
>>>>>>>>>>> [1] ->
>>>>>>>>>>> https://lore.kernel.org/r/[email protected]
>>>>>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>>>>>> I've pushed the branch
>>>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
>>>>>>>>> work is
>>>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>>>>>> This is about connecting two different HW systems both running Linux and
>>>>>>>> doesn't necessarily involve virtualization.
>>>>>>> Right, this is something similar to VOP
>>>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The different is the
>>>>>>> hardware I guess and VOP use userspace application to implement the device.
>>>>>> I'd also like to point out, this series tries to have communication between
>>>>>> two
>>>>>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
>>>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and
>>>>>> any
>>>>>> of the HW in NTB below should be able to use a virtio-vhost communication
>>>>>>
>>>>>> #ls drivers/ntb/hw/
>>>>>> amd  epf  idt  intel  mscc
>>>>>>
>>>>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
>>>>>> function driver and hence any SoC that supports configurable PCIe endpoint can
>>>>>> use virtio-vhost communication
>>>>>>
>>>>>> # ls drivers/pci/controller/dwc/*ep*
>>>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>>>>
>>>>> Thanks for those backgrounds.
>>>>>
>>>>>
>>>>>>>>     So there is no guest or host as in
>>>>>>>> virtualization but two entirely different systems connected via PCIe cable,
>>>>>>>> one
>>>>>>>> acting as guest and one as host. So one system will provide virtio
>>>>>>>> functionality reserving memory for virtqueues and the other provides vhost
>>>>>>>> functionality providing a way to access the virtqueues in virtio memory.
>>>>>>>> One is
>>>>>>>> source and the other is sink and there is no intermediate entity. (vhost was
>>>>>>>> probably intermediate entity in virtualization?)
>>>>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
>>>>>>> me since it was use for implementing virtio backend for userspace drivers. I
>>>>>>> guess "vringh" could be better.
>>>>>> Initially I had named this vringh but later decided to choose vhost instead of
>>>>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
>>>>>> now resides in an entirely different system. Whatever virtio is for a frontend
>>>>>> system, vhost can be that for a backend system. vring can be for accessing
>>>>>> virtqueue and can be used either in frontend or backend.
>>>>>
>>>>> Ok.
>>>>>
>>>>>
>>>>>>>>> Have you considered to implement these through vDPA?
>>>>>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
>>>>>>>> driver
>>>>>>>> or vhost net driver is not provided.
>>>>>>>>
>>>>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>>>>>>> (usecase2 above),
>>>>>>> I see.
>>>>>>>
>>>>>>>
>>>>>>>>     all the boards run Linux. The middle board provides NTB
>>>>>>>> functionality and board on either side provides virtio/vhost
>>>>>>>> functionality and
>>>>>>>> transfer data using rpmsg.
>>>>>>> So I wonder whether it's worthwhile for a new bus. Can we use the existed
>>>>>>> virtio-bus/drivers? It might work as, except for the epf transport, we can
>>>>>>> introduce a epf "vhost" transport driver.
>>>>>> IMHO we'll need two buses one for frontend and other for backend because the
>>>>>> two components can then co-operate/interact with each other to provide a
>>>>>> functionality. Though both will seemingly provide similar callbacks, they are
>>>>>> both provide symmetrical or complimentary funcitonality and need not be
>>>>>> same or
>>>>>> identical.
>>>>>>
>>>>>> Having the same bus can also create sequencing issues.
>>>>>>
>>>>>> If you look at virtio_dev_probe() of virtio_bus
>>>>>>
>>>>>> device_features = dev->config->get_features(dev);
>>>>>>
>>>>>> Now if we use same bus for both front-end and back-end, both will try to
>>>>>> get_features when there has been no set_features. Ideally vhost device should
>>>>>> be initialized first with the set of features it supports. Vhost and virtio
>>>>>> should use "status" and "features" complimentarily and not identically.
>>>>>
>>>>> Yes, but there's no need for doing status/features passthrough in epf vhost
>>>>> drivers.b
>>>>>
>>>>>
>>>>>> virtio device (or frontend) cannot be initialized before vhost device (or
>>>>>> backend) gets initialized with data such as features. Similarly vhost
>>>>>> (backend)
>>>>>> cannot access virqueues or buffers before virtio (frontend) sets
>>>>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there for virtio as
>>>>>> the physical memory for virtqueues are created by virtio (frontend).
>>>>>
>>>>> epf vhost drivers need to implement two devices: vhost(vringh) device and
>>>>> virtio device (which is a mediated device). The vhost(vringh) device is doing
>>>>> feature negotiation with the virtio device via RC/EP or NTB. The virtio device
>>>>> is doing feature negotiation with local virtio drivers. If there're feature
>>>>> mismatch, epf vhost drivers and do mediation between them.
>>>> Here epf vhost should be initialized with a set of features for it to negotiate
>>>> either as vhost device or virtio device no? Where should the initial feature
>>>> set for epf vhost come from?
>>>
>>>
>>> I think it can work as:
>>>
>>> 1) Having an initial features (hard coded in the code) set X in epf vhost
>>> 2) Using this X for both virtio device and vhost(vringh) device
>>> 3) local virtio driver will negotiate with virtio device with feature set Y
>>> 4) remote virtio driver will negotiate with vringh device with feature set Z
>>> 5) mediate between feature Y and feature Z since both Y and Z are a subset of X
>>>
>>>
>>
>> okay. I'm also thinking if we could have configfs for configuring this. Anyways
>> we could find different approaches of configuring this.
>>>>>
>>>>>>> It will have virtqueues but only used for the communication between itself
>>>>>>> and
>>>>>>> uppter virtio driver. And it will have vringh queues which will be probe by
>>>>>>> virtio epf transport drivers. And it needs to do datacopy between
>>>>>>> virtqueue and
>>>>>>> vringh queues.
>>>>>>>
>>>>>>> It works like:
>>>>>>>
>>>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
>>>>>>> queue/epf>
>>>>>>>
>>>>>>> The advantages is that there's no need for writing new buses and drivers.
>>>>>> I think this will work however there is an addtional copy between vringh queue
>>>>>> and virtqueue,
>>>>>
>>>>> I think not? E.g in use case 1), if we stick to virtio bus, we will have:
>>>>>
>>>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2)
>>>>> -> virtio pci (RC) <-> virtio rpmsg (RC)
>>>> IIUC epf vhost driver (EP) will access virtio ring(2) using vringh?
>>>
>>>
>>> Yes.
>>>
>>>
>>>> And virtio
>>>> ring(2) is created by virtio pci (RC).
>>>
>>>
>>> Yes.
>>>
>>>
>>>>> What epf vhost driver did is to read from virtio ring(1) about the buffer len
>>>>> and addr and them DMA to Linux(RC)?
>>>> okay, I made some optimization here where vhost-rpmsg using a helper writes a
>>>> buffer from rpmsg's upper layer directly to remote Linux (RC) as against here
>>>> were it has to be first written to virtio ring (1).
>>>>
>>>> Thinking how this would look for NTB
>>>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2)  <- virtio
>>>> ring(2) -> virtio-rpmsg (HOST2)
>>>>
>>>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
>>>
>>>
>>> Yes, I think so it needs to use vring to access virtio ring (1) as well.
>>
>> NTB(HOST1) and virtio ring(1) will be in the same system. So it doesn't have to
>> use vring. virtio ring(1) is by the virtio device the NTB(HOST1) creates.
>>>
>>>
>>>>
>>>> Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?
>>>
>>>
>>> Yes.
>>
>> okay, I haven't looked at this but the backend of virtio_blk should access an
>> actual storage device no?
>>>
>>>
>>>>
>>>> I'd like to get clarity on two things in the approach you suggested, one is
>>>> features (since epf vhost should ideally be transparent to any virtio driver)
>>>
>>>
>>> We can have have an array of pre-defined features indexed by virtio device id
>>> in the code.
>>>
>>>
>>>> and the other is how certain inputs to virtio device such as number of buffers
>>>> be determined.
>>>
>>>
>>> We can start from hard coded the value like 256, or introduce some API for user
>>> to change the value.
>>>
>>>
>>>>
>>>> Thanks again for your suggestions!
>>>
>>>
>>> You're welcome.
>>>
>>> Note that I just want to check whether or not we can reuse the virtio
>>> bus/driver. It's something similar to what you proposed in Software Layering
>>> but we just replace "vhost core" with "virtio bus" and move the vhost core
>>> below epf/ntb/platform transport.
>>
>> Got it. My initial design was based on my understanding of your comments [1].
>>
>> I'll try to create something based on your proposed design here.
>
> Based on the above conversation it seems like I should wait for another revision
> of this set before reviewing the RPMSG part. Please confirm that my
> understanding is correct.

Right, there are multiple parts in this series that have to be aligned.
I'd still think that, irrespective of the approach, something like Address
Service Notification support might have to be added to rpmsg.

Thanks
Kishon
>
> Thanks,
> Mathieu
>
>>
>> Regards
>> Kishon
>>
>> [1] ->
>> https://lore.kernel.org/linux-pci/[email protected]/
>>>
>>> Thanks
>>>
>>>
>>>>
>>>> Regards
>>>> Kishon
>>>>
>>>>>
>>>>>> in some cases adds latency because of forwarding interrupts
>>>>>> between vhost and virtio driver, vhost drivers providing features (which means
>>>>>> it has to be aware of which virtio driver will be connected).
>>>>>> virtio drivers (front end) generally access the buffers from it's local memory
>>>>>> but when in backend it can access over MMIO (like PCI EPF or NTB) or
>>>>>> userspace.
>>>>>>> Does this make sense?
>>>>>> Two copies in my opinion is an issue but lets get others opinions as well.
>>>>>
>>>>> Sure.
>>>>>
>>>>>
>>>>>> Thanks for your suggestions!
>>>>>
>>>>> You're welcome.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> Regards
>>>>>> Kishon
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Kishon
>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>

2020-09-01 05:26:13

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

Hi,

On 28/08/20 4:04 pm, Cornelia Huck wrote:
> On Thu, 9 Jul 2020 14:26:53 +0800
> Jason Wang <[email protected]> wrote:
>
> [Let me note right at the beginning that I first noted this while
> listening to Kishon's talk at LPC on Wednesday. I might be very
> confused about the background here, so let me apologize beforehand for
> any confusion I might spread.]
>
>> On 2020/7/8 下午9:13, Kishon Vijay Abraham I wrote:
>>> Hi Jason,
>>>
>>> On 7/8/2020 4:52 PM, Jason Wang wrote:
>>>> On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 7/7/2020 3:17 PM, Jason Wang wrote:
>>>>>> On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>>>>>> On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:
>>>>>>>>> Hi Jason,
>>>>>>>>>
>>>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>>>>>> On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>>>>>>>        rpmsg communication between two SoCs connected to each other
>>>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>>>>>>>        between two SoCs connected via NTB
>>>>>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>>>>>
>>>>>>>>>>>> UseCase1 :
>>>>>>>>>>>>
>>>>>>>>>>>>      VHOST RPMSG                     VIRTIO RPMSG
>>>>>>>>>>>>           +                               +
>>>>>>>>>>>>           |                               |
>>>>>>>>>>>>           |                               |
>>>>>>>>>>>>           |                               |
>>>>>>>>>>>>           |                               |
>>>>>>>>>>>> +-----v------+                 +------v-------+
>>>>>>>>>>>> |   Linux    |                 |     Linux    |
>>>>>>>>>>>> |  Endpoint  |                 | Root Complex |
>>>>>>>>>>>> |            <----------------->              |
>>>>>>>>>>>> |            |                 |              |
>>>>>>>>>>>> |    SOC1    |                 |     SOC2     |
>>>>>>>>>>>> +------------+                 +--------------+
>>>>>>>>>>>>
>>>>>>>>>>>> UseCase 2:
>>>>>>>>>>>>
>>>>>>>>>>>>          VHOST RPMSG                                      VIRTIO RPMSG
>>>>>>>>>>>>               +                                                 +
>>>>>>>>>>>>               |                                                 |
>>>>>>>>>>>>               |                                                 |
>>>>>>>>>>>>               |                                                 |
>>>>>>>>>>>>               |                                                 |
>>>>>>>>>>>>        +------v------+                                   +------v------+
>>>>>>>>>>>>        |             |                                   |             |
>>>>>>>>>>>>        |    HOST1    |                                   |    HOST2    |
>>>>>>>>>>>>        |             |                                   |             |
>>>>>>>>>>>>        +------^------+                                   +------^------+
>>>>>>>>>>>>               |                                                 |
>>>>>>>>>>>>               |                                                 |
>>>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>>>> |  +------v------+                                   +------v------+  |
>>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>>> |  |     EP      |                                   |     EP      |  |
>>>>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>>>>>>>>>> |  |             <----------------------------------->             |  |
>>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>>>>>>>>>> |  +-------------+                                   +-------------+  |
>>>>>>>>>>>> +---------------------------------------------------------------------+
>
> First of all, to clarify the terminology:
> Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
> and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just

Right, vhost_rpmsg is 'device' and virtio_rpmsg is 'driver'.
> virtqueues + the existing vhost interfaces?

It's implemented to provide the full 'device' functionality.
>
>>>>>>>>>>>>
>>>>>>>>>>>> Software Layering:
>>>>>>>>>>>>
>>>>>>>>>>>> The high-level SW layering should look something like below. This series
>>>>>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
>>>>>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>>>>>>>>>>> device, user) can use any of the vhost client driver.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>         +----------------+  +-----------+  +------------+  +----------+
>>>>>>>>>>>>         |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>>>>>>>>>         +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>>>>>>>>>                 |                 |              |              |
>>>>>>>>>>>>                 |                 |              |              |
>>>>>>>>>>>>                 |                 |              |              |
>>>>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
>>>>>>>>>>>> |                            VHOST CORE                                |
>>>>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
>>>>>>>>>>>>              |               |                    |                  |
>>>>>>>>>>>>              |               |                    |                  |
>>>>>>>>>>>>              |               |                    |                  |
>>>>>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>>>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>>>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
>
> So, the upper half is basically various functionality types, e.g. a net
> device. What is the lower half, a hardware interface? Would it be
> equivalent to e.g. a normal PCI device?

Right, the upper half should provide the functionality.
The bottom layer could be a HW interface (like a PCIe device or an NTB
device) or a SW interface (for accessing the virtio ring in
userspace) that could be used by a hypervisor.

The top half should be agnostic to which type of device is actually
using it.
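
Just to illustrate that split, here is a purely hypothetical plain-C
sketch (the names are invented for this illustration and do not come
from this series): the upper half only knows about a functionality-level
ops table, the lower half only about an access-method ops table, and the
core routes between them.

  struct vhost_client_ops {            /* upper half: rpmsg/net/scsi */
          int  (*probe)(void *priv);
          void (*handle_kick)(void *priv, unsigned int vq_index);
  };

  struct vhost_access_ops {            /* lower half: EPF/NTB/platform/user */
          void *(*map_peer_vring)(void *priv, unsigned int vq_index);
          void  (*notify_peer)(void *priv, unsigned int vq_index);
          unsigned long long (*get_features)(void *priv);
  };

  struct vhost_binding {
          const struct vhost_client_ops *client;
          const struct vhost_access_ops *access;
          void *priv;
  };

  /* The "core" just routes: the client never learns whether the peer
   * sits behind a PCIe endpoint BAR, an NTB memory window or a
   * userspace mapping. */
  static inline void vhost_core_peer_kicked(struct vhost_binding *b,
                                            unsigned int vq_index)
  {
          b->client->handle_kick(b->priv, vq_index);
  }
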

>
>>>>>>>>>>>>
>>>>>>>>>>>> This was initially proposed here [1]
>>>>>>>>>>>>
>>>>>>>>>>>> [1] ->
>>>>>>>>>>>> https://lore.kernel.org/r/[email protected]
>>>>>>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>>>>>>> I've pushed the branch
>>>>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
>>>>>>>>>> work is
>>>>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>>>>>>> This is about connecting two different HW systems both running Linux and
>>>>>>>>> doesn't necessarily involve virtualization.
>>>>>>>> Right, this is something similar to VOP
>>>>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The different is the
>>>>>>>> hardware I guess and VOP use userspace application to implement the device.
>>>>>>> I'd also like to point out, this series tries to have communication between
>>>>>>> two
>>>>>>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
>>>>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and
>>>>>>> any
>>>>>>> of the HW in NTB below should be able to use a virtio-vhost communication
>>>>>>>
>>>>>>> #ls drivers/ntb/hw/
>>>>>>> amd  epf  idt  intel  mscc
>>>>>>>
>>>>>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
>>>>>>> function driver and hence any SoC that supports configurable PCIe endpoint can
>>>>>>> use virtio-vhost communication
>>>>>>>
>>>>>>> # ls drivers/pci/controller/dwc/*ep*
>>>>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>>>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>>>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>>>>> Thanks for those backgrounds.
>>>>>>
>>>>>>
>>>>>>>>>     So there is no guest or host as in
>>>>>>>>> virtualization but two entirely different systems connected via PCIe cable,
>>>>>>>>> one
>>>>>>>>> acting as guest and one as host. So one system will provide virtio
>>>>>>>>> functionality reserving memory for virtqueues and the other provides vhost
>>>>>>>>> functionality providing a way to access the virtqueues in virtio memory.
>>>>>>>>> One is
>>>>>>>>> source and the other is sink and there is no intermediate entity. (vhost was
>>>>>>>>> probably intermediate entity in virtualization?)
>>>>>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
>>>>>>>> me since it was use for implementing virtio backend for userspace drivers. I
>>>>>>>> guess "vringh" could be better.
>>>>>>> Initially I had named this vringh but later decided to choose vhost instead of
>>>>>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
>>>>>>> now resides in an entirely different system. Whatever virtio is for a frontend
>>>>>>> system, vhost can be that for a backend system. vring can be for accessing
>>>>>>> virtqueue and can be used either in frontend or backend.
>
> I guess that clears up at least some of my questions from above...
>
>>>>>> Ok.
>>>>>>
>>>>>>
>>>>>>>>>> Have you considered to implement these through vDPA?
>>>>>>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
>>>>>>>>> driver
>>>>>>>>> or vhost net driver is not provided.
>>>>>>>>>
>>>>>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>>>>>>>> (usecase2 above),
>>>>>>>> I see.
>>>>>>>>
>>>>>>>>
>>>>>>>>>     all the boards run Linux. The middle board provides NTB
>>>>>>>>> functionality and board on either side provides virtio/vhost
>>>>>>>>> functionality and
>>>>>>>>> transfer data using rpmsg.
>
> This setup looks really interesting (sometimes, it's really hard to
> imagine this in the abstract.)
>
>>>>>>>> So I wonder whether it's worthwhile for a new bus. Can we use
>>>>>>>> the existed virtio-bus/drivers? It might work as, except for
>>>>>>>> the epf transport, we can introduce a epf "vhost" transport
>>>>>>>> driver.
>>>>>>> IMHO we'll need two buses one for frontend and other for
>>>>>>> backend because the two components can then co-operate/interact
>>>>>>> with each other to provide a functionality. Though both will
>>>>>>> seemingly provide similar callbacks, they are both provide
>>>>>>> symmetrical or complimentary funcitonality and need not be same
>>>>>>> or identical.
>>>>>>>
>>>>>>> Having the same bus can also create sequencing issues.
>>>>>>>
>>>>>>> If you look at virtio_dev_probe() of virtio_bus
>>>>>>>
>>>>>>> device_features = dev->config->get_features(dev);
>>>>>>>
>>>>>>> Now if we use same bus for both front-end and back-end, both
>>>>>>> will try to get_features when there has been no set_features.
>>>>>>> Ideally vhost device should be initialized first with the set
>>>>>>> of features it supports. Vhost and virtio should use "status"
>>>>>>> and "features" complimentarily and not identically.
>>>>>> Yes, but there's no need for doing status/features passthrough
>>>>>> in epf vhost drivers.b
>>>>>>
>>>>>>
>>>>>>> virtio device (or frontend) cannot be initialized before vhost
>>>>>>> device (or backend) gets initialized with data such as
>>>>>>> features. Similarly vhost (backend)
>>>>>>> cannot access virqueues or buffers before virtio (frontend) sets
>>>>>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there
>>>>>>> for virtio as the physical memory for virtqueues are created by
>>>>>>> virtio (frontend).
>>>>>> epf vhost drivers need to implement two devices: vhost(vringh)
>>>>>> device and virtio device (which is a mediated device). The
>>>>>> vhost(vringh) device is doing feature negotiation with the
>>>>>> virtio device via RC/EP or NTB. The virtio device is doing
>>>>>> feature negotiation with local virtio drivers. If there're
>>>>>> feature mismatch, epf vhost drivers and do mediation between
>>>>>> them.
>>>>> Here epf vhost should be initialized with a set of features for
>>>>> it to negotiate either as vhost device or virtio device no? Where
>>>>> should the initial feature set for epf vhost come from?
>>>>
>>>> I think it can work as:
>>>>
>>>> 1) Having an initial features (hard coded in the code) set X in
>>>> epf vhost 2) Using this X for both virtio device and vhost(vringh)
>>>> device 3) local virtio driver will negotiate with virtio device
>>>> with feature set Y 4) remote virtio driver will negotiate with
>>>> vringh device with feature set Z 5) mediate between feature Y and
>>>> feature Z since both Y and Z are a subset of X
>>>>
>>>>
>>> okay. I'm also thinking if we could have configfs for configuring
>>> this. Anyways we could find different approaches of configuring
>>> this.
>>
>>
>> Yes, and I think some management API is needed even in the design of
>> your "Software Layering". In that figure, rpmsg vhost need some
>> pre-set or hard-coded features.
>
> When I saw the plumbers talk, my first idea was "this needs to be a new
> transport". You have some hard-coded or pre-configured features, and
> then features are negotiated via a transport-specific means in the
> usual way. There's basically an extra/extended layer for this (and
> status, and whatever).

I think for PCIe root complex to PCIe endpoint communication it's still
"Virtio Over PCI Bus", though the existing layout cannot be used in this
context (finding the virtio capability will fail for the modern interface,
and reading the queue status immediately after writing the queue number is
not possible for root-complex-to-endpoint communication; see setup_vq() in
virtio_pci_legacy.c).
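
To make that sequencing limitation concrete, here is a rough sketch of the
pattern I mean (paraphrased, not the actual setup_vq() code; the helper
name is made up):

    /* Assumes <linux/io.h> and <linux/virtio_pci.h> for iowrite16()/
     * ioread16() and the VIRTIO_PCI_QUEUE_* register offsets.
     */
    static u16 legacy_queue_size(void __iomem *ioaddr, u16 index)
    {
            /* Select the queue we are interested in ... */
            iowrite16(index, ioaddr + VIRTIO_PCI_QUEUE_SEL);

            /*
             * ... and read its size right away.  This only works if the
             * device updates QUEUE_NUM as a side effect of the QUEUE_SEL
             * write.  When the "registers" are endpoint memory exposed
             * through a BAR, the endpoint-side software never gets a
             * chance to run between the write and the read.
             */
            return ioread16(ioaddr + VIRTIO_PCI_QUEUE_NUM);
    }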

"Virtio Over NTB" should anyways be a new transport.
>
> Does that make any sense?

Yeah, in the approach I used, the initial features are hard-coded in
vhost-rpmsg (inherent to rpmsg). But if we go with an adapter layer
(vhost only for accessing the virtio ring, with virtio drivers on both
the front end and the back end), then the vhost side has to be configured
with the features to be presented to virtio based on the functionality
(e.g. rpmsg), and that's why an additional layer or APIs will be required.
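
For example, the kind of pre-configuration API I mean could be as small as
a per-device-id feature table that the adapter consults before the frontend
starts negotiation. A minimal sketch (the names and table contents are made
up for illustration, not code from this series):

    #include <linux/bits.h>
    #include <linux/kernel.h>
    #include <linux/virtio_ids.h>
    #include <linux/virtio_net.h>

    /* As defined locally in drivers/rpmsg/virtio_rpmsg_bus.c */
    #define VIRTIO_RPMSG_F_NS       0

    /*
     * Hypothetical table of the features an adapter advertises for a
     * given virtio device id.  These are fixed per functionality (rpmsg,
     * net, ...) and have to be set before any frontend negotiation.
     */
    static const struct {
            u32 device_id;          /* VIRTIO_ID_* */
            u64 features;           /* feature bits presented to virtio */
    } vhost_default_features[] = {
            { VIRTIO_ID_RPMSG, BIT_ULL(VIRTIO_RPMSG_F_NS) },
            { VIRTIO_ID_NET,   BIT_ULL(VIRTIO_NET_F_MTU) },
    };

    /* Look up the pre-set features for a device id (0 if unknown). */
    static u64 vhost_lookup_features(u32 device_id)
    {
            int i;

            for (i = 0; i < ARRAY_SIZE(vhost_default_features); i++)
                    if (vhost_default_features[i].device_id == device_id)
                            return vhost_default_features[i].features;
            return 0;
    }

Whether such defaults come from a table like this, from configfs, or from
both is exactly the configuration question being discussed here.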
>
>>
>>
>>>>>>>> It will have virtqueues but only used for the communication
>>>>>>>> between itself and
>>>>>>>> uppter virtio driver. And it will have vringh queues which
>>>>>>>> will be probe by virtio epf transport drivers. And it needs to
>>>>>>>> do datacopy between virtqueue and
>>>>>>>> vringh queues.
>>>>>>>>
>>>>>>>> It works like:
>>>>>>>>
>>>>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <-
>>>>>>>> vringh queue/epf>
>>>>>>>>
>>>>>>>> The advantages is that there's no need for writing new buses
>>>>>>>> and drivers.
>>>>>>> I think this will work however there is an addtional copy
>>>>>>> between vringh queue and virtqueue,
>>>>>> I think not? E.g in use case 1), if we stick to virtio bus, we
>>>>>> will have:
>>>>>>
>>>>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <-
>>>>>> virtio ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)
>>>>> IIUC epf vhost driver (EP) will access virtio ring(2) using
>>>>> vringh?
>>>>
>>>> Yes.
>>>>
>>>>
>>>>> And virtio
>>>>> ring(2) is created by virtio pci (RC).
>>>>
>>>> Yes.
>>>>
>>>>
>>>>>> What epf vhost driver did is to read from virtio ring(1) about
>>>>>> the buffer len and addr and them DMA to Linux(RC)?
>>>>> okay, I made some optimization here where vhost-rpmsg using a
>>>>> helper writes a buffer from rpmsg's upper layer directly to
>>>>> remote Linux (RC) as against here were it has to be first written
>>>>> to virtio ring (1).
>>>>>
>>>>> Thinking how this would look for NTB
>>>>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <->
>>>>> NTB(HOST2)  <- virtio ring(2) -> virtio-rpmsg (HOST2)
>>>>>
>>>>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
>>>>
>>>> Yes, I think so it needs to use vring to access virtio ring (1) as
>>>> well.
>>> NTB(HOST1) and virtio ring(1) will be in the same system. So it
>>> doesn't have to use vring. virtio ring(1) is by the virtio device
>>> the NTB(HOST1) creates.
>>
>>
>> Right.
>>
>>
>>>>
>>>>> Do you also think this will work seamlessly with virtio_net.c,
>>>>> virtio_blk.c?
>>>>
>>>> Yes.
>>> okay, I haven't looked at this but the backend of virtio_blk should
>>> access an actual storage device no?
>>
>>
>> Good point, for non-peer device like storage. There's probably no
>> need for it to be registered on the virtio bus and it might be better
>> to behave as you proposed.
>
> I might be missing something; but if you expose something as a block
> device, it should have something it can access with block reads/writes,
> shouldn't it? Of course, that can be a variety of things.
>
>>
>> Just to make sure I understand the design, how is VHOST SCSI expected
>> to work in your proposal, does it have a device for file as a backend?
>>
>>
>>>>
>>>>> I'd like to get clarity on two things in the approach you
>>>>> suggested, one is features (since epf vhost should ideally be
>>>>> transparent to any virtio driver)
>>>>
>>>> We can have have an array of pre-defined features indexed by
>>>> virtio device id in the code.
>>>>
>>>>
>>>>> and the other is how certain inputs to virtio device such as
>>>>> number of buffers be determined.
>>>>
>>>> We can start from hard coded the value like 256, or introduce some
>>>> API for user to change the value.
>>>>
>>>>
>>>>> Thanks again for your suggestions!
>>>>
>>>> You're welcome.
>>>>
>>>> Note that I just want to check whether or not we can reuse the
>>>> virtio bus/driver. It's something similar to what you proposed in
>>>> Software Layering but we just replace "vhost core" with "virtio
>>>> bus" and move the vhost core below epf/ntb/platform transport.
>>> Got it. My initial design was based on my understanding of your
>>> comments [1].
>>
>>
>> Yes, but that's just for a networking device. If we want something
>> more generic, it may require more thought (bus etc).
>
> I believe that we indeed need something bus-like to be able to support
> a variety of devices.

I think we could still have adapter layers for different types of
devices ([1]) and use the existing virtio bus for both the front end and
the back end. A bus-like approach will, however, make it simpler to add
support for new device types, whereas adding an adapter per device type
will be slightly more complex.

[1] -> Page 13 in
https://linuxplumbersconf.org/event/7/contributions/849/attachments/642/1175/Virtio_for_PCIe_RC_EP_NTB.pdf
>
>>
>>
>>>
>>> I'll try to create something based on your proposed design here.
>>
>>
>> Sure, but for coding, we'd better wait for other's opinion here.
>
> Please tell me if my thoughts above make any sense... I have just
> started looking at that, so I might be completely off.

I think your understanding is correct! Thanks for your inputs.

Thanks
Kishon

2020-09-01 08:54:16

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication


On 2020/9/1 1:24 PM, Kishon Vijay Abraham I wrote:
> Hi,
>
> On 28/08/20 4:04 pm, Cornelia Huck wrote:
>> On Thu, 9 Jul 2020 14:26:53 +0800
>> Jason Wang <[email protected]> wrote:
>>
>> [Let me note right at the beginning that I first noted this while
>> listening to Kishon's talk at LPC on Wednesday. I might be very
>> confused about the background here, so let me apologize beforehand for
>> any confusion I might spread.]
>>
>>> On 2020/7/8 下午9:13, Kishon Vijay Abraham I wrote:
>>>> Hi Jason,
>>>>
>>>> On 7/8/2020 4:52 PM, Jason Wang wrote:
>>>>> On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 7/7/2020 3:17 PM, Jason Wang wrote:
>>>>>>> On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:
>>>>>>>> Hi Jason,
>>>>>>>>
>>>>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>>>>>>> On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:
>>>>>>>>>> Hi Jason,
>>>>>>>>>>
>>>>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>>>>>>> On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay
>>>>>>>>>>>> Abraham I wrote:
>>>>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>>>>>>> communication over MMIO. This series enables rpmsg
>>>>>>>>>>>>> communication between
>>>>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver
>>>>>>>>>>>>> (uses vhost) for
>>>>>>>>>>>>>          rpmsg communication between two SoCs connected to
>>>>>>>>>>>>> each other
>>>>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg
>>>>>>>>>>>>> communication
>>>>>>>>>>>>>          between two SoCs connected via NTB
>>>>>>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>>>>>>
>>>>>>>>>>>>> UseCase1 :
>>>>>>>>>>>>>
>>>>>>>>>>>>> [UseCase1 diagram: VHOST RPMSG on Linux Endpoint (SOC1) <-> VIRTIO RPMSG on Linux Root Complex (SOC2); ASCII art mangled by quoting, elided]
>>>>>>>>>>>>>
>>>>>>>>>>>>> UseCase 2:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [UseCase 2 diagram: VHOST RPMSG on HOST1 <-> VIRTIO RPMSG on HOST2, bridged by EP CONTROLLER1 <-> EP CONTROLLER2 on an SoC with multiple EP instances (configured using NTB function); ASCII art mangled by quoting, elided]
>>>>>>>>>>>>>
>>
>> First of all, to clarify the terminology:
>> Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
>> and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just
>
> Right, vhost_rpmsg is 'device' and virtio_rpmsg is 'driver'.
>> virtqueues + the exiting vhost interfaces?
>
> It's implemented to provide the full 'device' functionality.
>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Software Layering:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The high-level SW layering should look something like
>>>>>>>>>>>>> below. This series
>>>>>>>>>>>>> adds support only for RPMSG VHOST, however something
>>>>>>>>>>>>> similar should be
>>>>>>>>>>>>> done for net and scsi. With that any vhost device (PCI,
>>>>>>>>>>>>> NTB, Platform
>>>>>>>>>>>>> device, user) can use any of the vhost client driver.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [Software layering diagram: RPMSG VHOST, NET VHOST, SCSI VHOST (and other) client drivers on top of VHOST CORE, with PCI EPF VHOST, NTB VHOST and PLATFORM DEVICE VHOST below; ASCII art mangled by quoting, elided]
>>
>> So, the upper half is basically various functionality types, e.g. a net
>> device. What is the lower half, a hardware interface? Would it be
>> equivalent to e.g. a normal PCI device?
>
> Right, the upper half should provide the functionality.
> The bottom layer could be a HW interface (like PCIe device or NTB
> device) or it could be a SW interface (for accessing virtio ring in
> userspace) that could be used by Hypervisor.
>
> The top half should be transparent to what type of device is actually
> using it.
>
>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This was initially proposed here [1]
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] ->
>>>>>>>>>>>>> https://lore.kernel.org/r/[email protected]
>>>>>>>>>>>>>
>>>>>>>>>>>> I find this very interesting. A huge patchset so will take
>>>>>>>>>>>> a bit
>>>>>>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>>>>>>> Yes, it would be better if there's a git branch for us to
>>>>>>>>>>> have a look.
>>>>>>>>>> I've pushed the branch
>>>>>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel
>>>>>>>>>>> some of the
>>>>>>>>>>> work is
>>>>>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>>>>>>>> This is about connecting two different HW systems both
>>>>>>>>>> running Linux and
>>>>>>>>>> doesn't necessarily involve virtualization.
>>>>>>>>> Right, this is something similar to VOP
>>>>>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The
>>>>>>>>> different is the
>>>>>>>>> hardware I guess and VOP use userspace application to
>>>>>>>>> implement the device.
>>>>>>>> I'd also like to point out, this series tries to have
>>>>>>>> communication between
>>>>>>>> two
>>>>>>>> SoCs in vendor agnostic way. Since this series solves for 2
>>>>>>>> usecases (PCIe
>>>>>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB
>>>>>>>> framework and
>>>>>>>> any
>>>>>>>> of the HW in NTB below should be able to use a virtio-vhost
>>>>>>>> communication
>>>>>>>>
>>>>>>>> #ls drivers/ntb/hw/
>>>>>>>> amd  epf  idt  intel  mscc
>>>>>>>>
>>>>>>>> And similarly for the PCIe RC<->EP communication, this adds a
>>>>>>>> generic endpoint
>>>>>>>> function driver and hence any SoC that supports configurable
>>>>>>>> PCIe endpoint can
>>>>>>>> use virtio-vhost communication
>>>>>>>>
>>>>>>>> # ls drivers/pci/controller/dwc/*ep*
>>>>>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>>>>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>>>>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>>>>>> Thanks for those backgrounds.
>>>>>>>
>>>>>>>>>>       So there is no guest or host as in
>>>>>>>>>> virtualization but two entirely different systems connected
>>>>>>>>>> via PCIe cable,
>>>>>>>>>> one
>>>>>>>>>> acting as guest and one as host. So one system will provide
>>>>>>>>>> virtio
>>>>>>>>>> functionality reserving memory for virtqueues and the other
>>>>>>>>>> provides vhost
>>>>>>>>>> functionality providing a way to access the virtqueues in
>>>>>>>>>> virtio memory.
>>>>>>>>>> One is
>>>>>>>>>> source and the other is sink and there is no intermediate
>>>>>>>>>> entity. (vhost was
>>>>>>>>>> probably intermediate entity in virtualization?)
>>>>>>>>> (Not a native English speaker) but "vhost" could introduce
>>>>>>>>> some confusion for
>>>>>>>>> me since it was use for implementing virtio backend for
>>>>>>>>> userspace drivers. I
>>>>>>>>> guess "vringh" could be better.
>>>>>>>> Initially I had named this vringh but later decided to choose
>>>>>>>> vhost instead of
>>>>>>>> vringh. vhost is still a virtio backend (not necessarily
>>>>>>>> userspace) though it
>>>>>>>> now resides in an entirely different system. Whatever virtio is
>>>>>>>> for a frontend
>>>>>>>> system, vhost can be that for a backend system. vring can be
>>>>>>>> for accessing
>>>>>>>> virtqueue and can be used either in frontend or backend.
>>
>> I guess that clears up at least some of my questions from above...
>>
>>>>>>> Ok.
>>>>>>>
>>>>>>>>>>> Have you considered to implement these through vDPA?
>>>>>>>>>> IIUC vDPA only provides an interface to userspace and an
>>>>>>>>>> in-kernel rpmsg
>>>>>>>>>> driver
>>>>>>>>>> or vhost net driver is not provided.
>>>>>>>>>>
>>>>>>>>>> The HW connection looks something like
>>>>>>>>>> https://pasteboard.co/JfMVVHC.jpg
>>>>>>>>>> (usecase2 above),
>>>>>>>>> I see.
>>>>>>>>>
>>>>>>>>>>       all the boards run Linux. The middle board provides NTB
>>>>>>>>>> functionality and board on either side provides virtio/vhost
>>>>>>>>>> functionality and
>>>>>>>>>> transfer data using rpmsg.
>>
>> This setup looks really interesting (sometimes, it's really hard to
>> imagine this in the abstract.)
>>>>>>>>> So I wonder whether it's worthwhile for a new bus. Can we use
>>>>>>>>> the existed virtio-bus/drivers? It might work as, except for
>>>>>>>>> the epf transport, we can introduce a epf "vhost" transport
>>>>>>>>> driver.
>>>>>>>> IMHO we'll need two buses one for frontend and other for
>>>>>>>> backend because the two components can then co-operate/interact
>>>>>>>> with each other to provide a functionality. Though both will
>>>>>>>> seemingly provide similar callbacks, they are both provide
>>>>>>>> symmetrical or complimentary funcitonality and need not be same
>>>>>>>> or identical.
>>>>>>>>
>>>>>>>> Having the same bus can also create sequencing issues.
>>>>>>>>
>>>>>>>> If you look at virtio_dev_probe() of virtio_bus
>>>>>>>>
>>>>>>>> device_features = dev->config->get_features(dev);
>>>>>>>>
>>>>>>>> Now if we use same bus for both front-end and back-end, both
>>>>>>>> will try to get_features when there has been no set_features.
>>>>>>>> Ideally vhost device should be initialized first with the set
>>>>>>>> of features it supports. Vhost and virtio should use "status"
>>>>>>>> and "features" complimentarily and not identically.
>>>>>>> Yes, but there's no need for doing status/features passthrough
>>>>>>> in epf vhost drivers.b
>>>>>>>
>>>>>>>> virtio device (or frontend) cannot be initialized before vhost
>>>>>>>> device (or backend) gets initialized with data such as
>>>>>>>> features. Similarly vhost (backend)
>>>>>>>> cannot access virqueues or buffers before virtio (frontend) sets
>>>>>>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there
>>>>>>>> for virtio as the physical memory for virtqueues are created by
>>>>>>>> virtio (frontend).
>>>>>>> epf vhost drivers need to implement two devices: vhost(vringh)
>>>>>>> device and virtio device (which is a mediated device). The
>>>>>>> vhost(vringh) device is doing feature negotiation with the
>>>>>>> virtio device via RC/EP or NTB. The virtio device is doing
>>>>>>> feature negotiation with local virtio drivers. If there're
>>>>>>> feature mismatch, epf vhost drivers and do mediation between
>>>>>>> them.
>>>>>> Here epf vhost should be initialized with a set of features for
>>>>>> it to negotiate either as vhost device or virtio device no? Where
>>>>>> should the initial feature set for epf vhost come from?
>>>>>
>>>>> I think it can work as:
>>>>>
>>>>> 1) Having an initial features (hard coded in the code) set X in
>>>>> epf vhost 2) Using this X for both virtio device and vhost(vringh)
>>>>> device 3) local virtio driver will negotiate with virtio device
>>>>> with feature set Y 4) remote virtio driver will negotiate with
>>>>> vringh device with feature set Z 5) mediate between feature Y and
>>>>> feature Z since both Y and Z are a subset of X
>>>>>
>>>> okay. I'm also thinking if we could have configfs for configuring
>>>> this. Anyways we could find different approaches of configuring
>>>> this.
>>>
>>>
>>> Yes, and I think some management API is needed even in the design of
>>> your "Software Layering". In that figure, rpmsg vhost need some
>>> pre-set or hard-coded features.
>>
>> When I saw the plumbers talk, my first idea was "this needs to be a new
>> transport". You have some hard-coded or pre-configured features, and
>> then features are negotiated via a transport-specific means in the
>> usual way. There's basically an extra/extended layer for this (and
>> status, and whatever).
>
> I think for PCIe root complex to PCIe endpoint communication it's
> still "Virtio Over PCI Bus", though existing layout cannot be used in
> this context (find virtio capability will fail for modern interface
> and loading queue status immediately after writing queue number is not
> possible for root complex to endpoint communication; setup_vq() in
> virtio_pci_legacy.c).


Then you need something that is functionally equivalent to virtio PCI,
which is actually the concept of vDPA (e.g. vDPA provides alternatives
when queue_sel is hard to implement on the EP side).


>
> "Virtio Over NTB" should anyways be a new transport.
>>
>> Does that make any sense?
>
> yeah, in the approach I used the initial features are hard-coded in
> vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter
> layer (vhost only for accessing virtio ring and use virtio drivers on
> both front end and backend), based on the functionality (e.g, rpmsg),
> the vhost should be configured with features (to be presented to the
> virtio) and that's why additional layer or APIs will be required.


A question here: if we go with the vhost bus approach, does it mean the
virtio device can only be implemented in the EP's userspace?

Thanks


>>
>>>
>>>
>>>>>>>>> It will have virtqueues but only used for the communication
>>>>>>>>> between itself and
>>>>>>>>> uppter virtio driver. And it will have vringh queues which
>>>>>>>>> will be probe by virtio epf transport drivers. And it needs to
>>>>>>>>> do datacopy between virtqueue and
>>>>>>>>> vringh queues.
>>>>>>>>>
>>>>>>>>> It works like:
>>>>>>>>>
>>>>>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <-
>>>>>>>>> vringh queue/epf>
>>>>>>>>>
>>>>>>>>> The advantages is that there's no need for writing new buses
>>>>>>>>> and drivers.
>>>>>>>> I think this will work however there is an addtional copy
>>>>>>>> between vringh queue and virtqueue,
>>>>>>> I think not? E.g in use case 1), if we stick to virtio bus, we
>>>>>>> will have:
>>>>>>>
>>>>>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <-
>>>>>>> virtio ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)
>>>>>> IIUC epf vhost driver (EP) will access virtio ring(2) using
>>>>>> vringh?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> And virtio
>>>>>> ring(2) is created by virtio pci (RC).
>>>>>
>>>>> Yes.
>>>>>
>>>>>>> What epf vhost driver did is to read from virtio ring(1) about
>>>>>>> the buffer len and addr and them DMA to Linux(RC)?
>>>>>> okay, I made some optimization here where vhost-rpmsg using a
>>>>>> helper writes a buffer from rpmsg's upper layer directly to
>>>>>> remote Linux (RC) as against here were it has to be first written
>>>>>> to virtio ring (1).
>>>>>>
>>>>>> Thinking how this would look for NTB
>>>>>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <->
>>>>>> NTB(HOST2)  <- virtio ring(2) -> virtio-rpmsg (HOST2)
>>>>>>
>>>>>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
>>>>>
>>>>> Yes, I think so it needs to use vring to access virtio ring (1) as
>>>>> well.
>>>> NTB(HOST1) and virtio ring(1) will be in the same system. So it
>>>> doesn't have to use vring. virtio ring(1) is by the virtio device
>>>> the NTB(HOST1) creates.
>>>
>>>
>>> Right.
>>>
>>>
>>>>>> Do you also think this will work seamlessly with virtio_net.c,
>>>>>> virtio_blk.c?
>>>>>
>>>>> Yes.
>>>> okay, I haven't looked at this but the backend of virtio_blk should
>>>> access an actual storage device no?
>>>
>>>
>>> Good point, for non-peer device like storage. There's probably no
>>> need for it to be registered on the virtio bus and it might be better
>>> to behave as you proposed.
>>
>> I might be missing something; but if you expose something as a block
>> device, it should have something it can access with block reads/writes,
>> shouldn't it? Of course, that can be a variety of things.
>>
>>>
>>> Just to make sure I understand the design, how is VHOST SCSI expected
>>> to work in your proposal, does it have a device for file as a backend?
>>>
>>>
>>>>>> I'd like to get clarity on two things in the approach you
>>>>>> suggested, one is features (since epf vhost should ideally be
>>>>>> transparent to any virtio driver)
>>>>>
>>>>> We can have have an array of pre-defined features indexed by
>>>>> virtio device id in the code.
>>>>>
>>>>>> and the other is how certain inputs to virtio device such as
>>>>>> number of buffers be determined.
>>>>>
>>>>> We can start from hard coded the value like 256, or introduce some
>>>>> API for user to change the value.
>>>>>
>>>>>> Thanks again for your suggestions!
>>>>>
>>>>> You're welcome.
>>>>>
>>>>> Note that I just want to check whether or not we can reuse the
>>>>> virtio bus/driver. It's something similar to what you proposed in
>>>>> Software Layering but we just replace "vhost core" with "virtio
>>>>> bus" and move the vhost core below epf/ntb/platform transport.
>>>> Got it. My initial design was based on my understanding of your
>>>> comments [1].
>>>
>>>
>>> Yes, but that's just for a networking device. If we want something
>>> more generic, it may require more thought (bus etc).
>>
>> I believe that we indeed need something bus-like to be able to support
>> a variety of devices.
>
> I think we could still have adapter layers for different types of
> devices ([1]) and use existing virtio bus for both front end and back
> end. Using bus-like will however simplify adding support for new types
> of devices and adding adapters for devices will be slightly more complex.
>
> [1] -> Page 13 in
> https://linuxplumbersconf.org/event/7/contributions/849/attachments/642/1175/Virtio_for_PCIe_RC_EP_NTB.pdf
>>
>>>
>>>
>>>>
>>>> I'll try to create something based on your proposed design here.
>>>
>>>
>>> Sure, but for coding, we'd better wait for other's opinion here.
>>
>> Please tell me if my thoughts above make any sense... I have just
>> started looking at that, so I might be completely off.
>
> I think your understanding is correct! Thanks for your inputs.
>
> Thanks
> Kishon

2020-09-08 16:43:21

by Cornelia Huck

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

On Tue, 1 Sep 2020 16:50:03 +0800
Jason Wang <[email protected]> wrote:

> On 2020/9/1 下午1:24, Kishon Vijay Abraham I wrote:
> > Hi,
> >
> > On 28/08/20 4:04 pm, Cornelia Huck wrote:
> >> On Thu, 9 Jul 2020 14:26:53 +0800
> >> Jason Wang <[email protected]> wrote:
> >>
> >> [Let me note right at the beginning that I first noted this while
> >> listening to Kishon's talk at LPC on Wednesday. I might be very
> >> confused about the background here, so let me apologize beforehand for
> >> any confusion I might spread.]
> >>
> >>> On 2020/7/8 下午9:13, Kishon Vijay Abraham I wrote:
> >>>> Hi Jason,
> >>>>
> >>>> On 7/8/2020 4:52 PM, Jason Wang wrote:
> >>>>> On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:
> >>>>>> Hi Jason,
> >>>>>>
> >>>>>> On 7/7/2020 3:17 PM, Jason Wang wrote:
> >>>>>>> On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:
> >>>>>>>> Hi Jason,
> >>>>>>>>
> >>>>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
> >>>>>>>>> On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:
> >>>>>>>>>> Hi Jason,
> >>>>>>>>>>
> >>>>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
> >>>>>>>>>>> On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:
> >>>>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay
> >>>>>>>>>>>> Abraham I wrote:
> >>>>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
> >>>>>>>>>>>>> communication over MMIO. This series enables rpmsg
> >>>>>>>>>>>>> communication between
> >>>>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
> >>>>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
> >>>>>>>>>>>>> 3) Add vhost client driver for rpmsg
> >>>>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver
> >>>>>>>>>>>>> (uses vhost) for
> >>>>>>>>>>>>>          rpmsg communication between two SoCs connected to
> >>>>>>>>>>>>> each other
> >>>>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg
> >>>>>>>>>>>>> communication
> >>>>>>>>>>>>>          between two SoCs connected via NTB
> >>>>>>>>>>>>> 6) Add configfs to configure the components
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> UseCase1 :
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [UseCase1 diagram: VHOST RPMSG on Linux Endpoint (SOC1) <-> VIRTIO RPMSG on Linux Root Complex (SOC2); ASCII art mangled by quoting, elided]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> UseCase 2:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [UseCase 2 diagram: VHOST RPMSG on HOST1 <-> VIRTIO RPMSG on HOST2, bridged by EP CONTROLLER1 <-> EP CONTROLLER2 on an SoC with multiple EP instances (configured using NTB function); ASCII art mangled by quoting, elided]
> >>>>>>>>>>>>>
> >>
> >> First of all, to clarify the terminology:
> >> Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
> >> and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just
> >
> > Right, vhost_rpmsg is 'device' and virtio_rpmsg is 'driver'.
> >> virtqueues + the exiting vhost interfaces?
> >
> > It's implemented to provide the full 'device' functionality.

Ok.

> >>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Software Layering:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The high-level SW layering should look something like
> >>>>>>>>>>>>> below. This series
> >>>>>>>>>>>>> adds support only for RPMSG VHOST, however something
> >>>>>>>>>>>>> similar should be
> >>>>>>>>>>>>> done for net and scsi. With that any vhost device (PCI,
> >>>>>>>>>>>>> NTB, Platform
> >>>>>>>>>>>>> device, user) can use any of the vhost client driver.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [Software layering diagram: RPMSG VHOST, NET VHOST, SCSI VHOST (and other) client drivers on top of VHOST CORE, with PCI EPF VHOST, NTB VHOST and PLATFORM DEVICE VHOST below; ASCII art mangled by quoting, elided]
> >>
> >> So, the upper half is basically various functionality types, e.g. a net
> >> device. What is the lower half, a hardware interface? Would it be
> >> equivalent to e.g. a normal PCI device?
> >
> > Right, the upper half should provide the functionality.
> > The bottom layer could be a HW interface (like PCIe device or NTB
> > device) or it could be a SW interface (for accessing virtio ring in
> > userspace) that could be used by Hypervisor.

Ok. In that respect, the SW interface is sufficiently device-like, I
guess.

> >
> > The top half should be transparent to what type of device is actually
> > using it.
> >
> >>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This was initially proposed here [1]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1] ->
> >>>>>>>>>>>>> https://lore.kernel.org/r/[email protected]
> >>>>>>>>>>>>>
> >>>>>>>>>>>> I find this very interesting. A huge patchset so will take
> >>>>>>>>>>>> a bit
> >>>>>>>>>>>> to review, but I certainly plan to do that. Thanks!
> >>>>>>>>>>> Yes, it would be better if there's a git branch for us to
> >>>>>>>>>>> have a look.
> >>>>>>>>>> I've pushed the branch
> >>>>>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel
> >>>>>>>>>>> some of the
> >>>>>>>>>>> work is
> >>>>>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
> >>>>>>>>>> This is about connecting two different HW systems both
> >>>>>>>>>> running Linux and
> >>>>>>>>>> doesn't necessarily involve virtualization.
> >>>>>>>>> Right, this is something similar to VOP
> >>>>>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The
> >>>>>>>>> different is the
> >>>>>>>>> hardware I guess and VOP use userspace application to
> >>>>>>>>> implement the device.
> >>>>>>>> I'd also like to point out, this series tries to have
> >>>>>>>> communication between
> >>>>>>>> two
> >>>>>>>> SoCs in vendor agnostic way. Since this series solves for 2
> >>>>>>>> usecases (PCIe
> >>>>>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB
> >>>>>>>> framework and
> >>>>>>>> any
> >>>>>>>> of the HW in NTB below should be able to use a virtio-vhost
> >>>>>>>> communication
> >>>>>>>>
> >>>>>>>> #ls drivers/ntb/hw/
> >>>>>>>> amd  epf  idt  intel  mscc
> >>>>>>>>
> >>>>>>>> And similarly for the PCIe RC<->EP communication, this adds a
> >>>>>>>> generic endpoint
> >>>>>>>> function driver and hence any SoC that supports configurable
> >>>>>>>> PCIe endpoint can
> >>>>>>>> use virtio-vhost communication
> >>>>>>>>
> >>>>>>>> # ls drivers/pci/controller/dwc/*ep*
> >>>>>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
> >>>>>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
> >>>>>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
> >>>>>>> Thanks for those backgrounds.
> >>>>>>>
> >>>>>>>>>>       So there is no guest or host as in
> >>>>>>>>>> virtualization but two entirely different systems connected
> >>>>>>>>>> via PCIe cable,
> >>>>>>>>>> one
> >>>>>>>>>> acting as guest and one as host. So one system will provide
> >>>>>>>>>> virtio
> >>>>>>>>>> functionality reserving memory for virtqueues and the other
> >>>>>>>>>> provides vhost
> >>>>>>>>>> functionality providing a way to access the virtqueues in
> >>>>>>>>>> virtio memory.
> >>>>>>>>>> One is
> >>>>>>>>>> source and the other is sink and there is no intermediate
> >>>>>>>>>> entity. (vhost was
> >>>>>>>>>> probably intermediate entity in virtualization?)
> >>>>>>>>> (Not a native English speaker) but "vhost" could introduce
> >>>>>>>>> some confusion for
> >>>>>>>>> me since it was use for implementing virtio backend for
> >>>>>>>>> userspace drivers. I
> >>>>>>>>> guess "vringh" could be better.
> >>>>>>>> Initially I had named this vringh but later decided to choose
> >>>>>>>> vhost instead of
> >>>>>>>> vringh. vhost is still a virtio backend (not necessarily
> >>>>>>>> userspace) though it
> >>>>>>>> now resides in an entirely different system. Whatever virtio is
> >>>>>>>> for a frontend
> >>>>>>>> system, vhost can be that for a backend system. vring can be
> >>>>>>>> for accessing
> >>>>>>>> virtqueue and can be used either in frontend or backend.
> >>
> >> I guess that clears up at least some of my questions from above...
> >>
> >>>>>>> Ok.
> >>>>>>>
> >>>>>>>>>>> Have you considered to implement these through vDPA?
> >>>>>>>>>> IIUC vDPA only provides an interface to userspace and an
> >>>>>>>>>> in-kernel rpmsg
> >>>>>>>>>> driver
> >>>>>>>>>> or vhost net driver is not provided.
> >>>>>>>>>>
> >>>>>>>>>> The HW connection looks something like
> >>>>>>>>>> https://pasteboard.co/JfMVVHC.jpg
> >>>>>>>>>> (usecase2 above),
> >>>>>>>>> I see.
> >>>>>>>>>
> >>>>>>>>>>       all the boards run Linux. The middle board provides NTB
> >>>>>>>>>> functionality and board on either side provides virtio/vhost
> >>>>>>>>>> functionality and
> >>>>>>>>>> transfer data using rpmsg.
> >>
> >> This setup looks really interesting (sometimes, it's really hard to
> >> imagine this in the abstract.)
> >>>>>>>>> So I wonder whether it's worthwhile for a new bus. Can we use
> >>>>>>>>> the existed virtio-bus/drivers? It might work as, except for
> >>>>>>>>> the epf transport, we can introduce a epf "vhost" transport
> >>>>>>>>> driver.
> >>>>>>>> IMHO we'll need two buses one for frontend and other for
> >>>>>>>> backend because the two components can then co-operate/interact
> >>>>>>>> with each other to provide a functionality. Though both will
> >>>>>>>> seemingly provide similar callbacks, they are both provide
> >>>>>>>> symmetrical or complimentary funcitonality and need not be same
> >>>>>>>> or identical.
> >>>>>>>>
> >>>>>>>> Having the same bus can also create sequencing issues.
> >>>>>>>>
> >>>>>>>> If you look at virtio_dev_probe() of virtio_bus
> >>>>>>>>
> >>>>>>>> device_features = dev->config->get_features(dev);
> >>>>>>>>
> >>>>>>>> Now if we use same bus for both front-end and back-end, both
> >>>>>>>> will try to get_features when there has been no set_features.
> >>>>>>>> Ideally vhost device should be initialized first with the set
> >>>>>>>> of features it supports. Vhost and virtio should use "status"
> >>>>>>>> and "features" complimentarily and not identically.
> >>>>>>> Yes, but there's no need for doing status/features passthrough
> >>>>>>> in epf vhost drivers.b
> >>>>>>>
> >>>>>>>> virtio device (or frontend) cannot be initialized before vhost
> >>>>>>>> device (or backend) gets initialized with data such as
> >>>>>>>> features. Similarly vhost (backend)
> >>>>>>>> cannot access virqueues or buffers before virtio (frontend) sets
> >>>>>>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there
> >>>>>>>> for virtio as the physical memory for virtqueues are created by
> >>>>>>>> virtio (frontend).
> >>>>>>> epf vhost drivers need to implement two devices: vhost(vringh)
> >>>>>>> device and virtio device (which is a mediated device). The
> >>>>>>> vhost(vringh) device is doing feature negotiation with the
> >>>>>>> virtio device via RC/EP or NTB. The virtio device is doing
> >>>>>>> feature negotiation with local virtio drivers. If there're
> >>>>>>> feature mismatch, epf vhost drivers and do mediation between
> >>>>>>> them.
> >>>>>> Here epf vhost should be initialized with a set of features for
> >>>>>> it to negotiate either as vhost device or virtio device no? Where
> >>>>>> should the initial feature set for epf vhost come from?
> >>>>>
> >>>>> I think it can work as:
> >>>>>
> >>>>> 1) Having an initial features (hard coded in the code) set X in
> >>>>> epf vhost 2) Using this X for both virtio device and vhost(vringh)
> >>>>> device 3) local virtio driver will negotiate with virtio device
> >>>>> with feature set Y 4) remote virtio driver will negotiate with
> >>>>> vringh device with feature set Z 5) mediate between feature Y and
> >>>>> feature Z since both Y and Z are a subset of X
> >>>>>
> >>>> okay. I'm also thinking if we could have configfs for configuring
> >>>> this. Anyways we could find different approaches of configuring
> >>>> this.
> >>>
> >>>
> >>> Yes, and I think some management API is needed even in the design of
> >>> your "Software Layering". In that figure, rpmsg vhost need some
> >>> pre-set or hard-coded features.
> >>
> >> When I saw the plumbers talk, my first idea was "this needs to be a new
> >> transport". You have some hard-coded or pre-configured features, and
> >> then features are negotiated via a transport-specific means in the
> >> usual way. There's basically an extra/extended layer for this (and
> >> status, and whatever).
> >
> > I think for PCIe root complex to PCIe endpoint communication it's
> > still "Virtio Over PCI Bus", though existing layout cannot be used in
> > this context (find virtio capability will fail for modern interface
> > and loading queue status immediately after writing queue number is not
> > possible for root complex to endpoint communication; setup_vq() in
> > virtio_pci_legacy.c).
>
>
> Then you need something that is functional equivalent to virtio PCI
> which is actually the concept of vDPA (e.g vDPA provides alternatives if
> the queue_sel is hard in the EP implementation).

It seems I really need to read up on vDPA more... do you have a pointer
for diving into this alternatives aspect?

>
>
> >
> > "Virtio Over NTB" should anyways be a new transport.
> >>
> >> Does that make any sense?
> >
> > yeah, in the approach I used the initial features are hard-coded in
> > vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter
> > layer (vhost only for accessing virtio ring and use virtio drivers on
> > both front end and backend), based on the functionality (e.g, rpmsg),
> > the vhost should be configured with features (to be presented to the
> > virtio) and that's why additional layer or APIs will be required.
>
>
> A question here, if we go with vhost bus approach, does it mean the
> virtio device can only be implemented in EP's userspace?

Can we maybe implement an alternative bus as well that would allow us
to support different virtio device implementations (in addition to the
vhost bus + userspace combination)?

>
> Thanks
>
>
> >>
> >>>
> >>>
> >>>>>>>>> It will have virtqueues but only used for the communication
> >>>>>>>>> between itself and
> >>>>>>>>> uppter virtio driver. And it will have vringh queues which
> >>>>>>>>> will be probe by virtio epf transport drivers. And it needs to
> >>>>>>>>> do datacopy between virtqueue and
> >>>>>>>>> vringh queues.
> >>>>>>>>>
> >>>>>>>>> It works like:
> >>>>>>>>>
> >>>>>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <-
> >>>>>>>>> vringh queue/epf>
> >>>>>>>>>
> >>>>>>>>> The advantages is that there's no need for writing new buses
> >>>>>>>>> and drivers.
> >>>>>>>> I think this will work however there is an addtional copy
> >>>>>>>> between vringh queue and virtqueue,
> >>>>>>> I think not? E.g in use case 1), if we stick to virtio bus, we
> >>>>>>> will have:
> >>>>>>>
> >>>>>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <-
> >>>>>>> virtio ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)
> >>>>>> IIUC epf vhost driver (EP) will access virtio ring(2) using
> >>>>>> vringh?
> >>>>>
> >>>>> Yes.
> >>>>>
> >>>>>> And virtio
> >>>>>> ring(2) is created by virtio pci (RC).
> >>>>>
> >>>>> Yes.
> >>>>>
> >>>>>>> What epf vhost driver did is to read from virtio ring(1) about
> >>>>>>> the buffer len and addr and them DMA to Linux(RC)?
> >>>>>> okay, I made some optimization here where vhost-rpmsg using a
> >>>>>> helper writes a buffer from rpmsg's upper layer directly to
> >>>>>> remote Linux (RC) as against here were it has to be first written
> >>>>>> to virtio ring (1).
> >>>>>>
> >>>>>> Thinking how this would look for NTB
> >>>>>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <->
> >>>>>> NTB(HOST2)  <- virtio ring(2) -> virtio-rpmsg (HOST2)
> >>>>>>
> >>>>>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
> >>>>>
> >>>>> Yes, I think so it needs to use vring to access virtio ring (1) as
> >>>>> well.
> >>>> NTB(HOST1) and virtio ring(1) will be in the same system. So it
> >>>> doesn't have to use vring. virtio ring(1) is by the virtio device
> >>>> the NTB(HOST1) creates.
> >>>
> >>>
> >>> Right.
> >>>
> >>>
> >>>>>> Do you also think this will work seamlessly with virtio_net.c,
> >>>>>> virtio_blk.c?
> >>>>>
> >>>>> Yes.
> >>>> okay, I haven't looked at this but the backend of virtio_blk should
> >>>> access an actual storage device no?
> >>>
> >>>
> >>> Good point, for non-peer device like storage. There's probably no
> >>> need for it to be registered on the virtio bus and it might be better
> >>> to behave as you proposed.
> >>
> >> I might be missing something; but if you expose something as a block
> >> device, it should have something it can access with block reads/writes,
> >> shouldn't it? Of course, that can be a variety of things.
> >>
> >>>
> >>> Just to make sure I understand the design, how is VHOST SCSI expected
> >>> to work in your proposal, does it have a device for file as a backend?
> >>>
> >>>
> >>>>>> I'd like to get clarity on two things in the approach you
> >>>>>> suggested, one is features (since epf vhost should ideally be
> >>>>>> transparent to any virtio driver)
> >>>>>
> >>>>> We can have have an array of pre-defined features indexed by
> >>>>> virtio device id in the code.
> >>>>>
> >>>>>> and the other is how certain inputs to virtio device such as
> >>>>>> number of buffers be determined.
> >>>>>
> >>>>> We can start from hard coded the value like 256, or introduce some
> >>>>> API for user to change the value.
> >>>>>
> >>>>>> Thanks again for your suggestions!
> >>>>>
> >>>>> You're welcome.
> >>>>>
> >>>>> Note that I just want to check whether or not we can reuse the
> >>>>> virtio bus/driver. It's something similar to what you proposed in
> >>>>> Software Layering but we just replace "vhost core" with "virtio
> >>>>> bus" and move the vhost core below epf/ntb/platform transport.
> >>>> Got it. My initial design was based on my understanding of your
> >>>> comments [1].
> >>>
> >>>
> >>> Yes, but that's just for a networking device. If we want something
> >>> more generic, it may require more thought (bus etc).
> >>
> >> I believe that we indeed need something bus-like to be able to support
> >> a variety of devices.
> >
> > I think we could still have adapter layers for different types of
> > devices ([1]) and use existing virtio bus for both front end and back
> > end. Using bus-like will however simplify adding support for new types
> > of devices and adding adapters for devices will be slightly more complex.
> >
> > [1] -> Page 13 in
> > https://linuxplumbersconf.org/event/7/contributions/849/attachments/642/1175/Virtio_for_PCIe_RC_EP_NTB.pdf

So, I guess it's a trade-off: do we expect a variety of device types,
or more of a variety of devices needing different adapters?

2020-09-09 08:45:22

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication


On 2020/9/9 12:37 AM, Cornelia Huck wrote:
>> Then you need something that is functional equivalent to virtio PCI
>> which is actually the concept of vDPA (e.g vDPA provides alternatives if
>> the queue_sel is hard in the EP implementation).
> It seems I really need to read up on vDPA more... do you have a pointer
> for diving into this alternatives aspect?


See vdpa_config_ops in include/linux/vdpa.h

Especially this part:

    int (*set_vq_address)(struct vdpa_device *vdev,
                  u16 idx, u64 desc_area, u64 driver_area,
                  u64 device_area);

This means that a device (e.g. an endpoint device) for which the
virtio-pci layout is hard to implement can use any other register layout
or vendor-specific way to configure the virtqueues.
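
As a purely illustrative sketch (not taken from any existing vDPA parent
driver; the structures and register layout below are made up), an
endpoint-style device could back set_vq_address() with a per-queue slot in
its own shared-memory layout instead of a queue_sel/readback sequence:

    #include <linux/io.h>
    #include <linux/io-64-nonatomic-lo-hi.h>    /* writeq() on 32-bit */
    #include <linux/kernel.h>
    #include <linux/vdpa.h>

    /* Hypothetical per-queue slot in the endpoint's BAR/shared memory. */
    struct ep_vq_slot {
            u64 desc_addr;
            u64 driver_addr;        /* avail ring */
            u64 device_addr;        /* used ring */
    };

    struct ep_vdpa {
            struct vdpa_device vdpa;
            struct ep_vq_slot __iomem *vq_slots;    /* one slot per queue */
    };

    static int ep_vdpa_set_vq_address(struct vdpa_device *vdev, u16 idx,
                                      u64 desc_area, u64 driver_area,
                                      u64 device_area)
    {
            struct ep_vdpa *ep = container_of(vdev, struct ep_vdpa, vdpa);

            /*
             * No select/readback sequence: each queue has its own slot,
             * so the ring addresses are written to a vendor-specific
             * layout that the endpoint-side software picks up at its
             * own pace.
             */
            writeq(desc_area, &ep->vq_slots[idx].desc_addr);
            writeq(driver_area, &ep->vq_slots[idx].driver_addr);
            writeq(device_area, &ep->vq_slots[idx].device_addr);
            return 0;
    }

    static const struct vdpa_config_ops ep_vdpa_ops = {
            .set_vq_address = ep_vdpa_set_vq_address,
            /* ... the remaining mandatory ops elided ... */
    };

The point is that the programming model behind the config ops is entirely
up to the parent driver, which is what makes the endpoint case fit vDPA.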


>
>>> "Virtio Over NTB" should anyways be a new transport.
>>>> Does that make any sense?
>>> yeah, in the approach I used the initial features are hard-coded in
>>> vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter
>>> layer (vhost only for accessing virtio ring and use virtio drivers on
>>> both front end and backend), based on the functionality (e.g, rpmsg),
>>> the vhost should be configured with features (to be presented to the
>>> virtio) and that's why additional layer or APIs will be required.
>> A question here, if we go with vhost bus approach, does it mean the
>> virtio device can only be implemented in EP's userspace?
> Can we maybe implement an alternative bus as well that would allow us
> to support different virtio device implementations (in addition to the
> vhost bus + userspace combination)?


That should be fine, but I'm not quite sure that implementing the device
in the kernel (kthread) is a good approach.

Thanks


>

2020-09-14 07:27:34

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

Hi Jason,

On 01/09/20 2:20 pm, Jason Wang wrote:
>
> On 2020/9/1 下午1:24, Kishon Vijay Abraham I wrote:
>> Hi,
>>
>> On 28/08/20 4:04 pm, Cornelia Huck wrote:
>>> On Thu, 9 Jul 2020 14:26:53 +0800
>>> Jason Wang <[email protected]> wrote:
>>>
>>> [Let me note right at the beginning that I first noted this while
>>> listening to Kishon's talk at LPC on Wednesday. I might be very
>>> confused about the background here, so let me apologize beforehand for
>>> any confusion I might spread.]
>>>
>>>> On 2020/7/8 下午9:13, Kishon Vijay Abraham I wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 7/8/2020 4:52 PM, Jason Wang wrote:
>>>>>> On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> On 7/7/2020 3:17 PM, Jason Wang wrote:
>>>>>>>> On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:
>>>>>>>>> Hi Jason,
>>>>>>>>>
>>>>>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>>>>>>>> On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:
>>>>>>>>>>> Hi Jason,
>>>>>>>>>>>
>>>>>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>>>>>>>> On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:
>>>>>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay
>>>>>>>>>>>>> Abraham I wrote:
>>>>>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>>>>>>>> communication over MMIO. This series enables rpmsg
>>>>>>>>>>>>>> communication between
>>>>>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver
>>>>>>>>>>>>>> (uses vhost) for
>>>>>>>>>>>>>>          rpmsg communication between two SoCs connected to
>>>>>>>>>>>>>> each other
>>>>>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg
>>>>>>>>>>>>>> communication
>>>>>>>>>>>>>>          between two SoCs connected via NTB
>>>>>>>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> UseCase1 :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [UseCase1 diagram: VHOST RPMSG on Linux Endpoint (SOC1) <-> VIRTIO RPMSG on Linux Root Complex (SOC2); ASCII art mangled by quoting, elided]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> UseCase 2:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [UseCase 2 diagram: VHOST RPMSG on HOST1 <-> VIRTIO RPMSG on HOST2, bridged by EP CONTROLLER1 <-> EP CONTROLLER2 on an SoC with multiple EP instances (configured using NTB function); ASCII art mangled by quoting, elided]
>>>>>>>>>>>>>>
>>>
>>> First of all, to clarify the terminology:
>>> Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
>>> and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just
>>
>> Right, vhost_rpmsg is 'device' and virtio_rpmsg is 'driver'.
>>> virtqueues + the exiting vhost interfaces?
>>
>> It's implemented to provide the full 'device' functionality.
>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Software Layering:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The high-level SW layering should look something like
>>>>>>>>>>>>>> below. This series
>>>>>>>>>>>>>> adds support only for RPMSG VHOST, however something
>>>>>>>>>>>>>> similar should be
>>>>>>>>>>>>>> done for net and scsi. With that any vhost device (PCI,
>>>>>>>>>>>>>> NTB, Platform
>>>>>>>>>>>>>> device, user) can use any of the vhost client driver.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [Software layering diagram: RPMSG VHOST, NET VHOST, SCSI VHOST (and other) client drivers on top of VHOST CORE, with PCI EPF VHOST, NTB VHOST and PLATFORM DEVICE VHOST below; ASCII art mangled by quoting, elided]
>>>
>>> So, the upper half is basically various functionality types, e.g. a net
>>> device. What is the lower half, a hardware interface? Would it be
>>> equivalent to e.g. a normal PCI device?
>>
>> Right, the upper half should provide the functionality.
>> The bottom layer could be a HW interface (like PCIe device or NTB
>> device) or it could be a SW interface (for accessing virtio ring in
>> userspace) that could be used by Hypervisor.
>>
>> The top half should be transparent to what type of device is actually
>> using it.
>>
>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This was initially proposed here [1]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] ->
>>>>>>>>>>>>>> https://lore.kernel.org/r/[email protected]
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I find this very interesting. A huge patchset so will take
>>>>>>>>>>>>> a bit
>>>>>>>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>>>>>>>> Yes, it would be better if there's a git branch for us to
>>>>>>>>>>>> have a look.
>>>>>>>>>>> I've pushed the branch
>>>>>>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel
>>>>>>>>>>>> some of the
>>>>>>>>>>>> work is
>>>>>>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>>>>>>>>> This is about connecting two different HW systems both
>>>>>>>>>>> running Linux and
>>>>>>>>>>> doesn't necessarily involve virtualization.
>>>>>>>>>> Right, this is something similar to VOP
>>>>>>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The
>>>>>>>>>> different is the
>>>>>>>>>> hardware I guess and VOP use userspace application to
>>>>>>>>>> implement the device.
>>>>>>>>> I'd also like to point out, this series tries to have
>>>>>>>>> communication between
>>>>>>>>> two
>>>>>>>>> SoCs in vendor agnostic way. Since this series solves for 2
>>>>>>>>> usecases (PCIe
>>>>>>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB
>>>>>>>>> framework and
>>>>>>>>> any
>>>>>>>>> of the HW in NTB below should be able to use a virtio-vhost
>>>>>>>>> communication
>>>>>>>>>
>>>>>>>>> #ls drivers/ntb/hw/
>>>>>>>>> amd  epf  idt  intel  mscc
>>>>>>>>>
>>>>>>>>> And similarly for the PCIe RC<->EP communication, this adds a
>>>>>>>>> generic endpoint
>>>>>>>>> function driver and hence any SoC that supports configurable
>>>>>>>>> PCIe endpoint can
>>>>>>>>> use virtio-vhost communication
>>>>>>>>>
>>>>>>>>> # ls drivers/pci/controller/dwc/*ep*
>>>>>>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>>>>>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>>>>>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>>>>>>> Thanks for those backgrounds.
>>>>>>>>
>>>>>>>>>>>       So there is no guest or host as in
>>>>>>>>>>> virtualization but two entirely different systems connected
>>>>>>>>>>> via PCIe cable,
>>>>>>>>>>> one
>>>>>>>>>>> acting as guest and one as host. So one system will provide
>>>>>>>>>>> virtio
>>>>>>>>>>> functionality reserving memory for virtqueues and the other
>>>>>>>>>>> provides vhost
>>>>>>>>>>> functionality providing a way to access the virtqueues in
>>>>>>>>>>> virtio memory.
>>>>>>>>>>> One is
>>>>>>>>>>> source and the other is sink and there is no intermediate
>>>>>>>>>>> entity. (vhost was
>>>>>>>>>>> probably intermediate entity in virtualization?)
>>>>>>>>>> (Not a native English speaker) but "vhost" could introduce
>>>>>>>>>> some confusion for
>>>>>>>>>> me since it was use for implementing virtio backend for
>>>>>>>>>> userspace drivers. I
>>>>>>>>>> guess "vringh" could be better.
>>>>>>>>> Initially I had named this vringh but later decided to choose
>>>>>>>>> vhost instead of
>>>>>>>>> vringh. vhost is still a virtio backend (not necessarily
>>>>>>>>> userspace) though it
>>>>>>>>> now resides in an entirely different system. Whatever virtio is
>>>>>>>>> for a frontend
>>>>>>>>> system, vhost can be that for a backend system. vring can be
>>>>>>>>> for accessing
>>>>>>>>> virtqueue and can be used either in frontend or backend.
>>>
>>> I guess that clears up at least some of my questions from above...
>>>
>>>>>>>> Ok.
>>>>>>>>
>>>>>>>>>>>> Have you considered to implement these through vDPA?
>>>>>>>>>>> IIUC vDPA only provides an interface to userspace and an
>>>>>>>>>>> in-kernel rpmsg
>>>>>>>>>>> driver
>>>>>>>>>>> or vhost net driver is not provided.
>>>>>>>>>>>
>>>>>>>>>>> The HW connection looks something like
>>>>>>>>>>> https://pasteboard.co/JfMVVHC.jpg
>>>>>>>>>>> (usecase2 above),
>>>>>>>>>> I see.
>>>>>>>>>>
>>>>>>>>>>>       all the boards run Linux. The middle board provides NTB
>>>>>>>>>>> functionality and board on either side provides virtio/vhost
>>>>>>>>>>> functionality and
>>>>>>>>>>> transfer data using rpmsg.
>>>
>>> This setup looks really interesting (sometimes, it's really hard to
>>> imagine this in the abstract.)
>>>>>>>>>> So I wonder whether it's worthwhile for a new bus. Can we use
>>>>>>>>>> the existed virtio-bus/drivers? It might work as, except for
>>>>>>>>>> the epf transport, we can introduce a epf "vhost" transport
>>>>>>>>>> driver.
>>>>>>>>> IMHO we'll need two buses one for frontend and other for
>>>>>>>>> backend because the two components can then co-operate/interact
>>>>>>>>> with each other to provide a functionality. Though both will
>>>>>>>>> seemingly provide similar callbacks, they are both provide
>>>>>>>>> symmetrical or complimentary funcitonality and need not be same
>>>>>>>>> or identical.
>>>>>>>>>
>>>>>>>>> Having the same bus can also create sequencing issues.
>>>>>>>>>
>>>>>>>>> If you look at virtio_dev_probe() of virtio_bus
>>>>>>>>>
>>>>>>>>> device_features = dev->config->get_features(dev);
>>>>>>>>>
>>>>>>>>> Now if we use same bus for both front-end and back-end, both
>>>>>>>>> will try to get_features when there has been no set_features.
>>>>>>>>> Ideally vhost device should be initialized first with the set
>>>>>>>>> of features it supports. Vhost and virtio should use "status"
>>>>>>>>> and "features" complimentarily and not identically.
>>>>>>>> Yes, but there's no need for doing status/features passthrough
>>>>>>>> in epf vhost drivers.b
>>>>>>>>
>>>>>>>>> virtio device (or frontend) cannot be initialized before vhost
>>>>>>>>> device (or backend) gets initialized with data such as
>>>>>>>>> features. Similarly vhost (backend)
>>>>>>>>> cannot access virqueues or buffers before virtio (frontend) sets
>>>>>>>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there
>>>>>>>>> for virtio as the physical memory for virtqueues are created by
>>>>>>>>> virtio (frontend).
>>>>>>>> epf vhost drivers need to implement two devices: vhost(vringh)
>>>>>>>> device and virtio device (which is a mediated device). The
>>>>>>>> vhost(vringh) device is doing feature negotiation with the
>>>>>>>> virtio device via RC/EP or NTB. The virtio device is doing
>>>>>>>> feature negotiation with local virtio drivers. If there're
>>>>>>>> feature mismatch, epf vhost drivers and do mediation between
>>>>>>>> them.
>>>>>>> Here epf vhost should be initialized with a set of features for
>>>>>>> it to negotiate either as vhost device or virtio device no? Where
>>>>>>> should the initial feature set for epf vhost come from?
>>>>>>
>>>>>> I think it can work as:
>>>>>>
>>>>>> 1) Having an initial features (hard coded in the code) set X in
>>>>>> epf vhost 2) Using this X for both virtio device and vhost(vringh)
>>>>>> device 3) local virtio driver will negotiate with virtio device
>>>>>> with feature set Y 4) remote virtio driver will negotiate with
>>>>>> vringh device with feature set Z 5) mediate between feature Y and
>>>>>> feature Z since both Y and Z are a subset of X
>>>>>>
>>>>> okay. I'm also thinking if we could have configfs for configuring
>>>>> this. Anyways we could find different approaches of configuring
>>>>> this.
>>>>
>>>>
>>>> Yes, and I think some management API is needed even in the design of
>>>> your "Software Layering". In that figure, rpmsg vhost need some
>>>> pre-set or hard-coded features.
>>>
>>> When I saw the plumbers talk, my first idea was "this needs to be a new
>>> transport". You have some hard-coded or pre-configured features, and
>>> then features are negotiated via a transport-specific means in the
>>> usual way. There's basically an extra/extended layer for this (and
>>> status, and whatever).
>>
>> I think for PCIe root complex to PCIe endpoint communication it's
>> still "Virtio Over PCI Bus", though existing layout cannot be used in
>> this context (find virtio capability will fail for modern interface
>> and loading queue status immediately after writing queue number is not
>> possible for root complex to endpoint communication; setup_vq() in
>> virtio_pci_legacy.c).
>
>
> Then you need something that is functional equivalent to virtio PCI
> which is actually the concept of vDPA (e.g vDPA provides alternatives if
> the queue_sel is hard in the EP implementation).

Okay, I just tried to compare 'struct vdpa_config_ops' and 'struct
vhost_config_ops' (introduced in [RFC PATCH 03/22] vhost: Add ops for
the VHOST driver to configure VHOST device).

struct vdpa_config_ops {
    /* Virtqueue ops */
    int (*set_vq_address)(struct vdpa_device *vdev,
                          u16 idx, u64 desc_area, u64 driver_area,
                          u64 device_area);
    void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
    void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
    void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
                      struct vdpa_callback *cb);
    void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
    bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
    int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
                        const struct vdpa_vq_state *state);
    int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
                        struct vdpa_vq_state *state);
    struct vdpa_notification_area
    (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
    /* vq irq is not expected to be changed once DRIVER_OK is set */
    int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);

    /* Device ops */
    u32 (*get_vq_align)(struct vdpa_device *vdev);
    u64 (*get_features)(struct vdpa_device *vdev);
    int (*set_features)(struct vdpa_device *vdev, u64 features);
    void (*set_config_cb)(struct vdpa_device *vdev,
                          struct vdpa_callback *cb);
    u16 (*get_vq_num_max)(struct vdpa_device *vdev);
    u32 (*get_device_id)(struct vdpa_device *vdev);
    u32 (*get_vendor_id)(struct vdpa_device *vdev);
    u8 (*get_status)(struct vdpa_device *vdev);
    void (*set_status)(struct vdpa_device *vdev, u8 status);
    void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
                       void *buf, unsigned int len);
    void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
                       const void *buf, unsigned int len);
    u32 (*get_generation)(struct vdpa_device *vdev);

    /* DMA ops */
    int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
    int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
                   u64 pa, u32 perm);
    int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);

    /* Free device resources */
    void (*free)(struct vdpa_device *vdev);
};

+struct vhost_config_ops {
+    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+                      unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+                      vhost_vq_callback_t *callbacks[],
+                      const char * const names[]);
+    void (*del_vqs)(struct vhost_dev *vdev);
+    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src, int len);
+    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int len);
+    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+    int (*set_status)(struct vhost_dev *vdev, u8 status);
+    u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct virtio_config_ops
I think there's some overlap here and some of the ops try to do the
same thing.

I think the main difference is between (*set_vq_address)() and
(*create_vqs)(). [create_vqs(), introduced in struct vhost_config_ops,
provides complementary functionality to (*find_vqs)() in struct
virtio_config_ops. It seemingly encapsulates the functionality of
(*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(), ...].

Back to the difference between (*set_vq_address)() and (*create_vqs)():
set_vq_address() directly provides the virtqueue address to the vdpa
device, whereas create_vqs() only provides the parameters of the
virtqueue (the number of virtqueues, the number of buffers) and does not
directly provide the address. IMO the backend client drivers (like net
or vhost) shouldn't, and cannot, know by themselves how to access the
vring created on the virtio front-end; the vdpa device/vhost device
should have the logic for that. That lets the client drivers work with
different types of vdpa/vhost devices and access the vring created by
virtio regardless of whether the vring is reached via MMIO, kernel space
or user space.
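
To make that concrete, here is a minimal sketch (not the exact API of
this series; the 'ops' member access and the callback usage are
assumptions for illustration) of how a vhost client driver would ask the
vhost device for its virtqueues via create_vqs() and leave all vring
access to the transport:

/* Sketch only: a vhost client asks the vhost device to create the
 * virtqueues and never dereferences the vring memory on the virtio
 * side itself. */
static void sketch_rx_done(struct vhost_virtqueue *vq)
{
    /* consume buffers made available by the virtio front-end */
}

static int sketch_client_setup(struct vhost_dev *vdev)
{
    struct vhost_virtqueue *vqs[2];
    vhost_vq_callback_t *cbs[2] = { sketch_rx_done, sketch_rx_done };
    static const char * const names[2] = { "rx", "tx" };

    /* 2 virtqueues with 256 buffers each; how the vrings are actually
     * reached (MMIO over PCIe, NTB window, ...) is the transport's job. */
    return vdev->ops->create_vqs(vdev, 2, 256, vqs, cbs, names);
}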

I think vdpa always works with client drivers in userspace, providing a
userspace address for the vring.
>
>
>>
>> "Virtio Over NTB" should anyways be a new transport.
>>>
>>> Does that make any sense?
>>
>> yeah, in the approach I used the initial features are hard-coded in
>> vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter
>> layer (vhost only for accessing virtio ring and use virtio drivers on
>> both front end and backend), based on the functionality (e.g, rpmsg),
>> the vhost should be configured with features (to be presented to the
>> virtio) and that's why additional layer or APIs will be required.
>
>
> A question here, if we go with vhost bus approach, does it mean the
> virtio device can only be implemented in EP's userspace?

The vhost bus approach doesn't impose any restriction on where the
virtio backend device is created. This series creates two types of
virtio backend device (one for the PCIe endpoint and the other for NTB),
and both of these devices are created in the kernel.

Thanks
Kishon

>
> Thanks
>
>
>>>
>>>>
>>>>
>>>>>>>>>> It will have virtqueues but only used for the communication
>>>>>>>>>> between itself and
>>>>>>>>>> uppter virtio driver. And it will have vringh queues which
>>>>>>>>>> will be probe by virtio epf transport drivers. And it needs to
>>>>>>>>>> do datacopy between virtqueue and
>>>>>>>>>> vringh queues.
>>>>>>>>>>
>>>>>>>>>> It works like:
>>>>>>>>>>
>>>>>>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <-
>>>>>>>>>> vringh queue/epf>
>>>>>>>>>>
>>>>>>>>>> The advantages is that there's no need for writing new buses
>>>>>>>>>> and drivers.
>>>>>>>>> I think this will work however there is an addtional copy
>>>>>>>>> between vringh queue and virtqueue,
>>>>>>>> I think not? E.g in use case 1), if we stick to virtio bus, we
>>>>>>>> will have:
>>>>>>>>
>>>>>>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <-
>>>>>>>> virtio ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)
>>>>>>> IIUC epf vhost driver (EP) will access virtio ring(2) using
>>>>>>> vringh?
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> And virtio
>>>>>>> ring(2) is created by virtio pci (RC).
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>>> What epf vhost driver did is to read from virtio ring(1) about
>>>>>>>> the buffer len and addr and them DMA to Linux(RC)?
>>>>>>> okay, I made some optimization here where vhost-rpmsg using a
>>>>>>> helper writes a buffer from rpmsg's upper layer directly to
>>>>>>> remote Linux (RC) as against here were it has to be first written
>>>>>>> to virtio ring (1).
>>>>>>>
>>>>>>> Thinking how this would look for NTB
>>>>>>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <->
>>>>>>> NTB(HOST2)  <- virtio ring(2) -> virtio-rpmsg (HOST2)
>>>>>>>
>>>>>>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
>>>>>>
>>>>>> Yes, I think so it needs to use vring to access virtio ring (1) as
>>>>>> well.
>>>>> NTB(HOST1) and virtio ring(1) will be in the same system. So it
>>>>> doesn't have to use vring. virtio ring(1) is by the virtio device
>>>>> the NTB(HOST1) creates.
>>>>
>>>>
>>>> Right.
>>>>
>>>>
>>>>>>> Do you also think this will work seamlessly with virtio_net.c,
>>>>>>> virtio_blk.c?
>>>>>>
>>>>>> Yes.
>>>>> okay, I haven't looked at this but the backend of virtio_blk should
>>>>> access an actual storage device no?
>>>>
>>>>
>>>> Good point, for non-peer device like storage. There's probably no
>>>> need for it to be registered on the virtio bus and it might be better
>>>> to behave as you proposed.
>>>
>>> I might be missing something; but if you expose something as a block
>>> device, it should have something it can access with block reads/writes,
>>> shouldn't it? Of course, that can be a variety of things.
>>>
>>>>
>>>> Just to make sure I understand the design, how is VHOST SCSI expected
>>>> to work in your proposal, does it have a device for file as a backend?
>>>>
>>>>
>>>>>>> I'd like to get clarity on two things in the approach you
>>>>>>> suggested, one is features (since epf vhost should ideally be
>>>>>>> transparent to any virtio driver)
>>>>>>
>>>>>> We can have have an array of pre-defined features indexed by
>>>>>> virtio device id in the code.
>>>>>>
>>>>>>> and the other is how certain inputs to virtio device such as
>>>>>>> number of buffers be determined.
>>>>>>
>>>>>> We can start from hard coded the value like 256, or introduce some
>>>>>> API for user to change the value.
>>>>>>
>>>>>>> Thanks again for your suggestions!
>>>>>>
>>>>>> You're welcome.
>>>>>>
>>>>>> Note that I just want to check whether or not we can reuse the
>>>>>> virtio bus/driver. It's something similar to what you proposed in
>>>>>> Software Layering but we just replace "vhost core" with "virtio
>>>>>> bus" and move the vhost core below epf/ntb/platform transport.
>>>>> Got it. My initial design was based on my understanding of your
>>>>> comments [1].
>>>>
>>>>
>>>> Yes, but that's just for a networking device. If we want something
>>>> more generic, it may require more thought (bus etc).
>>>
>>> I believe that we indeed need something bus-like to be able to support
>>> a variety of devices.
>>
>> I think we could still have adapter layers for different types of
>> devices ([1]) and use existing virtio bus for both front end and back
>> end. Using bus-like will however simplify adding support for new types
>> of devices and adding adapters for devices will be slightly more complex.
>>
>> [1] -> Page 13 in
>> https://linuxplumbersconf.org/event/7/contributions/849/attachments/642/1175/Virtio_for_PCIe_RC_EP_NTB.pdf
>>
>>>
>>>>
>>>>
>>>>>
>>>>> I'll try to create something based on your proposed design here.
>>>>
>>>>
>>>> Sure, but for coding, we'd better wait for other's opinion here.
>>>
>>> Please tell me if my thoughts above make any sense... I have just
>>> started looking at that, so I might be completely off.
>>
>> I think your understanding is correct! Thanks for your inputs.
>>
>> Thanks
>> Kishon
>

2020-09-15 08:23:51

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

Hi Kishon:

On 2020/9/14 下午3:23, Kishon Vijay Abraham I wrote:
>> Then you need something that is functional equivalent to virtio PCI
>> which is actually the concept of vDPA (e.g vDPA provides alternatives if
>> the queue_sel is hard in the EP implementation).
> Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
> vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
> the VHOST driver to configure VHOST device).
>
> struct vdpa_config_ops {
> /* Virtqueue ops */
> int (*set_vq_address)(struct vdpa_device *vdev,
> u16 idx, u64 desc_area, u64 driver_area,
> u64 device_area);
> void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
> void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
> void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
> struct vdpa_callback *cb);
> void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
> bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
> int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
> const struct vdpa_vq_state *state);
> int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
> struct vdpa_vq_state *state);
> struct vdpa_notification_area
> (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
> /* vq irq is not expected to be changed once DRIVER_OK is set */
> int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>
> /* Device ops */
> u32 (*get_vq_align)(struct vdpa_device *vdev);
> u64 (*get_features)(struct vdpa_device *vdev);
> int (*set_features)(struct vdpa_device *vdev, u64 features);
> void (*set_config_cb)(struct vdpa_device *vdev,
> struct vdpa_callback *cb);
> u16 (*get_vq_num_max)(struct vdpa_device *vdev);
> u32 (*get_device_id)(struct vdpa_device *vdev);
> u32 (*get_vendor_id)(struct vdpa_device *vdev);
> u8 (*get_status)(struct vdpa_device *vdev);
> void (*set_status)(struct vdpa_device *vdev, u8 status);
> void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
> void *buf, unsigned int len);
> void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
> const void *buf, unsigned int len);
> u32 (*get_generation)(struct vdpa_device *vdev);
>
> /* DMA ops */
> int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
> int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
> u64 pa, u32 perm);
> int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>
> /* Free device resources */
> void (*free)(struct vdpa_device *vdev);
> };
>
> +struct vhost_config_ops {
> + int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
> + unsigned int num_bufs, struct vhost_virtqueue *vqs[],
> + vhost_vq_callback_t *callbacks[],
> + const char * const names[]);
> + void (*del_vqs)(struct vhost_dev *vdev);
> + int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src, int len);
> + int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int len);
> + int (*set_features)(struct vhost_dev *vdev, u64 device_features);
> + int (*set_status)(struct vhost_dev *vdev, u8 status);
> + u8 (*get_status)(struct vhost_dev *vdev);
> +};
> +
> struct virtio_config_ops
> I think there's some overlap here and some of the ops tries to do the
> same thing.
>
> I think it differs in (*set_vq_address)() and (*create_vqs)().
> [create_vqs() introduced in struct vhost_config_ops provides
> complimentary functionality to (*find_vqs)() in struct
> virtio_config_ops. It seemingly encapsulates the functionality of
> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
>
> Back to the difference between (*set_vq_address)() and (*create_vqs)(),
> set_vq_address() directly provides the virtqueue address to the vdpa
> device but create_vqs() only provides the parameters of the virtqueue
> (like the number of virtqueues, number of buffers) but does not directly
> provide the address. IMO the backend client drivers (like net or vhost)
> shouldn't/cannot by itself know how to access the vring created on
> virtio front-end. The vdpa device/vhost device should have logic for
> that. That will help the client drivers to work with different types of
> vdpa device/vhost device and can access the vring created by virtio
> irrespective of whether the vring can be accessed via mmio or kernel
> space or user space.
>
> I think vdpa always works with client drivers in userspace and providing
> userspace address for vring.


Sorry for being unclear. What I meant is not replacing vDPA with the
vhost (bus) you proposed, but the possibility of replacing virtio-pci-epf
with vDPA.

My question is basically about the part of virtio_pci_epf_send_command():
it looks to me like you have a vendor-specific API to replace the
virtio-pci layout of the BAR:


+static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
+                       u32 command)
+{
+    struct virtio_pci_epf *pci_epf;
+    void __iomem *ioaddr;
+    ktime_t timeout;
+    bool timedout;
+    int ret = 0;
+    u8 status;
+
+    pci_epf = to_virtio_pci_epf(vp_dev);
+    ioaddr = vp_dev->ioaddr;
+
+    mutex_lock(&pci_epf->lock);
+    writeb(command, ioaddr + HOST_CMD);
+    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
+    while (1) {
+        timedout = ktime_after(ktime_get(), timeout);
+        status = readb(ioaddr + HOST_CMD_STATUS);
+

Several questions:

- It's not clear to me how the synchronization is done between the RC
and the EP, e.g. how and when the value of HOST_CMD_STATUS can be changed.
If you still want to introduce a new transport, a virtio spec patch
would be helpful for us to understand the device API.
- You have your vendor-specific layout (according to
virtio_pci_epb_table()), so I guess it's better to have a vendor-specific
vDPA driver instead.
- The advantage of a vendor-specific vDPA driver is that it can 1) have
less code and 2) support userspace drivers through vhost-vDPA (instead of
inventing new APIs, since we can't use vfio-pci here).


>>> "Virtio Over NTB" should anyways be a new transport.
>>>> Does that make any sense?
>>> yeah, in the approach I used the initial features are hard-coded in
>>> vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter
>>> layer (vhost only for accessing virtio ring and use virtio drivers on
>>> both front end and backend), based on the functionality (e.g, rpmsg),
>>> the vhost should be configured with features (to be presented to the
>>> virtio) and that's why additional layer or APIs will be required.
>> A question here, if we go with vhost bus approach, does it mean the
>> virtio device can only be implemented in EP's userspace?
> The vhost bus approach doesn't provide any restriction in where the
> virto backend device should be created. This series creates two types of
> virtio backend device (one for PCIe endpoint and the other for NTB) and
> both these devices are created in kernel.


Ok.

Thanks


>
> Thanks
> Kishon
>

2020-09-15 18:57:28

by Kishon Vijay Abraham I

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

Hi Jason,

On 15/09/20 1:48 pm, Jason Wang wrote:
> Hi Kishon:
>
> On 2020/9/14 下午3:23, Kishon Vijay Abraham I wrote:
>>> Then you need something that is functional equivalent to virtio PCI
>>> which is actually the concept of vDPA (e.g vDPA provides alternatives if
>>> the queue_sel is hard in the EP implementation).
>> Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
>> vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
>> the VHOST driver to configure VHOST device).
>>
>> struct vdpa_config_ops {
>>     /* Virtqueue ops */
>>     int (*set_vq_address)(struct vdpa_device *vdev,
>>                   u16 idx, u64 desc_area, u64 driver_area,
>>                   u64 device_area);
>>     void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
>>     void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
>>     void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>>               struct vdpa_callback *cb);
>>     void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
>>     bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>>     int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
>>                 const struct vdpa_vq_state *state);
>>     int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
>>                 struct vdpa_vq_state *state);
>>     struct vdpa_notification_area
>>     (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>>     /* vq irq is not expected to be changed once DRIVER_OK is set */
>>     int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>>
>>     /* Device ops */
>>     u32 (*get_vq_align)(struct vdpa_device *vdev);
>>     u64 (*get_features)(struct vdpa_device *vdev);
>>     int (*set_features)(struct vdpa_device *vdev, u64 features);
>>     void (*set_config_cb)(struct vdpa_device *vdev,
>>                   struct vdpa_callback *cb);
>>     u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>>     u32 (*get_device_id)(struct vdpa_device *vdev);
>>     u32 (*get_vendor_id)(struct vdpa_device *vdev);
>>     u8 (*get_status)(struct vdpa_device *vdev);
>>     void (*set_status)(struct vdpa_device *vdev, u8 status);
>>     void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>>                void *buf, unsigned int len);
>>     void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>>                const void *buf, unsigned int len);
>>     u32 (*get_generation)(struct vdpa_device *vdev);
>>
>>     /* DMA ops */
>>     int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
>>     int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
>>                u64 pa, u32 perm);
>>     int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>>
>>     /* Free device resources */
>>     void (*free)(struct vdpa_device *vdev);
>> };
>>
>> +struct vhost_config_ops {
>> +    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
>> +              unsigned int num_bufs, struct vhost_virtqueue *vqs[],
>> +              vhost_vq_callback_t *callbacks[],
>> +              const char * const names[]);
>> +    void (*del_vqs)(struct vhost_dev *vdev);
>> +    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
>> int len);
>> +    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
>> len);
>> +    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
>> +    int (*set_status)(struct vhost_dev *vdev, u8 status);
>> +    u8 (*get_status)(struct vhost_dev *vdev);
>> +};
>> +
>> struct virtio_config_ops
>> I think there's some overlap here and some of the ops tries to do the
>> same thing.
>>
>> I think it differs in (*set_vq_address)() and (*create_vqs)().
>> [create_vqs() introduced in struct vhost_config_ops provides
>> complimentary functionality to (*find_vqs)() in struct
>> virtio_config_ops. It seemingly encapsulates the functionality of
>> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
>>
>> Back to the difference between (*set_vq_address)() and (*create_vqs)(),
>> set_vq_address() directly provides the virtqueue address to the vdpa
>> device but create_vqs() only provides the parameters of the virtqueue
>> (like the number of virtqueues, number of buffers) but does not directly
>> provide the address. IMO the backend client drivers (like net or vhost)
>> shouldn't/cannot by itself know how to access the vring created on
>> virtio front-end. The vdpa device/vhost device should have logic for
>> that. That will help the client drivers to work with different types of
>> vdpa device/vhost device and can access the vring created by virtio
>> irrespective of whether the vring can be accessed via mmio or kernel
>> space or user space.
>>
>> I think vdpa always works with client drivers in userspace and providing
>> userspace address for vring.
>
>
> Sorry for being unclear. What I meant is not replacing vDPA with the
> vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
> with vDPA in:

Okay, so the virtio back end still uses vhost and the front end should
use vDPA. I see. So the host-side PCI driver for the EPF device should
populate vdpa_config_ops and invoke vdpa_register_device().
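
As a rough sketch (not code from this series), that host-side driver
could look something like the following; struct vdpa_config_ops and
vdpa_register_device() are the existing vDPA interfaces referred to
above, while the epf_vdpa_* names and the register offset macro are
made up for illustration:

/* Sketch only: a host-side vDPA driver for the EPF device whose ops are
 * backed by the vendor-specific register layout in the BAR instead of
 * the standard virtio-pci layout. */
struct epf_vdpa_device {
    struct vdpa_device vdpa;
    void __iomem *regs;    /* BAR carrying the vendor-specific registers */
};

static u8 epf_vdpa_get_status(struct vdpa_device *vdev)
{
    struct epf_vdpa_device *epf =
        container_of(vdev, struct epf_vdpa_device, vdpa);

    return readb(epf->regs + EPF_VDPA_DEVICE_STATUS); /* assumed offset macro */
}

static const struct vdpa_config_ops epf_vdpa_ops = {
    .get_status = epf_vdpa_get_status,
    /* .set_vq_address, .set_status, .get_features, ... would similarly
     * read/write the EPF registers and send commands to the EP. */
};

The driver would then allocate a vdpa_device bound to epf_vdpa_ops and
call vdpa_register_device() on it, so the regular virtio drivers (or
vhost-vDPA for userspace) can sit on top.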
>
> My question is basically for the part of virtio_pci_epf_send_command(),
> so it looks to me you have a vendor specific API to replace the
> virtio-pci layout of the BAR:

Even when we use vDPA, we have to use some sort of
virtio_pci_epf_send_command() to communicate with the virtio backend,
right?

Right, the layout is slightly different from the standard virtio-pci
layout.

This is the layout:
struct epf_vhost_reg_queue {
    u8 cmd;
    u8 cmd_status;
    u16 status;
    u16 num_buffers;
    u16 msix_vector;
    u64 queue_addr;
} __packed;

struct epf_vhost_reg {
    u64 host_features;
    u64 guest_features;
    u16 msix_config;
    u16 num_queues;
    u8 device_status;
    u8 config_generation;
    u32 isr;
    u8 cmd;
    u8 cmd_status;
    struct epf_vhost_reg_queue vq[MAX_VQS];
} __packed;
>
>
> +static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
> +                       u32 command)
> +{
> +    struct virtio_pci_epf *pci_epf;
> +    void __iomem *ioaddr;
> +    ktime_t timeout;
> +    bool timedout;
> +    int ret = 0;
> +    u8 status;
> +
> +    pci_epf = to_virtio_pci_epf(vp_dev);
> +    ioaddr = vp_dev->ioaddr;
> +
> +    mutex_lock(&pci_epf->lock);
> +    writeb(command, ioaddr + HOST_CMD);
> +    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
> +    while (1) {
> +        timedout = ktime_after(ktime_get(), timeout);
> +        status = readb(ioaddr + HOST_CMD_STATUS);
> +
>
> Several questions:
>
> - It's not clear to me how the synchronization is done between the RC
> and EP. E.g how and when the value of HOST_CMD_STATUS can be changed.

HOST_CMD (commands sent to the EP) is serialized using a mutex. Once
the EP reads a command, it resets the value in HOST_CMD, so HOST_CMD is
less likely to be an issue.

A sufficiently large time (1 second) is given for the EP to complete its
operation and provide the status in HOST_CMD_STATUS. After it expires,
HOST_CMD_STATUS_NONE is written to HOST_CMD_STATUS. There could be a
case where the EP updates HOST_CMD_STATUS after the RC has written
HOST_CMD_STATUS_NONE, but by then the host has already treated this as a
failure and errored out.
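
For clarity, a sketch of the EP-side handling implied by the above (not
the exact code from this series): 'reg' is the EP's local mapping of the
epf_vhost_reg block exposed through the BAR, sketch_handle_command() is
a hypothetical helper, and the *_OKAY/*_ERROR constant names are
assumptions.

static void sketch_ep_cmd_handler(struct epf_vhost_reg *reg)
{
    u8 cmd = READ_ONCE(reg->cmd);

    if (!cmd)
        return;

    /* EP resets HOST_CMD once it has consumed the command */
    WRITE_ONCE(reg->cmd, 0);

    if (sketch_handle_command(cmd))
        WRITE_ONCE(reg->cmd_status, HOST_CMD_STATUS_ERROR);
    else
        WRITE_ONCE(reg->cmd_status, HOST_CMD_STATUS_OKAY);
}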
 
> If you still want to introduce a new transport, a virtio spec patch
> would be helpful for us to understand the device API.

Okay, that should be on https://github.com/oasis-tcs/virtio-spec.git?
> - You have you vendor specific layout (according to
> virtio_pci_epb_table()), so I guess you it's better to have a vendor
> specific vDPA driver instead

Okay, with vDPA, we are free to define our own layouts.
> - The advantage of vendor specific vDPA driver is that it can 1) have
> less codes 2) support userspace drivers through vhost-vDPA (instead of
> inventing new APIs since we can't use vfio-pci here).

I see there's an additional level of indirection from virtio to vDPA,
and probably no need for a spec update, but I don't exactly see how it
will reduce code.

For 2, isn't vhost-vdpa supposed to run on the virtio backend?

From a high level, I think I should be able to use vDPA for
virtio_pci_epf.c. Would you also suggest using vDPA for ntb_virtio.c?
([RFC PATCH 20/22] NTB: Add a new NTB client driver to implement VIRTIO
functionality).

Thanks
Kishon

2020-09-16 03:12:52

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication


On 2020/9/15 下午11:47, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 15/09/20 1:48 pm, Jason Wang wrote:
>> Hi Kishon:
>>
>> On 2020/9/14 下午3:23, Kishon Vijay Abraham I wrote:
>>>> Then you need something that is functional equivalent to virtio PCI
>>>> which is actually the concept of vDPA (e.g vDPA provides alternatives if
>>>> the queue_sel is hard in the EP implementation).
>>> Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
>>> vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
>>> the VHOST driver to configure VHOST device).
>>>
>>> struct vdpa_config_ops {
>>>     /* Virtqueue ops */
>>>     int (*set_vq_address)(struct vdpa_device *vdev,
>>>                   u16 idx, u64 desc_area, u64 driver_area,
>>>                   u64 device_area);
>>>     void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
>>>     void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
>>>     void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>>>               struct vdpa_callback *cb);
>>>     void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
>>>     bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>>>     int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>                 const struct vdpa_vq_state *state);
>>>     int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>                 struct vdpa_vq_state *state);
>>>     struct vdpa_notification_area
>>>     (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>>>     /* vq irq is not expected to be changed once DRIVER_OK is set */
>>>     int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>>>
>>>     /* Device ops */
>>>     u32 (*get_vq_align)(struct vdpa_device *vdev);
>>>     u64 (*get_features)(struct vdpa_device *vdev);
>>>     int (*set_features)(struct vdpa_device *vdev, u64 features);
>>>     void (*set_config_cb)(struct vdpa_device *vdev,
>>>                   struct vdpa_callback *cb);
>>>     u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>>>     u32 (*get_device_id)(struct vdpa_device *vdev);
>>>     u32 (*get_vendor_id)(struct vdpa_device *vdev);
>>>     u8 (*get_status)(struct vdpa_device *vdev);
>>>     void (*set_status)(struct vdpa_device *vdev, u8 status);
>>>     void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>>>                void *buf, unsigned int len);
>>>     void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>>>                const void *buf, unsigned int len);
>>>     u32 (*get_generation)(struct vdpa_device *vdev);
>>>
>>>     /* DMA ops */
>>>     int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
>>>     int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
>>>                u64 pa, u32 perm);
>>>     int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>>>
>>>     /* Free device resources */
>>>     void (*free)(struct vdpa_device *vdev);
>>> };
>>>
>>> +struct vhost_config_ops {
>>> +    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
>>> +              unsigned int num_bufs, struct vhost_virtqueue *vqs[],
>>> +              vhost_vq_callback_t *callbacks[],
>>> +              const char * const names[]);
>>> +    void (*del_vqs)(struct vhost_dev *vdev);
>>> +    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
>>> int len);
>>> +    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
>>> len);
>>> +    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
>>> +    int (*set_status)(struct vhost_dev *vdev, u8 status);
>>> +    u8 (*get_status)(struct vhost_dev *vdev);
>>> +};
>>> +
>>> struct virtio_config_ops
>>> I think there's some overlap here and some of the ops tries to do the
>>> same thing.
>>>
>>> I think it differs in (*set_vq_address)() and (*create_vqs)().
>>> [create_vqs() introduced in struct vhost_config_ops provides
>>> complimentary functionality to (*find_vqs)() in struct
>>> virtio_config_ops. It seemingly encapsulates the functionality of
>>> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
>>>
>>> Back to the difference between (*set_vq_address)() and (*create_vqs)(),
>>> set_vq_address() directly provides the virtqueue address to the vdpa
>>> device but create_vqs() only provides the parameters of the virtqueue
>>> (like the number of virtqueues, number of buffers) but does not directly
>>> provide the address. IMO the backend client drivers (like net or vhost)
>>> shouldn't/cannot by itself know how to access the vring created on
>>> virtio front-end. The vdpa device/vhost device should have logic for
>>> that. That will help the client drivers to work with different types of
>>> vdpa device/vhost device and can access the vring created by virtio
>>> irrespective of whether the vring can be accessed via mmio or kernel
>>> space or user space.
>>>
>>> I think vdpa always works with client drivers in userspace and providing
>>> userspace address for vring.
>>
>> Sorry for being unclear. What I meant is not replacing vDPA with the
>> vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
>> with vDPA in:
> Okay, so the virtio back-end still use vhost and front end should use
> vDPA. I see. So the host side PCI driver for EPF should populate
> vdpa_config_ops and invoke vdpa_register_device().


Yes.


>> My question is basically for the part of virtio_pci_epf_send_command(),
>> so it looks to me you have a vendor specific API to replace the
>> virtio-pci layout of the BAR:
> Even when we use vDPA, we have to use some sort of
> virtio_pci_epf_send_command() to communicate with virtio backend right?


Right.


>
> Right, the layout is slightly different from the standard layout.
>
> This is the layout
> struct epf_vhost_reg_queue {
> u8 cmd;
> u8 cmd_status;
> u16 status;
> u16 num_buffers;
> u16 msix_vector;
> u64 queue_addr;


What's the meaning of queue_addr here?

Doesn't that mean the device expects contiguous memory for the
avail/desc/used rings?


> } __packed;
>
> struct epf_vhost_reg {
> u64 host_features;
> u64 guest_features;
> u16 msix_config;
> u16 num_queues;
> u8 device_status;
> u8 config_generation;
> u32 isr;
> u8 cmd;
> u8 cmd_status;
> struct epf_vhost_reg_queue vq[MAX_VQS];
> } __packed;
>>
>> +static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
>> +                       u32 command)
>> +{
>> +    struct virtio_pci_epf *pci_epf;
>> +    void __iomem *ioaddr;
>> +    ktime_t timeout;
>> +    bool timedout;
>> +    int ret = 0;
>> +    u8 status;
>> +
>> +    pci_epf = to_virtio_pci_epf(vp_dev);
>> +    ioaddr = vp_dev->ioaddr;
>> +
>> +    mutex_lock(&pci_epf->lock);
>> +    writeb(command, ioaddr + HOST_CMD);
>> +    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
>> +    while (1) {
>> +        timedout = ktime_after(ktime_get(), timeout);
>> +        status = readb(ioaddr + HOST_CMD_STATUS);
>> +
>>
>> Several questions:
>>
>> - It's not clear to me how the synchronization is done between the RC
>> and EP. E.g how and when the value of HOST_CMD_STATUS can be changed.
> The HOST_CMD (commands sent to the EP) is serialized by using mutex.
> Once the EP reads the command, it resets the value in HOST_CMD. So
> HOST_CMD is less likely an issue.


Here's my understanding of the protocol:

1) RC writes to HOST_CMD
2) RC waits for HOST_CMD_STATUS to become HOST_CMD_STATUS_OKAY

It looks to me like what the EP should do is

1) EP resets HOST_CMD after reading a new command

And it looks to me like the EP should also reset HOST_CMD_STATUS here?

(I thought there should be a patch to handle stuff like this but I didn't
find it in this series)
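
Restating those two steps as a sketch (not the RFC's
virtio_pci_epf_send_command() itself; only HOST_CMD, HOST_CMD_STATUS,
HOST_CMD_STATUS_OKAY, HOST_CMD_STATUS_NONE and COMMAND_TIMEOUT come from
the thread, the rest is illustrative):

static int sketch_rc_send_command(void __iomem *ioaddr, u8 command)
{
    ktime_t timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
    u8 status;

    writeb(command, ioaddr + HOST_CMD);            /* step 1 */
    do {                                           /* step 2 */
        status = readb(ioaddr + HOST_CMD_STATUS);
        if (status == HOST_CMD_STATUS_OKAY)
            return 0;
    } while (!ktime_after(ktime_get(), timeout));

    /* On timeout the RC gives up and clears the status itself. */
    writeb(HOST_CMD_STATUS_NONE, ioaddr + HOST_CMD_STATUS);
    return -ETIMEDOUT;
}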


>
> A sufficiently large time is given for the EP to complete it's operation
> (1 Sec) where the EP provides the status in HOST_CMD_STATUS. After it
> expires, HOST_CMD_STATUS_NONE is written to HOST_CMD_STATUS. There could
> be case where EP updates HOST_CMD_STATUS after RC writes
> HOST_CMD_STATUS_NONE, but by then HOST has already detected this as
> failure and error-ed out.
>
>> If you still want to introduce a new transport, a virtio spec patch
>> would be helpful for us to understand the device API.
> Okay, that should be on https://github.com/oasis-tcs/virtio-spec.git?


Yes.


>> - You have you vendor specific layout (according to
>> virtio_pci_epb_table()), so I guess you it's better to have a vendor
>> specific vDPA driver instead
> Okay, with vDPA, we are free to define our own layouts.


Right, but vDPA has other requirements. E.g. it requires the device to
have the ability to save/restore state (e.g. the last_avail_idx).

So it actually depends on what you want. If you don't care about
userspace drivers and want to have a standard transport, you can still
go with virtio.


>> - The advantage of vendor specific vDPA driver is that it can 1) have
>> less codes 2) support userspace drivers through vhost-vDPA (instead of
>> inventing new APIs since we can't use vfio-pci here).
> I see there's an additional level of indirection from virtio to vDPA and
> probably no need for spec update but don't exactly see how it'll reduce
> code.


AFAIK you don't need to implement your own setup_vq and del_vq.


>
> For 2, Isn't vhost-vdpa supposed to run on virtio backend?


Not currently; vDPA is a superset of virtio (e.g. it supports virtqueue
state save/restore). But it should probably be possible in the future.


>
> From a high level, I think I should be able to use vDPA for
> virtio_pci_epf.c. Would you also suggest using vDPA for ntb_virtio.c?
> ([RFC PATCH 20/22] NTB: Add a new NTB client driver to implement VIRTIO
> functionality).


I think it's your call. If you

1) want a well-defined, standard virtio transport
2) are willing to finalize and maintain the spec
3) don't care about userspace drivers

you can go with virtio, otherwise vDPA.

Thanks


>
> Thanks
> Kishon
>

2020-09-18 04:09:36

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication


On 2020/9/16 下午7:47, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 16/09/20 8:40 am, Jason Wang wrote:
>> On 2020/9/15 下午11:47, Kishon Vijay Abraham I wrote:
>>> Hi Jason,
>>>
>>> On 15/09/20 1:48 pm, Jason Wang wrote:
>>>> Hi Kishon:
>>>>
>>>> On 2020/9/14 下午3:23, Kishon Vijay Abraham I wrote:
>>>>>> Then you need something that is functional equivalent to virtio PCI
>>>>>> which is actually the concept of vDPA (e.g vDPA provides
>>>>>> alternatives if
>>>>>> the queue_sel is hard in the EP implementation).
>>>>> Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
>>>>> vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
>>>>> the VHOST driver to configure VHOST device).
>>>>>
>>>>> struct vdpa_config_ops {
>>>>>      /* Virtqueue ops */
>>>>>      int (*set_vq_address)(struct vdpa_device *vdev,
>>>>>                    u16 idx, u64 desc_area, u64 driver_area,
>>>>>                    u64 device_area);
>>>>>      void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
>>>>>      void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
>>>>>      void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>>>>>                struct vdpa_callback *cb);
>>>>>      void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool
>>>>> ready);
>>>>>      bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>>>>>      int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>>>                  const struct vdpa_vq_state *state);
>>>>>      int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>>>                  struct vdpa_vq_state *state);
>>>>>      struct vdpa_notification_area
>>>>>      (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>>>>>      /* vq irq is not expected to be changed once DRIVER_OK is set */
>>>>>      int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>>>>>
>>>>>      /* Device ops */
>>>>>      u32 (*get_vq_align)(struct vdpa_device *vdev);
>>>>>      u64 (*get_features)(struct vdpa_device *vdev);
>>>>>      int (*set_features)(struct vdpa_device *vdev, u64 features);
>>>>>      void (*set_config_cb)(struct vdpa_device *vdev,
>>>>>                    struct vdpa_callback *cb);
>>>>>      u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>>>>>      u32 (*get_device_id)(struct vdpa_device *vdev);
>>>>>      u32 (*get_vendor_id)(struct vdpa_device *vdev);
>>>>>      u8 (*get_status)(struct vdpa_device *vdev);
>>>>>      void (*set_status)(struct vdpa_device *vdev, u8 status);
>>>>>      void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>>>>>                 void *buf, unsigned int len);
>>>>>      void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>>>>>                 const void *buf, unsigned int len);
>>>>>      u32 (*get_generation)(struct vdpa_device *vdev);
>>>>>
>>>>>      /* DMA ops */
>>>>>      int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb
>>>>> *iotlb);
>>>>>      int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
>>>>>                 u64 pa, u32 perm);
>>>>>      int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>>>>>
>>>>>      /* Free device resources */
>>>>>      void (*free)(struct vdpa_device *vdev);
>>>>> };
>>>>>
>>>>> +struct vhost_config_ops {
>>>>> +    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
>>>>> +              unsigned int num_bufs, struct vhost_virtqueue *vqs[],
>>>>> +              vhost_vq_callback_t *callbacks[],
>>>>> +              const char * const names[]);
>>>>> +    void (*del_vqs)(struct vhost_dev *vdev);
>>>>> +    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
>>>>> int len);
>>>>> +    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
>>>>> len);
>>>>> +    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
>>>>> +    int (*set_status)(struct vhost_dev *vdev, u8 status);
>>>>> +    u8 (*get_status)(struct vhost_dev *vdev);
>>>>> +};
>>>>> +
>>>>> struct virtio_config_ops
>>>>> I think there's some overlap here and some of the ops tries to do the
>>>>> same thing.
>>>>>
>>>>> I think it differs in (*set_vq_address)() and (*create_vqs)().
>>>>> [create_vqs() introduced in struct vhost_config_ops provides
>>>>> complimentary functionality to (*find_vqs)() in struct
>>>>> virtio_config_ops. It seemingly encapsulates the functionality of
>>>>> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
>>>>>
>>>>> Back to the difference between (*set_vq_address)() and (*create_vqs)(),
>>>>> set_vq_address() directly provides the virtqueue address to the vdpa
>>>>> device but create_vqs() only provides the parameters of the virtqueue
>>>>> (like the number of virtqueues, number of buffers) but does not
>>>>> directly
>>>>> provide the address. IMO the backend client drivers (like net or vhost)
>>>>> shouldn't/cannot by itself know how to access the vring created on
>>>>> virtio front-end. The vdpa device/vhost device should have logic for
>>>>> that. That will help the client drivers to work with different types of
>>>>> vdpa device/vhost device and can access the vring created by virtio
>>>>> irrespective of whether the vring can be accessed via mmio or kernel
>>>>> space or user space.
>>>>>
>>>>> I think vdpa always works with client drivers in userspace and
>>>>> providing
>>>>> userspace address for vring.
>>>> Sorry for being unclear. What I meant is not replacing vDPA with the
>>>> vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
>>>> with vDPA in:
>>> Okay, so the virtio back-end still use vhost and front end should use
>>> vDPA. I see. So the host side PCI driver for EPF should populate
>>> vdpa_config_ops and invoke vdpa_register_device().
>>
>> Yes.
>>
>>
>>>> My question is basically for the part of virtio_pci_epf_send_command(),
>>>> so it looks to me you have a vendor specific API to replace the
>>>> virtio-pci layout of the BAR:
>>> Even when we use vDPA, we have to use some sort of
>>> virtio_pci_epf_send_command() to communicate with virtio backend right?
>>
>> Right.
>>
>>
>>> Right, the layout is slightly different from the standard layout.
>>>
>>> This is the layout
>>> struct epf_vhost_reg_queue {
>>>          u8 cmd;
>>>          u8 cmd_status;
>>>          u16 status;
>>>          u16 num_buffers;
>>>          u16 msix_vector;
>>>          u64 queue_addr;
>>
>> What's the meaning of queue_addr here?
> Using queue_addr, the virtio front-end communicates the address of the
> allocated memory for virtqueue to the virtio back-end.
>> Does not mean the device expects a contiguous memory for avail/desc/used
>> ring?
> It's contiguous memory. Isn't this similar to other virtio transport
> (both PCI legacy and modern interface)?.


That's only for legacy devices; for modern devices we don't have such a
restriction.
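
For reference, the modern interface programs the three ring addresses
separately, which is why no single contiguous queue_addr is needed
(excerpt adapted from include/uapi/linux/virtio_pci.h; unrelated fields
elided):

struct virtio_pci_common_cfg {
    /* ... feature selectors, queue_select, queue_size, ... */
    __le32 queue_desc_lo;     /* descriptor table address, low 32 bits */
    __le32 queue_desc_hi;
    __le32 queue_avail_lo;    /* driver (avail) ring address */
    __le32 queue_avail_hi;
    __le32 queue_used_lo;     /* device (used) ring address */
    __le32 queue_used_hi;
};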


>>
>>> } __packed;
>>>
>>> struct epf_vhost_reg {
>>>          u64 host_features;
>>>          u64 guest_features;
>>>          u16 msix_config;
>>>          u16 num_queues;
>>>          u8 device_status;
>>>          u8 config_generation;
>>>          u32 isr;
>>>          u8 cmd;
>>>          u8 cmd_status;
>>>          struct epf_vhost_reg_queue vq[MAX_VQS];
>>> } __packed;
>>>> +static int virtio_pci_epf_send_command(struct virtio_pci_device
>>>> *vp_dev,
>>>> +                       u32 command)
>>>> +{
>>>> +    struct virtio_pci_epf *pci_epf;
>>>> +    void __iomem *ioaddr;
>>>> +    ktime_t timeout;
>>>> +    bool timedout;
>>>> +    int ret = 0;
>>>> +    u8 status;
>>>> +
>>>> +    pci_epf = to_virtio_pci_epf(vp_dev);
>>>> +    ioaddr = vp_dev->ioaddr;
>>>> +
>>>> +    mutex_lock(&pci_epf->lock);
>>>> +    writeb(command, ioaddr + HOST_CMD);
>>>> +    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
>>>> +    while (1) {
>>>> +        timedout = ktime_after(ktime_get(), timeout);
>>>> +        status = readb(ioaddr + HOST_CMD_STATUS);
>>>> +
>>>>
>>>> Several questions:
>>>>
>>>> - It's not clear to me how the synchronization is done between the RC
>>>> and EP. E.g how and when the value of HOST_CMD_STATUS can be changed.
>>> The HOST_CMD (commands sent to the EP) is serialized by using mutex.
>>> Once the EP reads the command, it resets the value in HOST_CMD. So
>>> HOST_CMD is less likely an issue.
>>
>> Here's my understanding of the protocol:
>>
>> 1) RC write to HOST_CMD
>> 2) RC wait for HOST_CMD_STATUS to be HOST_CMD_STATUS_OKAY
> That's right!
>> It looks to me what EP should do is
>>
>> 1) EP reset HOST_CMD after reading new command
> That's right! It does.
>> And it looks to me EP should also reset HOST_CMD_STATUS here?
> yeah, that would require RC to send another command to reset the status.
> Didn't see it required in the normal scenario but good to add this.
>> (I thought there should be patch to handle stuffs like this but I didn't
>> find it in this series)
> This is added in [RFC PATCH 19/22] PCI: endpoint: Add EP function driver
> to provide VHOST interface
>
> pci_epf_vhost_cmd_handler() gets commands from RC using "reg->cmd;". On
> the EP side, it is local memory access (mapped to BAR memory exposed to
> the host) and hence accessed using structure member access.


Thanks for the pointer, I will have a look at it. I think this part needs
to be carefully designed and is key to the success of the epf transport.