2021-01-28 14:44:55

by Stefano Garzarella

[permalink] [raw]
Subject: [PATCH RFC v2 00/10] vdpa: add vdpa simulator for block device

v1: https://lore.kernel.org/lkml/[email protected]/

This series is the second part of the v1 linked above. The first part with
refactoring of vdpa_sim has already been merged.

The patches are based on Max Gurtovoy's work and extend the block simulator to
have a ramdisk behaviour.

As mentioned in the v1 there was 2 issues and I fixed them in this series:
1. The identical mapping in the IOMMU used until now in vdpa_sim created issues
when mapping different virtual pages with the same physical address.
Fixed by patch "vdpa_sim: use iova module to allocate IOVA addresses"

2. There was a race accessing the IOMMU between the vdpasim_blk_work() and the
device driver that map/unmap DMA regions. Fixed by patch "vringh: add
'iotlb_lock' to synchronize iotlb accesses"

Since this series is still a RFC, I used the Xie's patch as is to allow
vhost-vdpa to use block devices, but I'll rebase when he splits it into
multiple patches.

The series also includes small fixes for vdpa_sim that I discovered while
implementing the block simulator.

Thanks for your feedback,
Stefano

Max Gurtovoy (1):
vdpa: add vdpa simulator for block device

Stefano Garzarella (8):
vdpa_sim: use iova module to allocate IOVA addresses
vringh: add 'iotlb_lock' to synchronize iotlb accesses
vringh: reset kiov 'consumed' field in __vringh_iov()
vringh: implement vringh_kiov_advance()
vringh: add vringh_kiov_length() helper
vdpa_sim: cleanup kiovs in vdpasim_free()
vdpa_sim_blk: implement ramdisk behaviour
vdpa_sim_blk: handle VIRTIO_BLK_T_GET_ID

Xie Yongji (1):
vdpa: Remove the restriction that only supports virtio-net devices

drivers/vdpa/vdpa_sim/vdpa_sim.h | 2 +
include/linux/vringh.h | 19 +-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 118 +++++++----
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 288 +++++++++++++++++++++++++++
drivers/vhost/vdpa.c | 28 +--
drivers/vhost/vringh.c | 54 +++--
drivers/vdpa/Kconfig | 8 +
drivers/vdpa/vdpa_sim/Makefile | 1 +
8 files changed, 433 insertions(+), 85 deletions(-)
create mode 100644 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c

--
2.29.2


2021-01-28 14:44:58

by Stefano Garzarella

[permalink] [raw]
Subject: [PATCH RFC v2 03/10] vringh: reset kiov 'consumed' field in __vringh_iov()

__vringh_iov() overwrites the contents of riov and wiov, in fact it
resets the 'i' and 'used' fields, but also the consumed field should
be reset to avoid an inconsistent state.

Signed-off-by: Stefano Garzarella <[email protected]>
---
drivers/vhost/vringh.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index f68122705719..bee63d68201a 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -290,9 +290,9 @@ __vringh_iov(struct vringh *vrh, u16 i,
return -EINVAL;

if (riov)
- riov->i = riov->used = 0;
+ riov->i = riov->used = riov->consumed = 0;
if (wiov)
- wiov->i = wiov->used = 0;
+ wiov->i = wiov->used = wiov->consumed = 0;

for (;;) {
void *addr;
--
2.29.2

2021-01-28 14:47:10

by Stefano Garzarella

[permalink] [raw]
Subject: [PATCH RFC v2 06/10] vdpa_sim: cleanup kiovs in vdpasim_free()

vringh_getdesc_iotlb() allocates memory to store the kvec, that
is freed with vringh_kiov_cleanup().

vringh_getdesc_iotlb() is able to reuse a kvec previously allocated,
so in order to avoid to allocate the kvec for each request, we are
not calling vringh_kiov_cleanup() when we finished to handle a
request, but we should call it when we free the entire device.

Signed-off-by: Stefano Garzarella <[email protected]>
---
drivers/vdpa/vdpa_sim/vdpa_sim.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 53238989713d..a7aeb5d01c3e 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -562,8 +562,15 @@ static int vdpasim_dma_unmap(struct vdpa_device *vdpa, u64 iova, u64 size)
static void vdpasim_free(struct vdpa_device *vdpa)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+ int i;

cancel_work_sync(&vdpasim->work);
+
+ for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
+ vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
+ vringh_kiov_cleanup(&vdpasim->vqs[i].in_iov);
+ }
+
put_iova_domain(&vdpasim->iova);
iova_cache_put();
kvfree(vdpasim->buffer);
--
2.29.2

2021-01-28 14:47:51

by Stefano Garzarella

[permalink] [raw]
Subject: [PATCH RFC v2 08/10] vdpa: add vdpa simulator for block device

From: Max Gurtovoy <[email protected]>

This will allow running vDPA for virtio block protocol.

Signed-off-by: Max Gurtovoy <[email protected]>
[sgarzare: various cleanups/fixes]
Signed-off-by: Stefano Garzarella <[email protected]>
---
v2:
- rebased on top of other changes (dev_attr, get_config(), notify(), etc.)
- memset to 0 the config structure in vdpasim_blk_get_config()
- used vdpasim pointer in vdpasim_blk_get_config()

v1:
- Removed unused headers
- Used cpu_to_vdpasim*() to store config fields
- Replaced 'select VDPA_SIM' with 'depends on VDPA_SIM' since selected
option can not depend on other [Jason]
- Start with a single queue for now [Jason]
- Add comments to memory barriers
---
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 145 +++++++++++++++++++++++++++
drivers/vdpa/Kconfig | 7 ++
drivers/vdpa/vdpa_sim/Makefile | 1 +
3 files changed, 153 insertions(+)
create mode 100644 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
new file mode 100644
index 000000000000..999f9ca0b628
--- /dev/null
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDPA simulator for block device.
+ *
+ * Copyright (c) 2020, Mellanox Technologies. All rights reserved.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/vringh.h>
+#include <linux/vdpa.h>
+#include <uapi/linux/virtio_blk.h>
+
+#include "vdpa_sim.h"
+
+#define DRV_VERSION "0.1"
+#define DRV_AUTHOR "Max Gurtovoy <[email protected]>"
+#define DRV_DESC "vDPA Device Simulator for block device"
+#define DRV_LICENSE "GPL v2"
+
+#define VDPASIM_BLK_FEATURES (VDPASIM_FEATURES | \
+ (1ULL << VIRTIO_BLK_F_SIZE_MAX) | \
+ (1ULL << VIRTIO_BLK_F_SEG_MAX) | \
+ (1ULL << VIRTIO_BLK_F_BLK_SIZE) | \
+ (1ULL << VIRTIO_BLK_F_TOPOLOGY) | \
+ (1ULL << VIRTIO_BLK_F_MQ))
+
+#define VDPASIM_BLK_CAPACITY 0x40000
+#define VDPASIM_BLK_SIZE_MAX 0x1000
+#define VDPASIM_BLK_SEG_MAX 32
+#define VDPASIM_BLK_VQ_NUM 1
+
+static struct vdpasim *vdpasim_blk_dev;
+
+static void vdpasim_blk_work(struct work_struct *work)
+{
+ struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
+ u8 status = VIRTIO_BLK_S_OK;
+ int i;
+
+ spin_lock(&vdpasim->lock);
+
+ if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
+ goto out;
+
+ for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) {
+ struct vdpasim_virtqueue *vq = &vdpasim->vqs[i];
+
+ if (!vq->ready)
+ continue;
+
+ while (vringh_getdesc_iotlb(&vq->vring, &vq->out_iov,
+ &vq->in_iov, &vq->head,
+ GFP_ATOMIC) > 0) {
+ int write;
+
+ vq->in_iov.i = vq->in_iov.used - 1;
+ write = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
+ &status, 1);
+ if (write <= 0)
+ break;
+
+ /* Make sure data is wrote before advancing index */
+ smp_wmb();
+
+ vringh_complete_iotlb(&vq->vring, vq->head, write);
+
+ /* Make sure used is visible before rasing the interrupt. */
+ smp_wmb();
+
+ local_bh_disable();
+ if (vringh_need_notify_iotlb(&vq->vring) > 0)
+ vringh_notify(&vq->vring);
+ local_bh_enable();
+ }
+ }
+out:
+ spin_unlock(&vdpasim->lock);
+}
+
+static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config)
+{
+ struct virtio_blk_config *blk_config =
+ (struct virtio_blk_config *)config;
+
+ memset(config, 0, sizeof(struct virtio_blk_config));
+
+ blk_config->capacity = cpu_to_vdpasim64(vdpasim, VDPASIM_BLK_CAPACITY);
+ blk_config->size_max = cpu_to_vdpasim32(vdpasim, VDPASIM_BLK_SIZE_MAX);
+ blk_config->seg_max = cpu_to_vdpasim32(vdpasim, VDPASIM_BLK_SEG_MAX);
+ blk_config->num_queues = cpu_to_vdpasim16(vdpasim, VDPASIM_BLK_VQ_NUM);
+ blk_config->min_io_size = cpu_to_vdpasim16(vdpasim, 1);
+ blk_config->opt_io_size = cpu_to_vdpasim32(vdpasim, 1);
+ blk_config->blk_size = cpu_to_vdpasim32(vdpasim, SECTOR_SIZE);
+}
+
+static int __init vdpasim_blk_init(void)
+{
+ struct vdpasim_dev_attr dev_attr = {};
+ int ret;
+
+ dev_attr.id = VIRTIO_ID_BLOCK;
+ dev_attr.supported_features = VDPASIM_BLK_FEATURES;
+ dev_attr.nvqs = VDPASIM_BLK_VQ_NUM;
+ dev_attr.config_size = sizeof(struct virtio_blk_config);
+ dev_attr.get_config = vdpasim_blk_get_config;
+ dev_attr.work_fn = vdpasim_blk_work;
+ dev_attr.buffer_size = PAGE_SIZE;
+
+ vdpasim_blk_dev = vdpasim_create(&dev_attr);
+ if (IS_ERR(vdpasim_blk_dev)) {
+ ret = PTR_ERR(vdpasim_blk_dev);
+ goto out;
+ }
+
+ ret = vdpa_register_device(&vdpasim_blk_dev->vdpa);
+ if (ret)
+ goto put_dev;
+
+ return 0;
+
+put_dev:
+ put_device(&vdpasim_blk_dev->vdpa.dev);
+out:
+ return ret;
+}
+
+static void __exit vdpasim_blk_exit(void)
+{
+ struct vdpa_device *vdpa = &vdpasim_blk_dev->vdpa;
+
+ vdpa_unregister_device(vdpa);
+}
+
+module_init(vdpasim_blk_init)
+module_exit(vdpasim_blk_exit)
+
+MODULE_VERSION(DRV_VERSION);
+MODULE_LICENSE(DRV_LICENSE);
+MODULE_AUTHOR(DRV_AUTHOR);
+MODULE_DESCRIPTION(DRV_DESC);
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 21a23500f430..b8bd92cf04f9 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -26,6 +26,13 @@ config VDPA_SIM_NET
help
vDPA networking device simulator which loops TX traffic back to RX.

+config VDPA_SIM_BLOCK
+ tristate "vDPA simulator for block device"
+ depends on VDPA_SIM
+ help
+ vDPA block device simulator which terminates IO request in a
+ memory buffer.
+
config IFCVF
tristate "Intel IFC VF vDPA driver"
depends on PCI_MSI
diff --git a/drivers/vdpa/vdpa_sim/Makefile b/drivers/vdpa/vdpa_sim/Makefile
index 79d4536d347e..d458103302f2 100644
--- a/drivers/vdpa/vdpa_sim/Makefile
+++ b/drivers/vdpa/vdpa_sim/Makefile
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_VDPA_SIM) += vdpa_sim.o
obj-$(CONFIG_VDPA_SIM_NET) += vdpa_sim_net.o
+obj-$(CONFIG_VDPA_SIM_BLOCK) += vdpa_sim_blk.o
--
2.29.2

2021-01-28 14:48:05

by Stefano Garzarella

[permalink] [raw]
Subject: [PATCH RFC v2 07/10] vdpa: Remove the restriction that only supports virtio-net devices

From: Xie Yongji <[email protected]>

With VDUSE, we should be able to support all kinds of virtio devices.

Signed-off-by: Xie Yongji <[email protected]>
---
drivers/vhost/vdpa.c | 28 ++--------------------------
1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ef688c8c0e0e..28624cbfe6dd 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -185,26 +185,6 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
return 0;
}

-static int vhost_vdpa_config_validate(struct vhost_vdpa *v,
- struct vhost_vdpa_config *c)
-{
- long size = 0;
-
- switch (v->virtio_id) {
- case VIRTIO_ID_NET:
- size = sizeof(struct virtio_net_config);
- break;
- }
-
- if (c->len == 0)
- return -EINVAL;
-
- if (c->len > size - c->off)
- return -E2BIG;
-
- return 0;
-}
-
static long vhost_vdpa_get_config(struct vhost_vdpa *v,
struct vhost_vdpa_config __user *c)
{
@@ -215,7 +195,7 @@ static long vhost_vdpa_get_config(struct vhost_vdpa *v,

if (copy_from_user(&config, c, size))
return -EFAULT;
- if (vhost_vdpa_config_validate(v, &config))
+ if (config.len == 0)
return -EINVAL;
buf = kvzalloc(config.len, GFP_KERNEL);
if (!buf)
@@ -243,7 +223,7 @@ static long vhost_vdpa_set_config(struct vhost_vdpa *v,

if (copy_from_user(&config, c, size))
return -EFAULT;
- if (vhost_vdpa_config_validate(v, &config))
+ if (config.len == 0)
return -EINVAL;

buf = vmemdup_user(c->buf, config.len);
@@ -1021,10 +1001,6 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
int minor;
int r;

- /* Currently, we only accept the network devices. */
- if (ops->get_device_id(vdpa) != VIRTIO_ID_NET)
- return -ENOTSUPP;
-
v = kzalloc(sizeof(*v), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
if (!v)
return -ENOMEM;
--
2.29.2

2021-01-28 14:48:27

by Stefano Garzarella

[permalink] [raw]
Subject: [PATCH RFC v2 04/10] vringh: implement vringh_kiov_advance()

In some cases, it may be useful to provide a way to skip a number
of bytes in a vringh_kiov.

Let's implement vringh_kiov_advance() for this purpose, reusing the
code from vringh_iov_xfer().
We replace that code calling the new vringh_kiov_advance().

Signed-off-by: Stefano Garzarella <[email protected]>
---
include/linux/vringh.h | 2 ++
drivers/vhost/vringh.c | 41 +++++++++++++++++++++++++++++------------
2 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 9c077863c8f6..755211ebd195 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -199,6 +199,8 @@ static inline void vringh_kiov_cleanup(struct vringh_kiov *kiov)
kiov->iov = NULL;
}

+void vringh_kiov_advance(struct vringh_kiov *kiov, size_t len);
+
int vringh_getdesc_kern(struct vringh *vrh,
struct vringh_kiov *riov,
struct vringh_kiov *wiov,
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index bee63d68201a..4d800e4f31ca 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -75,6 +75,34 @@ static inline int __vringh_get_head(const struct vringh *vrh,
return head;
}

+/**
+ * vringh_kiov_advance - skip bytes from vring_kiov
+ * @iov: an iov passed to vringh_getdesc_*() (updated as we consume)
+ * @len: the maximum length to advance
+ */
+void vringh_kiov_advance(struct vringh_kiov *iov, size_t len)
+{
+ while (len && iov->i < iov->used) {
+ size_t partlen = min(iov->iov[iov->i].iov_len, len);
+
+ iov->consumed += partlen;
+ iov->iov[iov->i].iov_len -= partlen;
+ iov->iov[iov->i].iov_base += partlen;
+
+ if (!iov->iov[iov->i].iov_len) {
+ /* Fix up old iov element then increment. */
+ iov->iov[iov->i].iov_len = iov->consumed;
+ iov->iov[iov->i].iov_base -= iov->consumed;
+
+ iov->consumed = 0;
+ iov->i++;
+ }
+
+ len -= partlen;
+ }
+}
+EXPORT_SYMBOL(vringh_kiov_advance);
+
/* Copy some bytes to/from the iovec. Returns num copied. */
static inline ssize_t vringh_iov_xfer(struct vringh *vrh,
struct vringh_kiov *iov,
@@ -95,19 +123,8 @@ static inline ssize_t vringh_iov_xfer(struct vringh *vrh,
done += partlen;
len -= partlen;
ptr += partlen;
- iov->consumed += partlen;
- iov->iov[iov->i].iov_len -= partlen;
- iov->iov[iov->i].iov_base += partlen;

- if (!iov->iov[iov->i].iov_len) {
- /* Fix up old iov element then increment. */
- iov->iov[iov->i].iov_len = iov->consumed;
- iov->iov[iov->i].iov_base -= iov->consumed;
-
-
- iov->consumed = 0;
- iov->i++;
- }
+ vringh_kiov_advance(iov, partlen);
}
return done;
}
--
2.29.2

2021-01-28 14:49:23

by Stefano Garzarella

[permalink] [raw]
Subject: [PATCH RFC v2 09/10] vdpa_sim_blk: implement ramdisk behaviour

The previous implementation wrote only the status of each request.
This patch implements a more accurate block device simulator,
providing a ramdisk-like behavior.

Signed-off-by: Stefano Garzarella <[email protected]>
---
v2:
- used %zd %zx to print size_t and ssize_t variables in dev_err()
- removed unnecessary new line [Jason]
- moved VIRTIO_BLK_T_GET_ID in another patch [Jason]
- used push/pull instead of write/read terminology
- added vdpasim_blk_check_range() to avoid overflows [Stefan]
- use vdpasim*_to_cpu instead of le*_to_cpu
- used vringh_kiov_length() helper [Jason]
---
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 164 ++++++++++++++++++++++++---
1 file changed, 146 insertions(+), 18 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index 999f9ca0b628..fc47e8320358 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -3,6 +3,7 @@
* VDPA simulator for block device.
*
* Copyright (c) 2020, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2021, Red Hat Inc. All rights reserved.
*
*/

@@ -13,6 +14,7 @@
#include <linux/sched.h>
#include <linux/vringh.h>
#include <linux/vdpa.h>
+#include <linux/blkdev.h>
#include <uapi/linux/virtio_blk.h>

#include "vdpa_sim.h"
@@ -36,10 +38,151 @@

static struct vdpasim *vdpasim_blk_dev;

+static bool vdpasim_blk_check_range(u64 start_sector, size_t range_size)
+{
+ u64 range_sectors = range_size >> SECTOR_SHIFT;
+
+ if (range_size > VDPASIM_BLK_SIZE_MAX * VDPASIM_BLK_SEG_MAX)
+ return false;
+
+ if (start_sector > VDPASIM_BLK_CAPACITY)
+ return false;
+
+ if (range_sectors > VDPASIM_BLK_CAPACITY - start_sector)
+ return false;
+
+ return true;
+}
+
+/* Returns 'true' if the request is handled (with or without an I/O error)
+ * and the status is correctly written in the last byte of the 'in iov',
+ * 'false' otherwise.
+ */
+static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
+ struct vdpasim_virtqueue *vq)
+{
+ size_t pushed = 0, to_pull, to_push;
+ struct virtio_blk_outhdr hdr;
+ ssize_t bytes;
+ loff_t offset;
+ u64 sector;
+ u8 status;
+ u32 type;
+ int ret;
+
+ ret = vringh_getdesc_iotlb(&vq->vring, &vq->out_iov, &vq->in_iov,
+ &vq->head, GFP_ATOMIC);
+ if (ret != 1)
+ return false;
+
+ if (vq->out_iov.used < 1 || vq->in_iov.used < 1) {
+ dev_err(&vdpasim->vdpa.dev, "missing headers - out_iov: %u in_iov %u\n",
+ vq->out_iov.used, vq->in_iov.used);
+ return false;
+ }
+
+ if (vq->in_iov.iov[vq->in_iov.used - 1].iov_len < 1) {
+ dev_err(&vdpasim->vdpa.dev, "request in header too short\n");
+ return false;
+ }
+
+ /* The last byte is the status and we checked if the last iov has
+ * enough room for it.
+ */
+ to_push = vringh_kiov_length(&vq->in_iov) - 1;
+
+ to_pull = vringh_kiov_length(&vq->out_iov);
+
+ bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov, &hdr,
+ sizeof(hdr));
+ if (bytes != sizeof(hdr)) {
+ dev_err(&vdpasim->vdpa.dev, "request out header too short\n");
+ return false;
+ }
+
+ to_pull -= bytes;
+
+ type = vdpasim32_to_cpu(vdpasim, hdr.type);
+ sector = vdpasim64_to_cpu(vdpasim, hdr.sector);
+ offset = sector << SECTOR_SHIFT;
+ status = VIRTIO_BLK_S_OK;
+
+ switch (type) {
+ case VIRTIO_BLK_T_IN:
+ if (!vdpasim_blk_check_range(sector, to_push)) {
+ dev_err(&vdpasim->vdpa.dev,
+ "reading over the capacity - offset: 0x%llx len: 0x%zx\n",
+ offset, to_push);
+ status = VIRTIO_BLK_S_IOERR;
+ break;
+ }
+
+ bytes = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
+ vdpasim->buffer + offset,
+ to_push);
+ if (bytes < 0) {
+ dev_err(&vdpasim->vdpa.dev,
+ "vringh_iov_push_iotlb() error: %zd offset: 0x%llx len: 0x%zx\n",
+ bytes, offset, to_push);
+ status = VIRTIO_BLK_S_IOERR;
+ break;
+ }
+
+ pushed += bytes;
+ break;
+
+ case VIRTIO_BLK_T_OUT:
+ if (!vdpasim_blk_check_range(sector, to_pull)) {
+ dev_err(&vdpasim->vdpa.dev,
+ "writing over the capacity - offset: 0x%llx len: 0x%zx\n",
+ offset, to_pull);
+ status = VIRTIO_BLK_S_IOERR;
+ break;
+ }
+
+ bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov,
+ vdpasim->buffer + offset,
+ to_pull);
+ if (bytes < 0) {
+ dev_err(&vdpasim->vdpa.dev,
+ "vringh_iov_pull_iotlb() error: %zd offset: 0x%llx len: 0x%zx\n",
+ bytes, offset, to_pull);
+ status = VIRTIO_BLK_S_IOERR;
+ break;
+ }
+ break;
+
+ default:
+ dev_warn(&vdpasim->vdpa.dev,
+ "Unsupported request type %d\n", type);
+ status = VIRTIO_BLK_S_IOERR;
+ break;
+ }
+
+ /* If some operations fail, we need to skip the remaining bytes
+ * to put the status in the last byte
+ */
+ if (to_push - pushed > 0)
+ vringh_kiov_advance(&vq->in_iov, to_push - pushed);
+
+ /* Last byte is the status */
+ bytes = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov, &status, 1);
+ if (bytes != 1)
+ return false;
+
+ pushed += bytes;
+
+ /* Make sure data is wrote before advancing index */
+ smp_wmb();
+
+ vringh_complete_iotlb(&vq->vring, vq->head, pushed);
+
+ return true;
+}
+
static void vdpasim_blk_work(struct work_struct *work)
{
struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
- u8 status = VIRTIO_BLK_S_OK;
int i;

spin_lock(&vdpasim->lock);
@@ -53,22 +196,7 @@ static void vdpasim_blk_work(struct work_struct *work)
if (!vq->ready)
continue;

- while (vringh_getdesc_iotlb(&vq->vring, &vq->out_iov,
- &vq->in_iov, &vq->head,
- GFP_ATOMIC) > 0) {
- int write;
-
- vq->in_iov.i = vq->in_iov.used - 1;
- write = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
- &status, 1);
- if (write <= 0)
- break;
-
- /* Make sure data is wrote before advancing index */
- smp_wmb();
-
- vringh_complete_iotlb(&vq->vring, vq->head, write);
-
+ while (vdpasim_blk_handle_req(vdpasim, vq)) {
/* Make sure used is visible before rasing the interrupt. */
smp_wmb();

@@ -109,7 +237,7 @@ static int __init vdpasim_blk_init(void)
dev_attr.config_size = sizeof(struct virtio_blk_config);
dev_attr.get_config = vdpasim_blk_get_config;
dev_attr.work_fn = vdpasim_blk_work;
- dev_attr.buffer_size = PAGE_SIZE;
+ dev_attr.buffer_size = VDPASIM_BLK_CAPACITY << SECTOR_SHIFT;

vdpasim_blk_dev = vdpasim_create(&dev_attr);
if (IS_ERR(vdpasim_blk_dev)) {
--
2.29.2

2021-01-28 14:51:19

by Stefano Garzarella

[permalink] [raw]
Subject: [PATCH RFC v2 10/10] vdpa_sim_blk: handle VIRTIO_BLK_T_GET_ID

Handle VIRTIO_BLK_T_GET_ID request, always answering the
"vdpa_blk_sim" string.

Signed-off-by: Stefano Garzarella <[email protected]>
---
v2:
- made 'vdpasim_blk_id' static [Jason]
---
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index fc47e8320358..a3f8afad8d14 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -37,6 +37,7 @@
#define VDPASIM_BLK_VQ_NUM 1

static struct vdpasim *vdpasim_blk_dev;
+static char vdpasim_blk_id[VIRTIO_BLK_ID_BYTES] = "vdpa_blk_sim";

static bool vdpasim_blk_check_range(u64 start_sector, size_t range_size)
{
@@ -152,6 +153,20 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
}
break;

+ case VIRTIO_BLK_T_GET_ID:
+ bytes = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
+ vdpasim_blk_id,
+ VIRTIO_BLK_ID_BYTES);
+ if (bytes < 0) {
+ dev_err(&vdpasim->vdpa.dev,
+ "vringh_iov_push_iotlb() error: %zd\n", bytes);
+ status = VIRTIO_BLK_S_IOERR;
+ break;
+ }
+
+ pushed += bytes;
+ break;
+
default:
dev_warn(&vdpasim->vdpa.dev,
"Unsupported request type %d\n", type);
--
2.29.2

2021-01-28 14:51:29

by Stefano Garzarella

[permalink] [raw]
Subject: [PATCH RFC v2 05/10] vringh: add vringh_kiov_length() helper

This new helper returns the total number of bytes covered by
a vringh_kiov.

Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
---
include/linux/vringh.h | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 755211ebd195..84db7b8f912f 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -199,6 +199,17 @@ static inline void vringh_kiov_cleanup(struct vringh_kiov *kiov)
kiov->iov = NULL;
}

+static inline size_t vringh_kiov_length(struct vringh_kiov *kiov)
+{
+ size_t len = 0;
+ int i;
+
+ for (i = kiov->i; i < kiov->used; i++)
+ len += kiov->iov[i].iov_len;
+
+ return len;
+}
+
void vringh_kiov_advance(struct vringh_kiov *kiov, size_t len);

int vringh_getdesc_kern(struct vringh *vrh,
--
2.29.2

2021-01-28 14:53:01

by Stefano Garzarella

[permalink] [raw]
Subject: [PATCH RFC v2 02/10] vringh: add 'iotlb_lock' to synchronize iotlb accesses

Usually iotlb accesses are synchronized with a spinlock.
Let's request it as a new parameter in vringh_set_iotlb() and
hold it when we navigate the iotlb in iotlb_translate() to avoid
race conditions with any new additions/deletions of ranges from
the ioltb.

Signed-off-by: Stefano Garzarella <[email protected]>
---
include/linux/vringh.h | 6 +++++-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 ++-
drivers/vhost/vringh.c | 9 ++++++++-
3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 59bd50f99291..9c077863c8f6 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -46,6 +46,9 @@ struct vringh {
/* IOTLB for this vring */
struct vhost_iotlb *iotlb;

+ /* spinlock to synchronize IOTLB accesses */
+ spinlock_t *iotlb_lock;
+
/* The function to call to notify the guest about added buffers */
void (*notify)(struct vringh *);
};
@@ -258,7 +261,8 @@ static inline __virtio64 cpu_to_vringh64(const struct vringh *vrh, u64 val)

#if IS_REACHABLE(CONFIG_VHOST_IOTLB)

-void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb);
+void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
+ spinlock_t *iotlb_lock);

int vringh_init_iotlb(struct vringh *vrh, u64 features,
unsigned int num, bool weak_barriers,
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 2183a833fcf4..53238989713d 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -284,7 +284,8 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr)
goto err_iommu;

for (i = 0; i < dev_attr->nvqs; i++)
- vringh_set_iotlb(&vdpasim->vqs[i].vring, vdpasim->iommu);
+ vringh_set_iotlb(&vdpasim->vqs[i].vring, vdpasim->iommu,
+ &vdpasim->iommu_lock);

ret = iova_cache_get();
if (ret)
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 85d85faba058..f68122705719 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1074,6 +1074,8 @@ static int iotlb_translate(const struct vringh *vrh,
int ret = 0;
u64 s = 0;

+ spin_lock(vrh->iotlb_lock);
+
while (len > s) {
u64 size, pa, pfn;

@@ -1103,6 +1105,8 @@ static int iotlb_translate(const struct vringh *vrh,
++ret;
}

+ spin_unlock(vrh->iotlb_lock);
+
return ret;
}

@@ -1262,10 +1266,13 @@ EXPORT_SYMBOL(vringh_init_iotlb);
* vringh_set_iotlb - initialize a vringh for a ring with IOTLB.
* @vrh: the vring
* @iotlb: iotlb associated with this vring
+ * @iotlb_lock: spinlock to synchronize the iotlb accesses
*/
-void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb)
+void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
+ spinlock_t *iotlb_lock)
{
vrh->iotlb = iotlb;
+ vrh->iotlb_lock = iotlb_lock;
}
EXPORT_SYMBOL(vringh_set_iotlb);

--
2.29.2

2021-01-28 14:53:52

by Stefano Garzarella

[permalink] [raw]
Subject: [PATCH RFC v2 01/10] vdpa_sim: use iova module to allocate IOVA addresses

The identical mapping used until now created issues when mapping
different virtual pages with the same physical address.
To solve this issue, we can use the iova module, to handle the IOVA
allocation.
For simplicity we use an IOVA allocator with byte granularity.

We add two new functions, vdpasim_map_range() and vdpasim_unmap_range(),
to handle the IOVA allocation and the registration into the IOMMU/IOTLB.
These functions are used by dma_map_ops callbacks.

Acked-by: Jason Wang <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
---
v2:
- used ULONG_MAX instead of ~0UL [Jason]
- fixed typos in comment and patch description [Jason]
---
drivers/vdpa/vdpa_sim/vdpa_sim.h | 2 +
drivers/vdpa/vdpa_sim/vdpa_sim.c | 108 +++++++++++++++++++------------
drivers/vdpa/Kconfig | 1 +
3 files changed, 69 insertions(+), 42 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 6d75444f9948..cd58e888bcf3 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -6,6 +6,7 @@
#ifndef _VDPA_SIM_H
#define _VDPA_SIM_H

+#include <linux/iova.h>
#include <linux/vringh.h>
#include <linux/vdpa.h>
#include <linux/virtio_byteorder.h>
@@ -57,6 +58,7 @@ struct vdpasim {
/* virtio config according to device type */
void *config;
struct vhost_iotlb *iommu;
+ struct iova_domain iova;
void *buffer;
u32 status;
u32 generation;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index d5942842432d..2183a833fcf4 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -17,6 +17,7 @@
#include <linux/vringh.h>
#include <linux/vdpa.h>
#include <linux/vhost_iotlb.h>
+#include <linux/iova.h>

#include "vdpa_sim.h"

@@ -128,30 +129,57 @@ static int dir_to_perm(enum dma_data_direction dir)
return perm;
}

+static dma_addr_t vdpasim_map_range(struct vdpasim *vdpasim, phys_addr_t paddr,
+ size_t size, unsigned int perm)
+{
+ struct iova *iova;
+ dma_addr_t dma_addr;
+ int ret;
+
+ /* We set the limit_pfn to the maximum (ULONG_MAX - 1) */
+ iova = alloc_iova(&vdpasim->iova, size, ULONG_MAX - 1, true);
+ if (!iova)
+ return DMA_MAPPING_ERROR;
+
+ dma_addr = iova_dma_addr(&vdpasim->iova, iova);
+
+ spin_lock(&vdpasim->iommu_lock);
+ ret = vhost_iotlb_add_range(vdpasim->iommu, (u64)dma_addr,
+ (u64)dma_addr + size - 1, (u64)paddr, perm);
+ spin_unlock(&vdpasim->iommu_lock);
+
+ if (ret) {
+ __free_iova(&vdpasim->iova, iova);
+ return DMA_MAPPING_ERROR;
+ }
+
+ return dma_addr;
+}
+
+static void vdpasim_unmap_range(struct vdpasim *vdpasim, dma_addr_t dma_addr,
+ size_t size)
+{
+ spin_lock(&vdpasim->iommu_lock);
+ vhost_iotlb_del_range(vdpasim->iommu, (u64)dma_addr,
+ (u64)dma_addr + size - 1);
+ spin_unlock(&vdpasim->iommu_lock);
+
+ free_iova(&vdpasim->iova, iova_pfn(&vdpasim->iova, dma_addr));
+}
+
static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size,
enum dma_data_direction dir,
unsigned long attrs)
{
struct vdpasim *vdpasim = dev_to_sim(dev);
- struct vhost_iotlb *iommu = vdpasim->iommu;
- u64 pa = (page_to_pfn(page) << PAGE_SHIFT) + offset;
- int ret, perm = dir_to_perm(dir);
+ phys_addr_t paddr = page_to_phys(page) + offset;
+ int perm = dir_to_perm(dir);

if (perm < 0)
return DMA_MAPPING_ERROR;

- /* For simplicity, use identical mapping to avoid e.g iova
- * allocator.
- */
- spin_lock(&vdpasim->iommu_lock);
- ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1,
- pa, dir_to_perm(dir));
- spin_unlock(&vdpasim->iommu_lock);
- if (ret)
- return DMA_MAPPING_ERROR;
-
- return (dma_addr_t)(pa);
+ return vdpasim_map_range(vdpasim, paddr, size, perm);
}

static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,
@@ -159,12 +187,8 @@ static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,
unsigned long attrs)
{
struct vdpasim *vdpasim = dev_to_sim(dev);
- struct vhost_iotlb *iommu = vdpasim->iommu;

- spin_lock(&vdpasim->iommu_lock);
- vhost_iotlb_del_range(iommu, (u64)dma_addr,
- (u64)dma_addr + size - 1);
- spin_unlock(&vdpasim->iommu_lock);
+ vdpasim_unmap_range(vdpasim, dma_addr, size);
}

static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
@@ -172,27 +196,22 @@ static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
unsigned long attrs)
{
struct vdpasim *vdpasim = dev_to_sim(dev);
- struct vhost_iotlb *iommu = vdpasim->iommu;
- void *addr = kmalloc(size, flag);
- int ret;
+ phys_addr_t paddr;
+ void *addr;

- spin_lock(&vdpasim->iommu_lock);
+ addr = kmalloc(size, flag);
if (!addr) {
*dma_addr = DMA_MAPPING_ERROR;
- } else {
- u64 pa = virt_to_phys(addr);
-
- ret = vhost_iotlb_add_range(iommu, (u64)pa,
- (u64)pa + size - 1,
- pa, VHOST_MAP_RW);
- if (ret) {
- *dma_addr = DMA_MAPPING_ERROR;
- kfree(addr);
- addr = NULL;
- } else
- *dma_addr = (dma_addr_t)pa;
+ return NULL;
+ }
+
+ paddr = virt_to_phys(addr);
+
+ *dma_addr = vdpasim_map_range(vdpasim, paddr, size, VHOST_MAP_RW);
+ if (*dma_addr == DMA_MAPPING_ERROR) {
+ kfree(addr);
+ return NULL;
}
- spin_unlock(&vdpasim->iommu_lock);

return addr;
}
@@ -202,14 +221,10 @@ static void vdpasim_free_coherent(struct device *dev, size_t size,
unsigned long attrs)
{
struct vdpasim *vdpasim = dev_to_sim(dev);
- struct vhost_iotlb *iommu = vdpasim->iommu;

- spin_lock(&vdpasim->iommu_lock);
- vhost_iotlb_del_range(iommu, (u64)dma_addr,
- (u64)dma_addr + size - 1);
- spin_unlock(&vdpasim->iommu_lock);
+ vdpasim_unmap_range(vdpasim, dma_addr, size);

- kfree(phys_to_virt((uintptr_t)dma_addr));
+ kfree(vaddr);
}

static const struct dma_map_ops vdpasim_dma_ops = {
@@ -271,6 +286,13 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr)
for (i = 0; i < dev_attr->nvqs; i++)
vringh_set_iotlb(&vdpasim->vqs[i].vring, vdpasim->iommu);

+ ret = iova_cache_get();
+ if (ret)
+ goto err_iommu;
+
+ /* For simplicity we use an IOVA allocator with byte granularity */
+ init_iova_domain(&vdpasim->iova, 1, 0);
+
vdpasim->vdpa.dma_dev = dev;

return vdpasim;
@@ -541,6 +563,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

cancel_work_sync(&vdpasim->work);
+ put_iova_domain(&vdpasim->iova);
+ iova_cache_put();
kvfree(vdpasim->buffer);
if (vdpasim->iommu)
vhost_iotlb_free(vdpasim->iommu);
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index ffd1e098bfd2..21a23500f430 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -14,6 +14,7 @@ config VDPA_SIM
depends on RUNTIME_TESTING_MENU && HAS_DMA
select DMA_OPS
select VHOST_RING
+ select IOMMU_IOVA
help
Enable this module to support vDPA device simulators. These devices
are used for testing, prototyping and development of vDPA.
--
2.29.2

2021-01-29 07:46:19

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH RFC v2 02/10] vringh: add 'iotlb_lock' to synchronize iotlb accesses


On 2021/1/28 下午10:41, Stefano Garzarella wrote:
> Usually iotlb accesses are synchronized with a spinlock.
> Let's request it as a new parameter in vringh_set_iotlb() and
> hold it when we navigate the iotlb in iotlb_translate() to avoid
> race conditions with any new additions/deletions of ranges from
> the ioltb.


Patch looks fine but I wonder if this is the best approach comparing to
do locking by the caller.

Thanks


>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> include/linux/vringh.h | 6 +++++-
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 ++-
> drivers/vhost/vringh.c | 9 ++++++++-
> 3 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> index 59bd50f99291..9c077863c8f6 100644
> --- a/include/linux/vringh.h
> +++ b/include/linux/vringh.h
> @@ -46,6 +46,9 @@ struct vringh {
> /* IOTLB for this vring */
> struct vhost_iotlb *iotlb;
>
> + /* spinlock to synchronize IOTLB accesses */
> + spinlock_t *iotlb_lock;
> +
> /* The function to call to notify the guest about added buffers */
> void (*notify)(struct vringh *);
> };
> @@ -258,7 +261,8 @@ static inline __virtio64 cpu_to_vringh64(const struct vringh *vrh, u64 val)
>
> #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>
> -void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb);
> +void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
> + spinlock_t *iotlb_lock);
>
> int vringh_init_iotlb(struct vringh *vrh, u64 features,
> unsigned int num, bool weak_barriers,
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 2183a833fcf4..53238989713d 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -284,7 +284,8 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr)
> goto err_iommu;
>
> for (i = 0; i < dev_attr->nvqs; i++)
> - vringh_set_iotlb(&vdpasim->vqs[i].vring, vdpasim->iommu);
> + vringh_set_iotlb(&vdpasim->vqs[i].vring, vdpasim->iommu,
> + &vdpasim->iommu_lock);
>
> ret = iova_cache_get();
> if (ret)
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 85d85faba058..f68122705719 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1074,6 +1074,8 @@ static int iotlb_translate(const struct vringh *vrh,
> int ret = 0;
> u64 s = 0;
>
> + spin_lock(vrh->iotlb_lock);
> +
> while (len > s) {
> u64 size, pa, pfn;
>
> @@ -1103,6 +1105,8 @@ static int iotlb_translate(const struct vringh *vrh,
> ++ret;
> }
>
> + spin_unlock(vrh->iotlb_lock);
> +
> return ret;
> }
>
> @@ -1262,10 +1266,13 @@ EXPORT_SYMBOL(vringh_init_iotlb);
> * vringh_set_iotlb - initialize a vringh for a ring with IOTLB.
> * @vrh: the vring
> * @iotlb: iotlb associated with this vring
> + * @iotlb_lock: spinlock to synchronize the iotlb accesses
> */
> -void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb)
> +void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
> + spinlock_t *iotlb_lock)
> {
> vrh->iotlb = iotlb;
> + vrh->iotlb_lock = iotlb_lock;
> }
> EXPORT_SYMBOL(vringh_set_iotlb);
>

2021-01-31 15:35:59

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH RFC v2 08/10] vdpa: add vdpa simulator for block device


On 1/28/2021 4:41 PM, Stefano Garzarella wrote:
> From: Max Gurtovoy <[email protected]>
>
> This will allow running vDPA for virtio block protocol.
>
> Signed-off-by: Max Gurtovoy <[email protected]>
> [sgarzare: various cleanups/fixes]
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> v2:
> - rebased on top of other changes (dev_attr, get_config(), notify(), etc.)
> - memset to 0 the config structure in vdpasim_blk_get_config()
> - used vdpasim pointer in vdpasim_blk_get_config()
>
> v1:
> - Removed unused headers
> - Used cpu_to_vdpasim*() to store config fields
> - Replaced 'select VDPA_SIM' with 'depends on VDPA_SIM' since selected
> option can not depend on other [Jason]
> - Start with a single queue for now [Jason]
> - Add comments to memory barriers
> ---
> drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 145 +++++++++++++++++++++++++++
> drivers/vdpa/Kconfig | 7 ++
> drivers/vdpa/vdpa_sim/Makefile | 1 +
> 3 files changed, 153 insertions(+)
> create mode 100644 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> new file mode 100644
> index 000000000000..999f9ca0b628
> --- /dev/null
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> @@ -0,0 +1,145 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * VDPA simulator for block device.
> + *
> + * Copyright (c) 2020, Mellanox Technologies. All rights reserved.

I guess we can change the copyright from Mellanox to:

Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

Thanks.

2021-02-01 05:50:28

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH RFC v2 04/10] vringh: implement vringh_kiov_advance()


On 2021/1/28 下午10:41, Stefano Garzarella wrote:
> In some cases, it may be useful to provide a way to skip a number
> of bytes in a vringh_kiov.
>
> Let's implement vringh_kiov_advance() for this purpose, reusing the
> code from vringh_iov_xfer().
> We replace that code calling the new vringh_kiov_advance().


Acked-by: Jason Wang <[email protected]>

In the long run we need to switch to use iov iterator library instead.

Thanks


>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> include/linux/vringh.h | 2 ++
> drivers/vhost/vringh.c | 41 +++++++++++++++++++++++++++++------------
> 2 files changed, 31 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> index 9c077863c8f6..755211ebd195 100644
> --- a/include/linux/vringh.h
> +++ b/include/linux/vringh.h
> @@ -199,6 +199,8 @@ static inline void vringh_kiov_cleanup(struct vringh_kiov *kiov)
> kiov->iov = NULL;
> }
>
> +void vringh_kiov_advance(struct vringh_kiov *kiov, size_t len);
> +
> int vringh_getdesc_kern(struct vringh *vrh,
> struct vringh_kiov *riov,
> struct vringh_kiov *wiov,
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index bee63d68201a..4d800e4f31ca 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -75,6 +75,34 @@ static inline int __vringh_get_head(const struct vringh *vrh,
> return head;
> }
>
> +/**
> + * vringh_kiov_advance - skip bytes from vring_kiov
> + * @iov: an iov passed to vringh_getdesc_*() (updated as we consume)
> + * @len: the maximum length to advance
> + */
> +void vringh_kiov_advance(struct vringh_kiov *iov, size_t len)
> +{
> + while (len && iov->i < iov->used) {
> + size_t partlen = min(iov->iov[iov->i].iov_len, len);
> +
> + iov->consumed += partlen;
> + iov->iov[iov->i].iov_len -= partlen;
> + iov->iov[iov->i].iov_base += partlen;
> +
> + if (!iov->iov[iov->i].iov_len) {
> + /* Fix up old iov element then increment. */
> + iov->iov[iov->i].iov_len = iov->consumed;
> + iov->iov[iov->i].iov_base -= iov->consumed;
> +
> + iov->consumed = 0;
> + iov->i++;
> + }
> +
> + len -= partlen;
> + }
> +}
> +EXPORT_SYMBOL(vringh_kiov_advance);
> +
> /* Copy some bytes to/from the iovec. Returns num copied. */
> static inline ssize_t vringh_iov_xfer(struct vringh *vrh,
> struct vringh_kiov *iov,
> @@ -95,19 +123,8 @@ static inline ssize_t vringh_iov_xfer(struct vringh *vrh,
> done += partlen;
> len -= partlen;
> ptr += partlen;
> - iov->consumed += partlen;
> - iov->iov[iov->i].iov_len -= partlen;
> - iov->iov[iov->i].iov_base += partlen;
>
> - if (!iov->iov[iov->i].iov_len) {
> - /* Fix up old iov element then increment. */
> - iov->iov[iov->i].iov_len = iov->consumed;
> - iov->iov[iov->i].iov_base -= iov->consumed;
> -
> -
> - iov->consumed = 0;
> - iov->i++;
> - }
> + vringh_kiov_advance(iov, partlen);
> }
> return done;
> }

2021-02-01 05:56:45

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH RFC v2 08/10] vdpa: add vdpa simulator for block device


On 2021/1/28 下午10:41, Stefano Garzarella wrote:
> From: Max Gurtovoy <[email protected]>
>
> This will allow running vDPA for virtio block protocol.
>
> Signed-off-by: Max Gurtovoy <[email protected]>
> [sgarzare: various cleanups/fixes]
> Signed-off-by: Stefano Garzarella <[email protected]>


Acked-by: Jason Wang <[email protected]>


> ---
> v2:
> - rebased on top of other changes (dev_attr, get_config(), notify(), etc.)
> - memset to 0 the config structure in vdpasim_blk_get_config()
> - used vdpasim pointer in vdpasim_blk_get_config()
>
> v1:
> - Removed unused headers
> - Used cpu_to_vdpasim*() to store config fields
> - Replaced 'select VDPA_SIM' with 'depends on VDPA_SIM' since selected
> option can not depend on other [Jason]
> - Start with a single queue for now [Jason]
> - Add comments to memory barriers
> ---
> drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 145 +++++++++++++++++++++++++++
> drivers/vdpa/Kconfig | 7 ++
> drivers/vdpa/vdpa_sim/Makefile | 1 +
> 3 files changed, 153 insertions(+)
> create mode 100644 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> new file mode 100644
> index 000000000000..999f9ca0b628
> --- /dev/null
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> @@ -0,0 +1,145 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * VDPA simulator for block device.
> + *
> + * Copyright (c) 2020, Mellanox Technologies. All rights reserved.
> + *
> + */
> +
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/kernel.h>
> +#include <linux/sched.h>
> +#include <linux/vringh.h>
> +#include <linux/vdpa.h>
> +#include <uapi/linux/virtio_blk.h>
> +
> +#include "vdpa_sim.h"
> +
> +#define DRV_VERSION "0.1"
> +#define DRV_AUTHOR "Max Gurtovoy <[email protected]>"
> +#define DRV_DESC "vDPA Device Simulator for block device"
> +#define DRV_LICENSE "GPL v2"
> +
> +#define VDPASIM_BLK_FEATURES (VDPASIM_FEATURES | \
> + (1ULL << VIRTIO_BLK_F_SIZE_MAX) | \
> + (1ULL << VIRTIO_BLK_F_SEG_MAX) | \
> + (1ULL << VIRTIO_BLK_F_BLK_SIZE) | \
> + (1ULL << VIRTIO_BLK_F_TOPOLOGY) | \
> + (1ULL << VIRTIO_BLK_F_MQ))
> +
> +#define VDPASIM_BLK_CAPACITY 0x40000
> +#define VDPASIM_BLK_SIZE_MAX 0x1000
> +#define VDPASIM_BLK_SEG_MAX 32
> +#define VDPASIM_BLK_VQ_NUM 1
> +
> +static struct vdpasim *vdpasim_blk_dev;
> +
> +static void vdpasim_blk_work(struct work_struct *work)
> +{
> + struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> + u8 status = VIRTIO_BLK_S_OK;
> + int i;
> +
> + spin_lock(&vdpasim->lock);
> +
> + if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
> + goto out;
> +
> + for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) {
> + struct vdpasim_virtqueue *vq = &vdpasim->vqs[i];
> +
> + if (!vq->ready)
> + continue;
> +
> + while (vringh_getdesc_iotlb(&vq->vring, &vq->out_iov,
> + &vq->in_iov, &vq->head,
> + GFP_ATOMIC) > 0) {
> + int write;
> +
> + vq->in_iov.i = vq->in_iov.used - 1;
> + write = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
> + &status, 1);
> + if (write <= 0)
> + break;
> +
> + /* Make sure data is wrote before advancing index */
> + smp_wmb();
> +
> + vringh_complete_iotlb(&vq->vring, vq->head, write);
> +
> + /* Make sure used is visible before rasing the interrupt. */
> + smp_wmb();
> +
> + local_bh_disable();
> + if (vringh_need_notify_iotlb(&vq->vring) > 0)
> + vringh_notify(&vq->vring);
> + local_bh_enable();
> + }
> + }
> +out:
> + spin_unlock(&vdpasim->lock);
> +}
> +
> +static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config)
> +{
> + struct virtio_blk_config *blk_config =
> + (struct virtio_blk_config *)config;
> +
> + memset(config, 0, sizeof(struct virtio_blk_config));
> +
> + blk_config->capacity = cpu_to_vdpasim64(vdpasim, VDPASIM_BLK_CAPACITY);
> + blk_config->size_max = cpu_to_vdpasim32(vdpasim, VDPASIM_BLK_SIZE_MAX);
> + blk_config->seg_max = cpu_to_vdpasim32(vdpasim, VDPASIM_BLK_SEG_MAX);
> + blk_config->num_queues = cpu_to_vdpasim16(vdpasim, VDPASIM_BLK_VQ_NUM);
> + blk_config->min_io_size = cpu_to_vdpasim16(vdpasim, 1);
> + blk_config->opt_io_size = cpu_to_vdpasim32(vdpasim, 1);
> + blk_config->blk_size = cpu_to_vdpasim32(vdpasim, SECTOR_SIZE);
> +}
> +
> +static int __init vdpasim_blk_init(void)
> +{
> + struct vdpasim_dev_attr dev_attr = {};
> + int ret;
> +
> + dev_attr.id = VIRTIO_ID_BLOCK;
> + dev_attr.supported_features = VDPASIM_BLK_FEATURES;
> + dev_attr.nvqs = VDPASIM_BLK_VQ_NUM;
> + dev_attr.config_size = sizeof(struct virtio_blk_config);
> + dev_attr.get_config = vdpasim_blk_get_config;
> + dev_attr.work_fn = vdpasim_blk_work;
> + dev_attr.buffer_size = PAGE_SIZE;
> +
> + vdpasim_blk_dev = vdpasim_create(&dev_attr);
> + if (IS_ERR(vdpasim_blk_dev)) {
> + ret = PTR_ERR(vdpasim_blk_dev);
> + goto out;
> + }
> +
> + ret = vdpa_register_device(&vdpasim_blk_dev->vdpa);
> + if (ret)
> + goto put_dev;
> +
> + return 0;
> +
> +put_dev:
> + put_device(&vdpasim_blk_dev->vdpa.dev);
> +out:
> + return ret;
> +}
> +
> +static void __exit vdpasim_blk_exit(void)
> +{
> + struct vdpa_device *vdpa = &vdpasim_blk_dev->vdpa;
> +
> + vdpa_unregister_device(vdpa);
> +}
> +
> +module_init(vdpasim_blk_init)
> +module_exit(vdpasim_blk_exit)
> +
> +MODULE_VERSION(DRV_VERSION);
> +MODULE_LICENSE(DRV_LICENSE);
> +MODULE_AUTHOR(DRV_AUTHOR);
> +MODULE_DESCRIPTION(DRV_DESC);
> diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
> index 21a23500f430..b8bd92cf04f9 100644
> --- a/drivers/vdpa/Kconfig
> +++ b/drivers/vdpa/Kconfig
> @@ -26,6 +26,13 @@ config VDPA_SIM_NET
> help
> vDPA networking device simulator which loops TX traffic back to RX.
>
> +config VDPA_SIM_BLOCK
> + tristate "vDPA simulator for block device"
> + depends on VDPA_SIM
> + help
> + vDPA block device simulator which terminates IO request in a
> + memory buffer.
> +
> config IFCVF
> tristate "Intel IFC VF vDPA driver"
> depends on PCI_MSI
> diff --git a/drivers/vdpa/vdpa_sim/Makefile b/drivers/vdpa/vdpa_sim/Makefile
> index 79d4536d347e..d458103302f2 100644
> --- a/drivers/vdpa/vdpa_sim/Makefile
> +++ b/drivers/vdpa/vdpa_sim/Makefile
> @@ -1,3 +1,4 @@
> # SPDX-License-Identifier: GPL-2.0
> obj-$(CONFIG_VDPA_SIM) += vdpa_sim.o
> obj-$(CONFIG_VDPA_SIM_NET) += vdpa_sim_net.o
> +obj-$(CONFIG_VDPA_SIM_BLOCK) += vdpa_sim_blk.o

2021-02-01 06:36:48

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH RFC v2 10/10] vdpa_sim_blk: handle VIRTIO_BLK_T_GET_ID


On 2021/1/28 下午10:41, Stefano Garzarella wrote:
> Handle VIRTIO_BLK_T_GET_ID request, always answering the
> "vdpa_blk_sim" string.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> v2:
> - made 'vdpasim_blk_id' static [Jason]


Acked-by: Jason Wang <[email protected]>


> ---
> drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> index fc47e8320358..a3f8afad8d14 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> @@ -37,6 +37,7 @@
> #define VDPASIM_BLK_VQ_NUM 1
>
> static struct vdpasim *vdpasim_blk_dev;
> +static char vdpasim_blk_id[VIRTIO_BLK_ID_BYTES] = "vdpa_blk_sim";
>
> static bool vdpasim_blk_check_range(u64 start_sector, size_t range_size)
> {
> @@ -152,6 +153,20 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
> }
> break;
>
> + case VIRTIO_BLK_T_GET_ID:
> + bytes = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
> + vdpasim_blk_id,
> + VIRTIO_BLK_ID_BYTES);
> + if (bytes < 0) {
> + dev_err(&vdpasim->vdpa.dev,
> + "vringh_iov_push_iotlb() error: %zd\n", bytes);
> + status = VIRTIO_BLK_S_IOERR;
> + break;
> + }
> +
> + pushed += bytes;
> + break;
> +
> default:
> dev_warn(&vdpasim->vdpa.dev,
> "Unsupported request type %d\n", type);

2021-02-01 06:38:25

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH RFC v2 09/10] vdpa_sim_blk: implement ramdisk behaviour


On 2021/1/28 下午10:41, Stefano Garzarella wrote:
> The previous implementation wrote only the status of each request.
> This patch implements a more accurate block device simulator,
> providing a ramdisk-like behavior.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> v2:
> - used %zd %zx to print size_t and ssize_t variables in dev_err()
> - removed unnecessary new line [Jason]
> - moved VIRTIO_BLK_T_GET_ID in another patch [Jason]
> - used push/pull instead of write/read terminology
> - added vdpasim_blk_check_range() to avoid overflows [Stefan]
> - use vdpasim*_to_cpu instead of le*_to_cpu
> - used vringh_kiov_length() helper [Jason]


Acked-by: Jason Wang <[email protected]>


> ---
> drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 164 ++++++++++++++++++++++++---
> 1 file changed, 146 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> index 999f9ca0b628..fc47e8320358 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> @@ -3,6 +3,7 @@
> * VDPA simulator for block device.
> *
> * Copyright (c) 2020, Mellanox Technologies. All rights reserved.
> + * Copyright (c) 2021, Red Hat Inc. All rights reserved.
> *
> */
>
> @@ -13,6 +14,7 @@
> #include <linux/sched.h>
> #include <linux/vringh.h>
> #include <linux/vdpa.h>
> +#include <linux/blkdev.h>
> #include <uapi/linux/virtio_blk.h>
>
> #include "vdpa_sim.h"
> @@ -36,10 +38,151 @@
>
> static struct vdpasim *vdpasim_blk_dev;
>
> +static bool vdpasim_blk_check_range(u64 start_sector, size_t range_size)
> +{
> + u64 range_sectors = range_size >> SECTOR_SHIFT;
> +
> + if (range_size > VDPASIM_BLK_SIZE_MAX * VDPASIM_BLK_SEG_MAX)
> + return false;
> +
> + if (start_sector > VDPASIM_BLK_CAPACITY)
> + return false;
> +
> + if (range_sectors > VDPASIM_BLK_CAPACITY - start_sector)
> + return false;
> +
> + return true;
> +}
> +
> +/* Returns 'true' if the request is handled (with or without an I/O error)
> + * and the status is correctly written in the last byte of the 'in iov',
> + * 'false' otherwise.
> + */
> +static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
> + struct vdpasim_virtqueue *vq)
> +{
> + size_t pushed = 0, to_pull, to_push;
> + struct virtio_blk_outhdr hdr;
> + ssize_t bytes;
> + loff_t offset;
> + u64 sector;
> + u8 status;
> + u32 type;
> + int ret;
> +
> + ret = vringh_getdesc_iotlb(&vq->vring, &vq->out_iov, &vq->in_iov,
> + &vq->head, GFP_ATOMIC);
> + if (ret != 1)
> + return false;
> +
> + if (vq->out_iov.used < 1 || vq->in_iov.used < 1) {
> + dev_err(&vdpasim->vdpa.dev, "missing headers - out_iov: %u in_iov %u\n",
> + vq->out_iov.used, vq->in_iov.used);
> + return false;
> + }
> +
> + if (vq->in_iov.iov[vq->in_iov.used - 1].iov_len < 1) {
> + dev_err(&vdpasim->vdpa.dev, "request in header too short\n");
> + return false;
> + }
> +
> + /* The last byte is the status and we checked if the last iov has
> + * enough room for it.
> + */
> + to_push = vringh_kiov_length(&vq->in_iov) - 1;
> +
> + to_pull = vringh_kiov_length(&vq->out_iov);
> +
> + bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov, &hdr,
> + sizeof(hdr));
> + if (bytes != sizeof(hdr)) {
> + dev_err(&vdpasim->vdpa.dev, "request out header too short\n");
> + return false;
> + }
> +
> + to_pull -= bytes;
> +
> + type = vdpasim32_to_cpu(vdpasim, hdr.type);
> + sector = vdpasim64_to_cpu(vdpasim, hdr.sector);
> + offset = sector << SECTOR_SHIFT;
> + status = VIRTIO_BLK_S_OK;
> +
> + switch (type) {
> + case VIRTIO_BLK_T_IN:
> + if (!vdpasim_blk_check_range(sector, to_push)) {
> + dev_err(&vdpasim->vdpa.dev,
> + "reading over the capacity - offset: 0x%llx len: 0x%zx\n",
> + offset, to_push);
> + status = VIRTIO_BLK_S_IOERR;
> + break;
> + }
> +
> + bytes = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
> + vdpasim->buffer + offset,
> + to_push);
> + if (bytes < 0) {
> + dev_err(&vdpasim->vdpa.dev,
> + "vringh_iov_push_iotlb() error: %zd offset: 0x%llx len: 0x%zx\n",
> + bytes, offset, to_push);
> + status = VIRTIO_BLK_S_IOERR;
> + break;
> + }
> +
> + pushed += bytes;
> + break;
> +
> + case VIRTIO_BLK_T_OUT:
> + if (!vdpasim_blk_check_range(sector, to_pull)) {
> + dev_err(&vdpasim->vdpa.dev,
> + "writing over the capacity - offset: 0x%llx len: 0x%zx\n",
> + offset, to_pull);
> + status = VIRTIO_BLK_S_IOERR;
> + break;
> + }
> +
> + bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov,
> + vdpasim->buffer + offset,
> + to_pull);
> + if (bytes < 0) {
> + dev_err(&vdpasim->vdpa.dev,
> + "vringh_iov_pull_iotlb() error: %zd offset: 0x%llx len: 0x%zx\n",
> + bytes, offset, to_pull);
> + status = VIRTIO_BLK_S_IOERR;
> + break;
> + }
> + break;
> +
> + default:
> + dev_warn(&vdpasim->vdpa.dev,
> + "Unsupported request type %d\n", type);
> + status = VIRTIO_BLK_S_IOERR;
> + break;
> + }
> +
> + /* If some operations fail, we need to skip the remaining bytes
> + * to put the status in the last byte
> + */
> + if (to_push - pushed > 0)
> + vringh_kiov_advance(&vq->in_iov, to_push - pushed);
> +
> + /* Last byte is the status */
> + bytes = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov, &status, 1);
> + if (bytes != 1)
> + return false;
> +
> + pushed += bytes;
> +
> + /* Make sure data is wrote before advancing index */
> + smp_wmb();
> +
> + vringh_complete_iotlb(&vq->vring, vq->head, pushed);
> +
> + return true;
> +}
> +
> static void vdpasim_blk_work(struct work_struct *work)
> {
> struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> - u8 status = VIRTIO_BLK_S_OK;
> int i;
>
> spin_lock(&vdpasim->lock);
> @@ -53,22 +196,7 @@ static void vdpasim_blk_work(struct work_struct *work)
> if (!vq->ready)
> continue;
>
> - while (vringh_getdesc_iotlb(&vq->vring, &vq->out_iov,
> - &vq->in_iov, &vq->head,
> - GFP_ATOMIC) > 0) {
> - int write;
> -
> - vq->in_iov.i = vq->in_iov.used - 1;
> - write = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
> - &status, 1);
> - if (write <= 0)
> - break;
> -
> - /* Make sure data is wrote before advancing index */
> - smp_wmb();
> -
> - vringh_complete_iotlb(&vq->vring, vq->head, write);
> -
> + while (vdpasim_blk_handle_req(vdpasim, vq)) {
> /* Make sure used is visible before rasing the interrupt. */
> smp_wmb();
>
> @@ -109,7 +237,7 @@ static int __init vdpasim_blk_init(void)
> dev_attr.config_size = sizeof(struct virtio_blk_config);
> dev_attr.get_config = vdpasim_blk_get_config;
> dev_attr.work_fn = vdpasim_blk_work;
> - dev_attr.buffer_size = PAGE_SIZE;
> + dev_attr.buffer_size = VDPASIM_BLK_CAPACITY << SECTOR_SHIFT;
>
> vdpasim_blk_dev = vdpasim_create(&dev_attr);
> if (IS_ERR(vdpasim_blk_dev)) {

2021-02-01 06:42:11

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH RFC v2 02/10] vringh: add 'iotlb_lock' to synchronize iotlb accesses


On 2021/1/29 下午5:18, Stefano Garzarella wrote:
> On Fri, Jan 29, 2021 at 03:43:40PM +0800, Jason Wang wrote:
>>
>> On 2021/1/28 下午10:41, Stefano Garzarella wrote:
>>> Usually iotlb accesses are synchronized with a spinlock.
>>> Let's request it as a new parameter in vringh_set_iotlb() and
>>> hold it when we navigate the iotlb in iotlb_translate() to avoid
>>> race conditions with any new additions/deletions of ranges from
>>> the ioltb.
>>
>>
>> Patch looks fine but I wonder if this is the best approach comparing
>> to do locking by the caller.
>
> Initially I tried to hold the lock in the vdpasim_blk_work(), but since
> we have a lot of different functions for vringh, I opted to take the
> lock at the beginning and release it at the end.
> Also because several times I went to see if that call used
> iotlb_translate or not.
>
> This could be a problem for example if we have multiple workers to
> handle multiple queues.
>
> Also, some functions are quite long (e.g. vringh_getdesc_iotlb) and
> holding the lock for that long could reduce parallelism.
>
> For these reasons I thought it was better to hide everything from the
> caller who doesn't have to worry about which function calls
> iotlb_translate() and thus hold the lock.


Fine with me.

Acked-by: Jason Wang <[email protected]>

Thanks


>
> Thanks,
> Stefano
>
>>
>> Thanks
>>
>>
>>>
>>> Signed-off-by: Stefano Garzarella <[email protected]>
>>> ---
>>>  include/linux/vringh.h           | 6 +++++-
>>>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 ++-
>>>  drivers/vhost/vringh.c           | 9 ++++++++-
>>>  3 files changed, 15 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
>>> index 59bd50f99291..9c077863c8f6 100644
>>> --- a/include/linux/vringh.h
>>> +++ b/include/linux/vringh.h
>>> @@ -46,6 +46,9 @@ struct vringh {
>>>      /* IOTLB for this vring */
>>>      struct vhost_iotlb *iotlb;
>>> +    /* spinlock to synchronize IOTLB accesses */
>>> +    spinlock_t *iotlb_lock;
>>> +
>>>      /* The function to call to notify the guest about added buffers */
>>>      void (*notify)(struct vringh *);
>>>  };
>>> @@ -258,7 +261,8 @@ static inline __virtio64 cpu_to_vringh64(const
>>> struct vringh *vrh, u64 val)
>>>  #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>>> -void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb);
>>> +void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
>>> +              spinlock_t *iotlb_lock);
>>>  int vringh_init_iotlb(struct vringh *vrh, u64 features,
>>>                unsigned int num, bool weak_barriers,
>>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>> b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>> index 2183a833fcf4..53238989713d 100644
>>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>> @@ -284,7 +284,8 @@ struct vdpasim *vdpasim_create(struct
>>> vdpasim_dev_attr *dev_attr)
>>>          goto err_iommu;
>>>      for (i = 0; i < dev_attr->nvqs; i++)
>>> -        vringh_set_iotlb(&vdpasim->vqs[i].vring, vdpasim->iommu);
>>> +        vringh_set_iotlb(&vdpasim->vqs[i].vring, vdpasim->iommu,
>>> +                 &vdpasim->iommu_lock);
>>>      ret = iova_cache_get();
>>>      if (ret)
>>> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>>> index 85d85faba058..f68122705719 100644
>>> --- a/drivers/vhost/vringh.c
>>> +++ b/drivers/vhost/vringh.c
>>> @@ -1074,6 +1074,8 @@ static int iotlb_translate(const struct vringh
>>> *vrh,
>>>      int ret = 0;
>>>      u64 s = 0;
>>> +    spin_lock(vrh->iotlb_lock);
>>> +
>>>      while (len > s) {
>>>          u64 size, pa, pfn;
>>> @@ -1103,6 +1105,8 @@ static int iotlb_translate(const struct vringh
>>> *vrh,
>>>          ++ret;
>>>      }
>>> +    spin_unlock(vrh->iotlb_lock);
>>> +
>>>      return ret;
>>>  }
>>> @@ -1262,10 +1266,13 @@ EXPORT_SYMBOL(vringh_init_iotlb);
>>>   * vringh_set_iotlb - initialize a vringh for a ring with IOTLB.
>>>   * @vrh: the vring
>>>   * @iotlb: iotlb associated with this vring
>>> + * @iotlb_lock: spinlock to synchronize the iotlb accesses
>>>   */
>>> -void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb)
>>> +void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
>>> +              spinlock_t *iotlb_lock)
>>>  {
>>>      vrh->iotlb = iotlb;
>>> +    vrh->iotlb_lock = iotlb_lock;
>>>  }
>>>  EXPORT_SYMBOL(vringh_set_iotlb);
>>
>

2021-02-01 06:49:09

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH RFC v2 05/10] vringh: add vringh_kiov_length() helper


On 2021/1/28 下午10:41, Stefano Garzarella wrote:
> This new helper returns the total number of bytes covered by
> a vringh_kiov.
>
> Suggested-by: Jason Wang <[email protected]>
> Signed-off-by: Stefano Garzarella <[email protected]>


Acked-by: Jason Wang <[email protected]>


> ---
> include/linux/vringh.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> index 755211ebd195..84db7b8f912f 100644
> --- a/include/linux/vringh.h
> +++ b/include/linux/vringh.h
> @@ -199,6 +199,17 @@ static inline void vringh_kiov_cleanup(struct vringh_kiov *kiov)
> kiov->iov = NULL;
> }
>
> +static inline size_t vringh_kiov_length(struct vringh_kiov *kiov)
> +{
> + size_t len = 0;
> + int i;
> +
> + for (i = kiov->i; i < kiov->used; i++)
> + len += kiov->iov[i].iov_len;
> +
> + return len;
> +}
> +
> void vringh_kiov_advance(struct vringh_kiov *kiov, size_t len);
>
> int vringh_getdesc_kern(struct vringh *vrh,

2021-02-01 06:50:13

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/10] vringh: reset kiov 'consumed' field in __vringh_iov()


On 2021/1/28 下午10:41, Stefano Garzarella wrote:
> __vringh_iov() overwrites the contents of riov and wiov, in fact it
> resets the 'i' and 'used' fields, but also the consumed field should
> be reset to avoid an inconsistent state.
>
> Signed-off-by: Stefano Garzarella <[email protected]>


I had a question(I remember we had some discussion like this but I
forget the conclusion):

I see e.g in vringh_getdesc_kern() it has the following comment:

/*
 * Note that you may need to clean up riov and wiov, even on error!
 */

So it looks to me the correct way is to call vringh_kiov_cleanup() before?

Thanks


> ---
> drivers/vhost/vringh.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index f68122705719..bee63d68201a 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -290,9 +290,9 @@ __vringh_iov(struct vringh *vrh, u16 i,
> return -EINVAL;
>
> if (riov)
> - riov->i = riov->used = 0;
> + riov->i = riov->used = riov->consumed = 0;
> if (wiov)
> - wiov->i = wiov->used = 0;
> + wiov->i = wiov->used = wiov->consumed = 0;
>
> for (;;) {
> void *addr;

2021-02-01 06:50:57

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH RFC v2 06/10] vdpa_sim: cleanup kiovs in vdpasim_free()


On 2021/1/28 下午10:41, Stefano Garzarella wrote:
> vringh_getdesc_iotlb() allocates memory to store the kvec, that
> is freed with vringh_kiov_cleanup().
>
> vringh_getdesc_iotlb() is able to reuse a kvec previously allocated,
> so in order to avoid to allocate the kvec for each request, we are
> not calling vringh_kiov_cleanup() when we finished to handle a
> request, but we should call it when we free the entire device.
>
> Signed-off-by: Stefano Garzarella <[email protected]>


Acked-by: Jason Wang <[email protected]>


> ---
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 53238989713d..a7aeb5d01c3e 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -562,8 +562,15 @@ static int vdpasim_dma_unmap(struct vdpa_device *vdpa, u64 iova, u64 size)
> static void vdpasim_free(struct vdpa_device *vdpa)
> {
> struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> + int i;
>
> cancel_work_sync(&vdpasim->work);
> +
> + for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
> + vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
> + vringh_kiov_cleanup(&vdpasim->vqs[i].in_iov);
> + }
> +
> put_iova_domain(&vdpasim->iova);
> iova_cache_put();
> kvfree(vdpasim->buffer);

2021-02-01 08:33:19

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH RFC v2 08/10] vdpa: add vdpa simulator for block device

On Sun, Jan 31, 2021 at 05:31:43PM +0200, Max Gurtovoy wrote:
>
>On 1/28/2021 4:41 PM, Stefano Garzarella wrote:
>>From: Max Gurtovoy <[email protected]>
>>
>>This will allow running vDPA for virtio block protocol.
>>
>>Signed-off-by: Max Gurtovoy <[email protected]>
>>[sgarzare: various cleanups/fixes]
>>Signed-off-by: Stefano Garzarella <[email protected]>
>>---
>>v2:
>>- rebased on top of other changes (dev_attr, get_config(), notify(), etc.)
>>- memset to 0 the config structure in vdpasim_blk_get_config()
>>- used vdpasim pointer in vdpasim_blk_get_config()
>>
>>v1:
>>- Removed unused headers
>>- Used cpu_to_vdpasim*() to store config fields
>>- Replaced 'select VDPA_SIM' with 'depends on VDPA_SIM' since selected
>> option can not depend on other [Jason]
>>- Start with a single queue for now [Jason]
>>- Add comments to memory barriers
>>---
>> drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 145 +++++++++++++++++++++++++++
>> drivers/vdpa/Kconfig | 7 ++
>> drivers/vdpa/vdpa_sim/Makefile | 1 +
>> 3 files changed, 153 insertions(+)
>> create mode 100644 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
>>
>>diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
>>new file mode 100644
>>index 000000000000..999f9ca0b628
>>--- /dev/null
>>+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
>>@@ -0,0 +1,145 @@
>>+// SPDX-License-Identifier: GPL-2.0-only
>>+/*
>>+ * VDPA simulator for block device.
>>+ *
>>+ * Copyright (c) 2020, Mellanox Technologies. All rights reserved.
>
>I guess we can change the copyright from Mellanox to:
>
>Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

I'll update in the next version.

Thanks,
Stefano

2021-02-01 10:26:22

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/10] vringh: reset kiov 'consumed' field in __vringh_iov()

On Mon, Feb 01, 2021 at 01:40:01PM +0800, Jason Wang wrote:
>
>On 2021/1/28 下午10:41, Stefano Garzarella wrote:
>>__vringh_iov() overwrites the contents of riov and wiov, in fact it
>>resets the 'i' and 'used' fields, but also the consumed field should
>>be reset to avoid an inconsistent state.
>>
>>Signed-off-by: Stefano Garzarella <[email protected]>
>
>
>I had a question(I remember we had some discussion like this but I
>forget the conclusion):

Sorry, I forgot to update you.

>
>I see e.g in vringh_getdesc_kern() it has the following comment:
>
>/*
> * Note that you may need to clean up riov and wiov, even on error!
> */
>
>So it looks to me the correct way is to call vringh_kiov_cleanup()
>before?

Looking at the code the right pattern should be:

vringh_getdesc_*(..., &out_iov, &in_iov, ...);

// use out_iov and in_iov

vringh_kiov_cleanup(&out_iov);
vringh_kiov_cleanup(&in_iov);

This because vringh_getdesc_*() calls __vringh_iov() where
resize_iovec() is called to allocate the iov wrapped by 'struct
vringh_kiov' and vringh_kiov_cleanup() frees that memory.

Looking better, __vringh_iov() is able to extend a 'vringh_kiov'
pre-allocated, so in order to avoid to allocate and free the iov for
each request we can avoid to call vringh_kiov_cleanup(), but this patch
is needed to avoid an inconsistent state.

And also patch "vdpa_sim: cleanup kiovs in vdpasim_free()" is required
to free the iov when the device is going away.

Does that make sense to you?

Maybe I should add a comment in vringh.c to explain this better.

Thanks,
Stefano

>
>Thanks
>
>
>>---
>> drivers/vhost/vringh.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>>diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>>index f68122705719..bee63d68201a 100644
>>--- a/drivers/vhost/vringh.c
>>+++ b/drivers/vhost/vringh.c
>>@@ -290,9 +290,9 @@ __vringh_iov(struct vringh *vrh, u16 i,
>> return -EINVAL;
>> if (riov)
>>- riov->i = riov->used = 0;
>>+ riov->i = riov->used = riov->consumed = 0;
>> if (wiov)
>>- wiov->i = wiov->used = 0;
>>+ wiov->i = wiov->used = wiov->consumed = 0;
>> for (;;) {
>> void *addr;
>

2021-02-01 10:28:12

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH RFC v2 04/10] vringh: implement vringh_kiov_advance()

On Mon, Feb 01, 2021 at 01:43:23PM +0800, Jason Wang wrote:
>
>On 2021/1/28 下午10:41, Stefano Garzarella wrote:
>>In some cases, it may be useful to provide a way to skip a number
>>of bytes in a vringh_kiov.
>>
>>Let's implement vringh_kiov_advance() for this purpose, reusing the
>>code from vringh_iov_xfer().
>>We replace that code calling the new vringh_kiov_advance().
>
>
>Acked-by: Jason Wang <[email protected]>
>
>In the long run we need to switch to use iov iterator library instead.

Yes I agree.
I've tried to do this, but it requires quite a bit of work to change
vringh, I'll put it on my todo list.

Thanks,
Stefano

>
>Thanks
>
>
>>
>>Signed-off-by: Stefano Garzarella <[email protected]>
>>---
>> include/linux/vringh.h | 2 ++
>> drivers/vhost/vringh.c | 41 +++++++++++++++++++++++++++++------------
>> 2 files changed, 31 insertions(+), 12 deletions(-)
>>
>>diff --git a/include/linux/vringh.h b/include/linux/vringh.h
>>index 9c077863c8f6..755211ebd195 100644
>>--- a/include/linux/vringh.h
>>+++ b/include/linux/vringh.h
>>@@ -199,6 +199,8 @@ static inline void vringh_kiov_cleanup(struct vringh_kiov *kiov)
>> kiov->iov = NULL;
>> }
>>+void vringh_kiov_advance(struct vringh_kiov *kiov, size_t len);
>>+
>> int vringh_getdesc_kern(struct vringh *vrh,
>> struct vringh_kiov *riov,
>> struct vringh_kiov *wiov,
>>diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>>index bee63d68201a..4d800e4f31ca 100644
>>--- a/drivers/vhost/vringh.c
>>+++ b/drivers/vhost/vringh.c
>>@@ -75,6 +75,34 @@ static inline int __vringh_get_head(const struct vringh *vrh,
>> return head;
>> }
>>+/**
>>+ * vringh_kiov_advance - skip bytes from vring_kiov
>>+ * @iov: an iov passed to vringh_getdesc_*() (updated as we consume)
>>+ * @len: the maximum length to advance
>>+ */
>>+void vringh_kiov_advance(struct vringh_kiov *iov, size_t len)
>>+{
>>+ while (len && iov->i < iov->used) {
>>+ size_t partlen = min(iov->iov[iov->i].iov_len, len);
>>+
>>+ iov->consumed += partlen;
>>+ iov->iov[iov->i].iov_len -= partlen;
>>+ iov->iov[iov->i].iov_base += partlen;
>>+
>>+ if (!iov->iov[iov->i].iov_len) {
>>+ /* Fix up old iov element then increment. */
>>+ iov->iov[iov->i].iov_len = iov->consumed;
>>+ iov->iov[iov->i].iov_base -= iov->consumed;
>>+
>>+ iov->consumed = 0;
>>+ iov->i++;
>>+ }
>>+
>>+ len -= partlen;
>>+ }
>>+}
>>+EXPORT_SYMBOL(vringh_kiov_advance);
>>+
>> /* Copy some bytes to/from the iovec. Returns num copied. */
>> static inline ssize_t vringh_iov_xfer(struct vringh *vrh,
>> struct vringh_kiov *iov,
>>@@ -95,19 +123,8 @@ static inline ssize_t vringh_iov_xfer(struct vringh *vrh,
>> done += partlen;
>> len -= partlen;
>> ptr += partlen;
>>- iov->consumed += partlen;
>>- iov->iov[iov->i].iov_len -= partlen;
>>- iov->iov[iov->i].iov_base += partlen;
>>- if (!iov->iov[iov->i].iov_len) {
>>- /* Fix up old iov element then increment. */
>>- iov->iov[iov->i].iov_len = iov->consumed;
>>- iov->iov[iov->i].iov_base -= iov->consumed;
>>-
>>-
>>- iov->consumed = 0;
>>- iov->i++;
>>- }
>>+ vringh_kiov_advance(iov, partlen);
>> }
>> return done;
>> }
>

2021-02-02 03:29:52

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH RFC v2 03/10] vringh: reset kiov 'consumed' field in __vringh_iov()


On 2021/2/1 下午6:21, Stefano Garzarella wrote:
> On Mon, Feb 01, 2021 at 01:40:01PM +0800, Jason Wang wrote:
>>
>> On 2021/1/28 下午10:41, Stefano Garzarella wrote:
>>> __vringh_iov() overwrites the contents of riov and wiov, in fact it
>>> resets the 'i' and 'used' fields, but also the consumed field should
>>> be reset to avoid an inconsistent state.
>>>
>>> Signed-off-by: Stefano Garzarella <[email protected]>
>>
>>
>> I had a question(I remember we had some discussion like this but I
>> forget the conclusion):
>
> Sorry, I forgot to update you.
>
>>
>> I see e.g in vringh_getdesc_kern() it has the following comment:
>>
>> /*
>>  * Note that you may need to clean up riov and wiov, even on error!
>>  */
>>
>> So it looks to me the correct way is to call vringh_kiov_cleanup()
>> before?
>
> Looking at the code the right pattern should be:
>
>     vringh_getdesc_*(..., &out_iov, &in_iov, ...);
>
>     // use out_iov and in_iov
>
>     vringh_kiov_cleanup(&out_iov);
>     vringh_kiov_cleanup(&in_iov);
>
> This because vringh_getdesc_*() calls __vringh_iov() where
> resize_iovec() is called to allocate the iov wrapped by 'struct
> vringh_kiov' and vringh_kiov_cleanup() frees that memory.
>
> Looking better, __vringh_iov() is able to extend a 'vringh_kiov'
> pre-allocated, so in order to avoid to allocate and free the iov for
> each request we can avoid to call vringh_kiov_cleanup(), but this
> patch is needed to avoid an inconsistent state.
>
> And also patch "vdpa_sim: cleanup kiovs in vdpasim_free()" is required
> to free the iov when the device is going away.
>
> Does that make sense to you?


Make sense.


>
> Maybe I should add a comment in vringh.c to explain this better.


Yes, please.

Thanks


>
> Thanks,
> Stefano
>
>>
>> Thanks
>>
>>
>>> ---
>>>  drivers/vhost/vringh.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>>> index f68122705719..bee63d68201a 100644
>>> --- a/drivers/vhost/vringh.c
>>> +++ b/drivers/vhost/vringh.c
>>> @@ -290,9 +290,9 @@ __vringh_iov(struct vringh *vrh, u16 i,
>>>          return -EINVAL;
>>>      if (riov)
>>> -        riov->i = riov->used = 0;
>>> +        riov->i = riov->used = riov->consumed = 0;
>>>      if (wiov)
>>> -        wiov->i = wiov->used = 0;
>>> +        wiov->i = wiov->used = wiov->consumed = 0;
>>>      for (;;) {
>>>          void *addr;
>>
>

2021-02-02 22:31:21

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH RFC v2 10/10] vdpa_sim_blk: handle VIRTIO_BLK_T_GET_ID

On Thu, Jan 28, 2021 at 03:41:27PM +0100, Stefano Garzarella wrote:
> Handle VIRTIO_BLK_T_GET_ID request, always answering the
> "vdpa_blk_sim" string.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> v2:
> - made 'vdpasim_blk_id' static [Jason]
> ---
> drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)

Reviewed-by: Stefan Hajnoczi <[email protected]>


Attachments:
(No filename) (434.00 B)
signature.asc (499.00 B)
Download all attachments

2021-02-02 22:32:13

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH RFC v2 09/10] vdpa_sim_blk: implement ramdisk behaviour

On Thu, Jan 28, 2021 at 03:41:26PM +0100, Stefano Garzarella wrote:
> The previous implementation wrote only the status of each request.
> This patch implements a more accurate block device simulator,
> providing a ramdisk-like behavior.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> v2:
> - used %zd %zx to print size_t and ssize_t variables in dev_err()
> - removed unnecessary new line [Jason]
> - moved VIRTIO_BLK_T_GET_ID in another patch [Jason]
> - used push/pull instead of write/read terminology
> - added vdpasim_blk_check_range() to avoid overflows [Stefan]
> - use vdpasim*_to_cpu instead of le*_to_cpu
> - used vringh_kiov_length() helper [Jason]
> ---
> drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 164 ++++++++++++++++++++++++---
> 1 file changed, 146 insertions(+), 18 deletions(-)

Reviewed-by: Stefan Hajnoczi <[email protected]>


Attachments:
(No filename) (889.00 B)
signature.asc (499.00 B)
Download all attachments

2021-02-02 22:33:32

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH RFC v2 08/10] vdpa: add vdpa simulator for block device

On Thu, Jan 28, 2021 at 03:41:25PM +0100, Stefano Garzarella wrote:
> +static void vdpasim_blk_work(struct work_struct *work)
> +{
> + struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> + u8 status = VIRTIO_BLK_S_OK;
> + int i;
> +
> + spin_lock(&vdpasim->lock);
> +
> + if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
> + goto out;
> +
> + for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) {
> + struct vdpasim_virtqueue *vq = &vdpasim->vqs[i];
> +
> + if (!vq->ready)
> + continue;
> +
> + while (vringh_getdesc_iotlb(&vq->vring, &vq->out_iov,
> + &vq->in_iov, &vq->head,
> + GFP_ATOMIC) > 0) {
> + int write;
> +
> + vq->in_iov.i = vq->in_iov.used - 1;
> + write = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
> + &status, 1);
> + if (write <= 0)
> + break;

This code looks fragile:

1. Relying on unsigned underflow and the while loop in
vringh_iov_push_iotlb() to handle the case where in_iov.used == 0 is
risky and could break.

2. Does this assume that the last in_iov element has size 1? For
example, the guest driver may send a single "in" iovec with size 513
when reading 512 bytes (with an extra byte for the request status).

Please validate inputs fully, even in test/development code, because
it's likely to be copied by others when writing production code (or
deployed in production by unsuspecting users) :).


Attachments:
(No filename) (1.41 kB)
signature.asc (499.00 B)
Download all attachments

2021-02-02 22:45:58

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH RFC v2 08/10] vdpa: add vdpa simulator for block device

On Tue, Feb 02, 2021 at 09:34:12AM +0000, Stefan Hajnoczi wrote:
>On Thu, Jan 28, 2021 at 03:41:25PM +0100, Stefano Garzarella wrote:
>> +static void vdpasim_blk_work(struct work_struct *work)
>> +{
>> + struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>> + u8 status = VIRTIO_BLK_S_OK;
>> + int i;
>> +
>> + spin_lock(&vdpasim->lock);
>> +
>> + if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
>> + goto out;
>> +
>> + for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) {
>> + struct vdpasim_virtqueue *vq = &vdpasim->vqs[i];
>> +
>> + if (!vq->ready)
>> + continue;
>> +
>> + while (vringh_getdesc_iotlb(&vq->vring, &vq->out_iov,
>> + &vq->in_iov, &vq->head,
>> + GFP_ATOMIC) > 0) {
>> + int write;
>> +
>> + vq->in_iov.i = vq->in_iov.used - 1;
>> + write = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
>> + &status, 1);
>> + if (write <= 0)
>> + break;
>
>This code looks fragile:
>
>1. Relying on unsigned underflow and the while loop in
> vringh_iov_push_iotlb() to handle the case where in_iov.used == 0 is
> risky and could break.
>
>2. Does this assume that the last in_iov element has size 1? For
> example, the guest driver may send a single "in" iovec with size 513
> when reading 512 bytes (with an extra byte for the request status).
>
>Please validate inputs fully, even in test/development code, because
>it's likely to be copied by others when writing production code (or
>deployed in production by unsuspecting users) :).

Perfectly agree on that, so I addressed these things, also following
your review on the previous version, on the next patch of this series:
"vdpa_sim_blk: implement ramdisk behaviour".

Do you think should I move these checks in this patch?

I did this to leave Max credit for this patch and add more code to
emulate a ramdisk in later patches.

Thanks,
Stefano

2021-02-03 16:51:26

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH RFC v2 08/10] vdpa: add vdpa simulator for block device

On Tue, Feb 02, 2021 at 04:49:50PM +0100, Stefano Garzarella wrote:
> On Tue, Feb 02, 2021 at 09:34:12AM +0000, Stefan Hajnoczi wrote:
> > On Thu, Jan 28, 2021 at 03:41:25PM +0100, Stefano Garzarella wrote:
> > > +static void vdpasim_blk_work(struct work_struct *work)
> > > +{
> > > + struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> > > + u8 status = VIRTIO_BLK_S_OK;
> > > + int i;
> > > +
> > > + spin_lock(&vdpasim->lock);
> > > +
> > > + if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
> > > + goto out;
> > > +
> > > + for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) {
> > > + struct vdpasim_virtqueue *vq = &vdpasim->vqs[i];
> > > +
> > > + if (!vq->ready)
> > > + continue;
> > > +
> > > + while (vringh_getdesc_iotlb(&vq->vring, &vq->out_iov,
> > > + &vq->in_iov, &vq->head,
> > > + GFP_ATOMIC) > 0) {
> > > + int write;
> > > +
> > > + vq->in_iov.i = vq->in_iov.used - 1;
> > > + write = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
> > > + &status, 1);
> > > + if (write <= 0)
> > > + break;
> >
> > This code looks fragile:
> >
> > 1. Relying on unsigned underflow and the while loop in
> > vringh_iov_push_iotlb() to handle the case where in_iov.used == 0 is
> > risky and could break.
> >
> > 2. Does this assume that the last in_iov element has size 1? For
> > example, the guest driver may send a single "in" iovec with size 513
> > when reading 512 bytes (with an extra byte for the request status).
> >
> > Please validate inputs fully, even in test/development code, because
> > it's likely to be copied by others when writing production code (or
> > deployed in production by unsuspecting users) :).
>
> Perfectly agree on that, so I addressed these things, also following your
> review on the previous version, on the next patch of this series:
> "vdpa_sim_blk: implement ramdisk behaviour".
>
> Do you think should I move these checks in this patch?
>
> I did this to leave Max credit for this patch and add more code to emulate a
> ramdisk in later patches.

You could update the commit description so it's clear that input
validation is missing and will be added in the next commit.

Stefan


Attachments:
(No filename) (2.20 kB)
signature.asc (499.00 B)
Download all attachments

2021-02-04 23:32:21

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH RFC v2 08/10] vdpa: add vdpa simulator for block device

On Wed, Feb 03, 2021 at 04:45:51PM +0000, Stefan Hajnoczi wrote:
>On Tue, Feb 02, 2021 at 04:49:50PM +0100, Stefano Garzarella wrote:
>> On Tue, Feb 02, 2021 at 09:34:12AM +0000, Stefan Hajnoczi wrote:
>> > On Thu, Jan 28, 2021 at 03:41:25PM +0100, Stefano Garzarella wrote:
>> > > +static void vdpasim_blk_work(struct work_struct *work)
>> > > +{
>> > > + struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>> > > + u8 status = VIRTIO_BLK_S_OK;
>> > > + int i;
>> > > +
>> > > + spin_lock(&vdpasim->lock);
>> > > +
>> > > + if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
>> > > + goto out;
>> > > +
>> > > + for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) {
>> > > + struct vdpasim_virtqueue *vq = &vdpasim->vqs[i];
>> > > +
>> > > + if (!vq->ready)
>> > > + continue;
>> > > +
>> > > + while (vringh_getdesc_iotlb(&vq->vring, &vq->out_iov,
>> > > + &vq->in_iov, &vq->head,
>> > > + GFP_ATOMIC) > 0) {
>> > > + int write;
>> > > +
>> > > + vq->in_iov.i = vq->in_iov.used - 1;
>> > > + write = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
>> > > + &status, 1);
>> > > + if (write <= 0)
>> > > + break;
>> >
>> > This code looks fragile:
>> >
>> > 1. Relying on unsigned underflow and the while loop in
>> > vringh_iov_push_iotlb() to handle the case where in_iov.used == 0 is
>> > risky and could break.
>> >
>> > 2. Does this assume that the last in_iov element has size 1? For
>> > example, the guest driver may send a single "in" iovec with size 513
>> > when reading 512 bytes (with an extra byte for the request status).
>> >
>> > Please validate inputs fully, even in test/development code, because
>> > it's likely to be copied by others when writing production code (or
>> > deployed in production by unsuspecting users) :).
>>
>> Perfectly agree on that, so I addressed these things, also following your
>> review on the previous version, on the next patch of this series:
>> "vdpa_sim_blk: implement ramdisk behaviour".
>>
>> Do you think should I move these checks in this patch?
>>
>> I did this to leave Max credit for this patch and add more code to emulate a
>> ramdisk in later patches.
>
>You could update the commit description so it's clear that input
>validation is missing and will be added in the next commit.

I'll do it.

Thanks,
Stefano