2023-03-21 15:43:29

by Stefano Garzarella

Subject: [PATCH v3 0/8] vdpa_sim: add support for user VA

This series adds support for the use of user virtual addresses in the
vDPA simulator devices.

The main reason for this change is to lift the pinning of all guest memory,
especially for virtio devices implemented in software.

The next step would be to generalize the code in vdpa-sim to allow the
implementation of in-kernel software devices, similar to vhost but using vDPA,
so we can reuse the same software stack (e.g. in QEMU) for both HW and SW
devices.

For example, we have never merged vhost-blk, and lately there has been renewed
interest in it. It would be nice to implement it directly with vDPA, so the VMM
can reuse the same code for both HW and SW vDPA block devices.

The main blocker (addressed by this series) was the pinning of all guest
memory, which prevented guest memory overcommit.

Thanks,
Stefano

Changelog listed in each patch.
v2: https://lore.kernel.org/lkml/[email protected]/
RFC v1: https://lore.kernel.org/lkml/[email protected]/

Stefano Garzarella (8):
vdpa: add bind_mm/unbind_mm callbacks
vhost-vdpa: use bind_mm/unbind_mm device callbacks
vringh: replace kmap_atomic() with kmap_local_page()
vringh: support VA with iotlb
vdpa_sim: make devices agnostic for work management
vdpa_sim: use kthread worker
vdpa_sim: replace the spinlock with a mutex to protect the state
vdpa_sim: add support for user VA

drivers/vdpa/vdpa_sim/vdpa_sim.h | 11 +-
include/linux/vdpa.h | 10 ++
include/linux/vringh.h | 5 +-
drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 144 ++++++++++++++++++++-----
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 10 +-
drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 10 +-
drivers/vhost/vdpa.c | 31 ++++++
drivers/vhost/vringh.c | 153 +++++++++++++++++++++------
9 files changed, 301 insertions(+), 75 deletions(-)

--
2.39.2



2023-03-21 15:43:33

by Stefano Garzarella

Subject: [PATCH v3 1/8] vdpa: add bind_mm/unbind_mm callbacks

These new optional callbacks are used to bind/unbind the device to/from
a specific address space, so the vDPA framework can use VA when they
are implemented.

Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
---

Notes:
v2:
- removed `struct task_struct *owner` param (unused for now, maybe
useful to support cgroups) [Jason]
- add unbind_mm callback [Jason]

include/linux/vdpa.h | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 43f59ef10cc9..369c21394284 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -290,6 +290,14 @@ struct vdpa_map_file {
* @vdev: vdpa device
* @idx: virtqueue index
* Returns pointer to structure device or error (NULL)
+ * @bind_mm: Bind the device to a specific address space
+ * so the vDPA framework can use VA when this
+ * callback is implemented. (optional)
+ * @vdev: vdpa device
+ * @mm: address space to bind
+ * @unbind_mm: Unbind the device from the address space
+ * bound using the bind_mm callback. (optional)
+ * @vdev: vdpa device
* @free: Free resources that belongs to vDPA (optional)
* @vdev: vdpa device
*/
@@ -351,6 +359,8 @@ struct vdpa_config_ops {
int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
unsigned int asid);
struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
+ int (*bind_mm)(struct vdpa_device *vdev, struct mm_struct *mm);
+ void (*unbind_mm)(struct vdpa_device *vdev);

/* Free device resources */
void (*free)(struct vdpa_device *vdev);
--
2.39.2
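
For illustration, a parent driver that supports VA would wire up the two
optional callbacks roughly as follows. This is a minimal sketch using a
hypothetical `foo` driver (the concrete user is vdpa_sim, later in this
series); only the contract matters: remember the mm on bind, forget it on
unbind, and use it exclusively from a kthread via kthread_use_mm().

static int foo_bind_mm(struct vdpa_device *vdev, struct mm_struct *mm)
{
	struct foo_dev *foo = vdpa_to_foo(vdev); /* hypothetical helper */

	/* Just remember the address space; it is pinned and attached
	 * only while a worker actually needs it. */
	foo->mm = mm;
	return 0;
}

static void foo_unbind_mm(struct vdpa_device *vdev)
{
	struct foo_dev *foo = vdpa_to_foo(vdev);

	foo->mm = NULL;
}

static const struct vdpa_config_ops foo_config_ops = {
	/* ... */
	.bind_mm   = foo_bind_mm,
	.unbind_mm = foo_unbind_mm,
};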


2023-03-21 15:43:46

by Stefano Garzarella

Subject: [PATCH v3 2/8] vhost-vdpa: use bind_mm/unbind_mm device callbacks

When the user calls the VHOST_SET_OWNER ioctl and the vDPA device
has `use_va` set to true, let's call the bind_mm callback.
In this way we can bind the device to the user address space
and directly use the user VA.

The unbind_mm callback is called during the release after
stopping the device.

Signed-off-by: Stefano Garzarella <[email protected]>
---

Notes:
v3:
- added `case VHOST_SET_OWNER` in vhost_vdpa_unlocked_ioctl() [Jason]
v2:
- call the new unbind_mm callback during the release [Jason]
- avoid to call bind_mm callback after the reset, since the device
is not detaching it now during the reset

drivers/vhost/vdpa.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 7be9d9d8f01c..20250c3418b2 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
return vdpa_reset(vdpa);
}

+static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ if (!vdpa->use_va || !ops->bind_mm)
+ return 0;
+
+ return ops->bind_mm(vdpa, v->vdev.mm);
+}
+
+static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ if (!vdpa->use_va || !ops->unbind_mm)
+ return;
+
+ ops->unbind_mm(vdpa);
+}
+
static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
{
struct vdpa_device *vdpa = v->vdpa;
@@ -709,6 +731,14 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
case VHOST_VDPA_RESUME:
r = vhost_vdpa_resume(v);
break;
+ case VHOST_SET_OWNER:
+ r = vhost_dev_set_owner(d);
+ if (r)
+ break;
+ r = vhost_vdpa_bind_mm(v);
+ if (r)
+ vhost_dev_reset_owner(d, NULL);
+ break;
default:
r = vhost_dev_ioctl(&v->vdev, cmd, argp);
if (r == -ENOIOCTLCMD)
@@ -1287,6 +1317,7 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep)
vhost_vdpa_clean_irq(v);
vhost_vdpa_reset(v);
vhost_dev_stop(&v->vdev);
+ vhost_vdpa_unbind_mm(v);
vhost_vdpa_config_put(v);
vhost_vdpa_cleanup(v);
mutex_unlock(&d->mutex);
--
2.39.2
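
From userspace the binding is implicit: a VMM that opens a vhost-vdpa device
backed by a `use_va` parent gets its address space bound as soon as it issues
VHOST_SET_OWNER. A minimal sketch (error handling trimmed; the device node
name is just an example):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

int main(void)
{
	int fd = open("/dev/vhost-vdpa-0", O_RDWR);

	if (fd < 0)
		return 1;

	/* Takes ownership of the device; for use_va parents this also
	 * binds the caller's mm via the new bind_mm callback. */
	if (ioctl(fd, VHOST_SET_OWNER))
		return 1;

	/* ... from here on, IOTLB updates can carry user VAs ... */
	return 0;
}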


2023-03-21 15:43:56

by Stefano Garzarella

Subject: [PATCH v3 3/8] vringh: replace kmap_atomic() with kmap_local_page()

kmap_atomic() is deprecated in favor of kmap_local_page() since commit
f3ba3c710ac5 ("mm/highmem: Provide kmap_local*").

With kmap_local_page() the mappings are per thread, CPU local, can take
page-faults, and can be called from any context (including interrupts).
Furthermore, the tasks can be preempted and, when they are scheduled to
run again, the kernel virtual addresses are restored and still valid.

kmap_atomic() is implemented like a kmap_local_page() which also disables
page-faults and preemption (the latter only for !PREEMPT_RT kernels,
otherwise it only disables migration).

The code within the mappings/un-mappings in getu16_iotlb() and
putu16_iotlb() doesn't depend on the above-mentioned side effects of
kmap_atomic(), so a mere replacement of the old API with the new one
is all that is required (i.e., there is no need to explicitly add calls
to pagefault_disable() and/or preempt_disable()).

This commit reuses a "boiler plate" commit message from Fabio, who has
already done this change in several places.

Cc: "Fabio M. De Francesco" <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
---

Notes:
v3:
- credited Fabio for the commit message
- added reference to the commit that deprecated kmap_atomic() [Jason]
v2:
- added this patch since checkpatch.pl complained about deprecation
of kmap_atomic() touched by next patch

drivers/vhost/vringh.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index a1e27da54481..0ba3ef809e48 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh *vrh,
if (ret < 0)
return ret;

- kaddr = kmap_atomic(iov.bv_page);
+ kaddr = kmap_local_page(iov.bv_page);
from = kaddr + iov.bv_offset;
*val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
- kunmap_atomic(kaddr);
+ kunmap_local(kaddr);

return 0;
}
@@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh *vrh,
if (ret < 0)
return ret;

- kaddr = kmap_atomic(iov.bv_page);
+ kaddr = kmap_local_page(iov.bv_page);
to = kaddr + iov.bv_offset;
WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
- kunmap_atomic(kaddr);
+ kunmap_local(kaddr);

return 0;
}
--
2.39.2
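
For reference, the conversion pattern looks like this (a sketch; the only
rule that carries over from the old API is that unmaps must nest in reverse
order of the maps):

/* before: also disables page-faults and preemption */
kaddr = kmap_atomic(page);
/* ... strictly atomic critical section ... */
kunmap_atomic(kaddr);

/* after: mapping is CPU-local and per thread, code may fault and preempt */
kaddr = kmap_local_page(page);
/* ... same accesses, no atomicity requirement from the mapping ... */
kunmap_local(kaddr);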


2023-03-21 15:44:22

by Stefano Garzarella

Subject: [PATCH v3 4/8] vringh: support VA with iotlb

vDPA supports the possibility to use user VA in the iotlb messages.
So, let's add support for user VA in vringh to use it in the vDPA
simulators.

Signed-off-by: Stefano Garzarella <[email protected]>
---

Notes:
v3:
- refactored avoiding code duplication [Eugenio]
v2:
- replace kmap_atomic() with kmap_local_page() [see previous patch]
- fix cast warnings when build with W=1 C=1

include/linux/vringh.h | 5 +-
drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
drivers/vhost/vringh.c | 153 +++++++++++++++++++++++-------
4 files changed, 127 insertions(+), 37 deletions(-)

diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 1991a02c6431..d39b9f2dcba0 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -32,6 +32,9 @@ struct vringh {
/* Can we get away with weak barriers? */
bool weak_barriers;

+ /* Use user's VA */
+ bool use_va;
+
/* Last available index we saw (ie. where we're up to). */
u16 last_avail_idx;

@@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
spinlock_t *iotlb_lock);

int vringh_init_iotlb(struct vringh *vrh, u64 features,
- unsigned int num, bool weak_barriers,
+ unsigned int num, bool weak_barriers, bool use_va,
struct vring_desc *desc,
struct vring_avail *avail,
struct vring_used *used);
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 520646ae7fa0..dfd0e000217b 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)

if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
- MLX5_CVQ_MAX_ENT, false,
+ MLX5_CVQ_MAX_ENT, false, false,
(struct vring_desc *)(uintptr_t)cvq->desc_addr,
(struct vring_avail *)(uintptr_t)cvq->driver_addr,
(struct vring_used *)(uintptr_t)cvq->device_addr);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index eea23c630f7c..47cdf2a1f5b8 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
uint16_t last_avail_idx = vq->vring.last_avail_idx;

- vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
+ vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
(struct vring_desc *)(uintptr_t)vq->desc_addr,
(struct vring_avail *)
(uintptr_t)vq->driver_addr,
@@ -92,7 +92,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
vq->cb = NULL;
vq->private = NULL;
vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
- VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
+ VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);

vq->vring.notify = NULL;
}
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 0ba3ef809e48..72c88519329a 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1094,10 +1094,18 @@ EXPORT_SYMBOL(vringh_need_notify_kern);

#if IS_REACHABLE(CONFIG_VHOST_IOTLB)

+struct iotlb_vec {
+ union {
+ struct iovec *iovec;
+ struct bio_vec *bvec;
+ } iov;
+ size_t count;
+ bool is_iovec;
+};
+
static int iotlb_translate(const struct vringh *vrh,
u64 addr, u64 len, u64 *translated,
- struct bio_vec iov[],
- int iov_size, u32 perm)
+ struct iotlb_vec *ivec, u32 perm)
{
struct vhost_iotlb_map *map;
struct vhost_iotlb *iotlb = vrh->iotlb;
@@ -1107,9 +1115,9 @@ static int iotlb_translate(const struct vringh *vrh,
spin_lock(vrh->iotlb_lock);

while (len > s) {
- u64 size, pa, pfn;
+ u64 size;

- if (unlikely(ret >= iov_size)) {
+ if (unlikely(ret >= ivec->count)) {
ret = -ENOBUFS;
break;
}
@@ -1124,10 +1132,22 @@ static int iotlb_translate(const struct vringh *vrh,
}

size = map->size - addr + map->start;
- pa = map->addr + addr - map->start;
- pfn = pa >> PAGE_SHIFT;
- bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
- pa & (PAGE_SIZE - 1));
+ if (ivec->is_iovec) {
+ struct iovec *iovec = ivec->iov.iovec;
+
+ iovec[ret].iov_len = min(len - s, size);
+ iovec[ret].iov_base = (void __user *)(unsigned long)
+ (map->addr + addr - map->start);
+ } else {
+ u64 pa = map->addr + addr - map->start;
+ u64 pfn = pa >> PAGE_SHIFT;
+ struct bio_vec *bvec = ivec->iov.bvec;
+
+ bvec_set_page(&bvec[ret], pfn_to_page(pfn),
+ min(len - s, size),
+ pa & (PAGE_SIZE - 1));
+ }
+
s += size;
addr += size;
++ret;
@@ -1141,26 +1161,42 @@ static int iotlb_translate(const struct vringh *vrh,
return ret;
}

+#define IOTLB_IOV_SIZE 16
+
static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
void *src, size_t len)
{
+ struct iotlb_vec ivec;
+ union {
+ struct iovec iovec[IOTLB_IOV_SIZE];
+ struct bio_vec bvec[IOTLB_IOV_SIZE];
+ } iov;
u64 total_translated = 0;

+ ivec.iov.iovec = iov.iovec;
+ ivec.count = IOTLB_IOV_SIZE;
+ ivec.is_iovec = vrh->use_va;
+
while (total_translated < len) {
- struct bio_vec iov[16];
struct iov_iter iter;
u64 translated;
int ret;

ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
len - total_translated, &translated,
- iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
+ &ivec, VHOST_MAP_RO);
if (ret == -ENOBUFS)
- ret = ARRAY_SIZE(iov);
+ ret = IOTLB_IOV_SIZE;
else if (ret < 0)
return ret;

- iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
+ if (ivec.is_iovec) {
+ iov_iter_init(&iter, ITER_SOURCE, ivec.iov.iovec, ret,
+ translated);
+ } else {
+ iov_iter_bvec(&iter, ITER_SOURCE, ivec.iov.bvec, ret,
+ translated);
+ }

ret = copy_from_iter(dst, translated, &iter);
if (ret < 0)
@@ -1177,23 +1213,37 @@ static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
void *src, size_t len)
{
+ struct iotlb_vec ivec;
+ union {
+ struct iovec iovec[IOTLB_IOV_SIZE];
+ struct bio_vec bvec[IOTLB_IOV_SIZE];
+ } iov;
u64 total_translated = 0;

+ ivec.iov.iovec = iov.iovec;
+ ivec.count = IOTLB_IOV_SIZE;
+ ivec.is_iovec = vrh->use_va;
+
while (total_translated < len) {
- struct bio_vec iov[16];
struct iov_iter iter;
u64 translated;
int ret;

ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
len - total_translated, &translated,
- iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
+ &ivec, VHOST_MAP_WO);
if (ret == -ENOBUFS)
- ret = ARRAY_SIZE(iov);
+ ret = IOTLB_IOV_SIZE;
else if (ret < 0)
return ret;

- iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
+ if (ivec.is_iovec) {
+ iov_iter_init(&iter, ITER_DEST, ivec.iov.iovec, ret,
+ translated);
+ } else {
+ iov_iter_bvec(&iter, ITER_DEST, ivec.iov.bvec, ret,
+ translated);
+ }

ret = copy_to_iter(src, translated, &iter);
if (ret < 0)
@@ -1210,20 +1260,37 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
static inline int getu16_iotlb(const struct vringh *vrh,
u16 *val, const __virtio16 *p)
{
- struct bio_vec iov;
- void *kaddr, *from;
+ struct iotlb_vec ivec;
+ union {
+ struct iovec iovec[1];
+ struct bio_vec bvec[1];
+ } iov;
+ __virtio16 tmp;
int ret;

+ ivec.iov.iovec = iov.iovec;
+ ivec.count = 1;
+ ivec.is_iovec = vrh->use_va;
+
/* Atomic read is needed for getu16 */
- ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
- &iov, 1, VHOST_MAP_RO);
+ ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
+ NULL, &ivec, VHOST_MAP_RO);
if (ret < 0)
return ret;

- kaddr = kmap_local_page(iov.bv_page);
- from = kaddr + iov.bv_offset;
- *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
- kunmap_local(kaddr);
+ if (ivec.is_iovec) {
+ ret = __get_user(tmp, (__virtio16 __user *)ivec.iov.iovec[0].iov_base);
+ if (ret)
+ return ret;
+ } else {
+ void *kaddr = kmap_local_page(ivec.iov.bvec[0].bv_page);
+ void *from = kaddr + ivec.iov.bvec[0].bv_offset;
+
+ tmp = READ_ONCE(*(__virtio16 *)from);
+ kunmap_local(kaddr);
+ }
+
+ *val = vringh16_to_cpu(vrh, tmp);

return 0;
}
@@ -1231,20 +1298,37 @@ static inline int getu16_iotlb(const struct vringh *vrh,
static inline int putu16_iotlb(const struct vringh *vrh,
__virtio16 *p, u16 val)
{
- struct bio_vec iov;
- void *kaddr, *to;
+ struct iotlb_vec ivec;
+ union {
+ struct iovec iovec;
+ struct bio_vec bvec;
+ } iov;
+ __virtio16 tmp;
int ret;

+ ivec.iov.iovec = &iov.iovec;
+ ivec.count = 1;
+ ivec.is_iovec = vrh->use_va;
+
/* Atomic write is needed for putu16 */
- ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
- &iov, 1, VHOST_MAP_WO);
+ ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
+ NULL, &ivec, VHOST_MAP_WO);
if (ret < 0)
return ret;

- kaddr = kmap_local_page(iov.bv_page);
- to = kaddr + iov.bv_offset;
- WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
- kunmap_local(kaddr);
+ tmp = cpu_to_vringh16(vrh, val);
+
+ if (ivec.is_iovec) {
+ ret = __put_user(tmp, (__virtio16 __user *)ivec.iov.iovec[0].iov_base);
+ if (ret)
+ return ret;
+ } else {
+ void *kaddr = kmap_local_page(ivec.iov.bvec[0].bv_page);
+ void *to = kaddr + ivec.iov.bvec[0].bv_offset;
+
+ WRITE_ONCE(*(__virtio16 *)to, tmp);
+ kunmap_local(kaddr);
+ }

return 0;
}
@@ -1306,6 +1390,7 @@ static inline int putused_iotlb(const struct vringh *vrh,
* @features: the feature bits for this ring.
* @num: the number of elements.
* @weak_barriers: true if we only need memory barriers, not I/O.
+ * @use_va: true if IOTLB contains user VA
* @desc: the userspace descriptor pointer.
* @avail: the userspace avail pointer.
* @used: the userspace used pointer.
@@ -1313,11 +1398,13 @@ static inline int putused_iotlb(const struct vringh *vrh,
* Returns an error if num is invalid.
*/
int vringh_init_iotlb(struct vringh *vrh, u64 features,
- unsigned int num, bool weak_barriers,
+ unsigned int num, bool weak_barriers, bool use_va,
struct vring_desc *desc,
struct vring_avail *avail,
struct vring_used *used)
{
+ vrh->use_va = use_va;
+
return vringh_init_kern(vrh, features, num, weak_barriers,
desc, avail, used);
}
--
2.39.2
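
With this in place, a caller whose IOTLB holds user VAs just passes
use_va = true; everything else is unchanged. A sketch (vdpa_sim does exactly
this in the last patch of the series):

int err = vringh_init_iotlb(&vq->vring, features, num,
			    true,  /* weak_barriers */
			    true,  /* use_va: IOTLB entries are user VAs */
			    desc, avail, used);
if (err)
	return err;

/* getu16/putu16 and the copy helpers now build iovecs and use
 * __get_user()/__put_user()/copy_{from,to}_iter() instead of
 * bio_vecs + kmap_local_page(). */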


2023-03-21 15:49:03

by Stefano Garzarella

Subject: [PATCH v3 5/8] vdpa_sim: make devices agnostic for work management

Let's move work management inside the vdpa_sim core.
This way we can easily change how we manage the works, without
having to change the devices each time.

Acked-by: Eugenio Pérez Martin <[email protected]>
Acked-by: Jason Wang <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
---
drivers/vdpa/vdpa_sim/vdpa_sim.h | 3 ++-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 +++++++++++++++--
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 6 ++----
drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 6 ++----
4 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 144858636c10..acee20faaf6a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -45,7 +45,7 @@ struct vdpasim_dev_attr {
u32 ngroups;
u32 nas;

- work_func_t work_fn;
+ void (*work_fn)(struct vdpasim *vdpasim);
void (*get_config)(struct vdpasim *vdpasim, void *config);
void (*set_config)(struct vdpasim *vdpasim, const void *config);
int (*get_stats)(struct vdpasim *vdpasim, u16 idx,
@@ -78,6 +78,7 @@ struct vdpasim {

struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *attr,
const struct vdpa_dev_set_config *config);
+void vdpasim_schedule_work(struct vdpasim *vdpasim);

/* TODO: cross-endian support */
static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 47cdf2a1f5b8..f6329900e61a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -127,6 +127,13 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
static const struct vdpa_config_ops vdpasim_config_ops;
static const struct vdpa_config_ops vdpasim_batch_config_ops;

+static void vdpasim_work_fn(struct work_struct *work)
+{
+ struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
+
+ vdpasim->dev_attr.work_fn(vdpasim);
+}
+
struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
const struct vdpa_dev_set_config *config)
{
@@ -163,7 +170,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,

vdpasim = vdpa_to_sim(vdpa);
vdpasim->dev_attr = *dev_attr;
- INIT_WORK(&vdpasim->work, dev_attr->work_fn);
+ INIT_WORK(&vdpasim->work, vdpasim_work_fn);
spin_lock_init(&vdpasim->lock);
spin_lock_init(&vdpasim->iommu_lock);

@@ -214,6 +221,12 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
}
EXPORT_SYMBOL_GPL(vdpasim_create);

+void vdpasim_schedule_work(struct vdpasim *vdpasim)
+{
+ schedule_work(&vdpasim->work);
+}
+EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
+
static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
u64 desc_area, u64 driver_area,
u64 device_area)
@@ -248,7 +261,7 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
}

if (vq->ready)
- schedule_work(&vdpasim->work);
+ vdpasim_schedule_work(vdpasim);
}

static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index 5117959bed8a..eb4897c8541e 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -11,7 +11,6 @@
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
-#include <linux/sched.h>
#include <linux/blkdev.h>
#include <linux/vringh.h>
#include <linux/vdpa.h>
@@ -286,9 +285,8 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
return handled;
}

-static void vdpasim_blk_work(struct work_struct *work)
+static void vdpasim_blk_work(struct vdpasim *vdpasim)
{
- struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
bool reschedule = false;
int i;

@@ -326,7 +324,7 @@ static void vdpasim_blk_work(struct work_struct *work)
spin_unlock(&vdpasim->lock);

if (reschedule)
- schedule_work(&vdpasim->work);
+ vdpasim_schedule_work(vdpasim);
}

static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
index 862f405362de..e61a9ecbfafe 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
@@ -11,7 +11,6 @@
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
-#include <linux/sched.h>
#include <linux/etherdevice.h>
#include <linux/vringh.h>
#include <linux/vdpa.h>
@@ -192,9 +191,8 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
u64_stats_update_end(&net->cq_stats.syncp);
}

-static void vdpasim_net_work(struct work_struct *work)
+static void vdpasim_net_work(struct vdpasim *vdpasim)
{
- struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
struct vdpasim_net *net = sim_to_net(vdpasim);
@@ -260,7 +258,7 @@ static void vdpasim_net_work(struct work_struct *work)
vdpasim_net_complete(rxq, write);

if (tx_pkts > 4) {
- schedule_work(&vdpasim->work);
+ vdpasim_schedule_work(vdpasim);
goto out;
}
}
--
2.39.2
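
After this change a device implementation only fills in the new callback and
never touches the work item directly; a condensed view of what vdpa_sim_blk
above now does:

static void vdpasim_blk_work(struct vdpasim *vdpasim)
{
	/* ... process the virtqueues ... */
}

static struct vdpasim_dev_attr dev_attr = {
	/* ... */
	.work_fn = vdpasim_blk_work,	/* void (*)(struct vdpasim *) */
};

/* kick path, anywhere in the device code: */
vdpasim_schedule_work(vdpasim);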


2023-03-21 15:49:08

by Stefano Garzarella

Subject: [PATCH v3 6/8] vdpa_sim: use kthread worker

Let's use our own kthread to run device jobs.
This allows us more flexibility, especially we can attach the kthread
to the user address space when vDPA uses user's VA.

Acked-by: Jason Wang <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
---

Notes:
v3:
- fix `dev` not initialized in the error path [Simon Horman]

drivers/vdpa/vdpa_sim/vdpa_sim.h | 3 ++-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 19 +++++++++++++------
2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index acee20faaf6a..ce83f9130a5d 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -57,7 +57,8 @@ struct vdpasim_dev_attr {
struct vdpasim {
struct vdpa_device vdpa;
struct vdpasim_virtqueue *vqs;
- struct work_struct work;
+ struct kthread_worker *worker;
+ struct kthread_work work;
struct vdpasim_dev_attr dev_attr;
/* spinlock to synchronize virtqueue state */
spinlock_t lock;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index f6329900e61a..1cfa56c52e5a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -11,8 +11,8 @@
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
+#include <linux/kthread.h>
#include <linux/slab.h>
-#include <linux/sched.h>
#include <linux/dma-map-ops.h>
#include <linux/vringh.h>
#include <linux/vdpa.h>
@@ -127,7 +127,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
static const struct vdpa_config_ops vdpasim_config_ops;
static const struct vdpa_config_ops vdpasim_batch_config_ops;

-static void vdpasim_work_fn(struct work_struct *work)
+static void vdpasim_work_fn(struct kthread_work *work)
{
struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);

@@ -170,11 +170,17 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,

vdpasim = vdpa_to_sim(vdpa);
vdpasim->dev_attr = *dev_attr;
- INIT_WORK(&vdpasim->work, vdpasim_work_fn);
+ dev = &vdpasim->vdpa.dev;
+
+ kthread_init_work(&vdpasim->work, vdpasim_work_fn);
+ vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
+ dev_attr->name);
+ if (IS_ERR(vdpasim->worker))
+ goto err_iommu;
+
spin_lock_init(&vdpasim->lock);
spin_lock_init(&vdpasim->iommu_lock);

- dev = &vdpasim->vdpa.dev;
dev->dma_mask = &dev->coherent_dma_mask;
if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
goto err_iommu;
@@ -223,7 +229,7 @@ EXPORT_SYMBOL_GPL(vdpasim_create);

void vdpasim_schedule_work(struct vdpasim *vdpasim)
{
- schedule_work(&vdpasim->work);
+ kthread_queue_work(vdpasim->worker, &vdpasim->work);
}
EXPORT_SYMBOL_GPL(vdpasim_schedule_work);

@@ -623,7 +629,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
int i;

- cancel_work_sync(&vdpasim->work);
+ kthread_cancel_work_sync(&vdpasim->work);
+ kthread_destroy_worker(vdpasim->worker);

for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
--
2.39.2
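
For reference, the kthread worker API used here follows the usual
create/queue/flush/destroy pattern (a minimal sketch):

struct kthread_worker *worker;
struct kthread_work work;

kthread_init_work(&work, work_fn);
worker = kthread_create_worker(0, "my-worker");
if (IS_ERR(worker))
	return PTR_ERR(worker);

kthread_queue_work(worker, &work);  /* run work_fn on the kthread */
kthread_flush_work(&work);          /* wait until it has completed */
kthread_destroy_worker(worker);     /* stop and free the kthread */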


2023-03-21 15:49:11

by Stefano Garzarella

Subject: [PATCH v3 8/8] vdpa_sim: add support for user VA

The new "use_va" module parameter (default: true) is used in
vdpa_alloc_device() to inform the vDPA framework that the device
supports VA.

vringh is initialized to use VA only when "use_va" is true and the
user's mm has been bound, i.e. only when the bus supports user VA
(e.g. vhost-vdpa).

The vdpasim_mm_work_fn work is used to serialize binding to a new
address space when the .bind_mm callback is invoked, and unbinding
when the .unbind_mm callback is invoked.

Call mmget_not_zero()/kthread_use_mm() inside the worker function
to pin the address space only as long as needed, following the
documentation of mmget() in include/linux/sched/mm.h:

* Never use this function to pin this address space for an
* unbounded/indefinite amount of time.

Signed-off-by: Stefano Garzarella <[email protected]>
---

Notes:
v3:
- called mmget_not_zero() before kthread_use_mm() [Jason]
As the documentation of mmget() in include/linux/sched/mm.h says:

* Never use this function to pin this address space for an
* unbounded/indefinite amount of time.

I moved mmget_not_zero/kthread_use_mm inside the worker function,
this way we pin the address space only as long as needed.
This is similar to what vfio_iommu_type1_dma_rw_chunk() does in
drivers/vfio/vfio_iommu_type1.c
- simplified the mm bind/unbind [Jason]
- renamed vdpasim_worker_change_mm_sync() [Jason]
- fix commit message (s/default: false/default: true)
v2:
- `use_va` set to true by default [Eugenio]
- supported the new unbind_mm callback [Jason]
- removed the unbind_mm call in vdpasim_do_reset() [Jason]
- avoided to release the lock while call kthread_flush_work() since we
are now using a mutex to protect the device state

drivers/vdpa/vdpa_sim/vdpa_sim.h | 1 +
drivers/vdpa/vdpa_sim/vdpa_sim.c | 80 +++++++++++++++++++++++++++++++-
2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 4774292fba8c..3a42887d05d9 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -59,6 +59,7 @@ struct vdpasim {
struct vdpasim_virtqueue *vqs;
struct kthread_worker *worker;
struct kthread_work work;
+ struct mm_struct *mm_bound;
struct vdpasim_dev_attr dev_attr;
/* mutex to synchronize virtqueue state */
struct mutex mutex;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index ab4cfb82c237..23c891cdcd54 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -35,10 +35,44 @@ module_param(max_iotlb_entries, int, 0444);
MODULE_PARM_DESC(max_iotlb_entries,
"Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");

+static bool use_va = true;
+module_param(use_va, bool, 0444);
+MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
+
#define VDPASIM_QUEUE_ALIGN PAGE_SIZE
#define VDPASIM_QUEUE_MAX 256
#define VDPASIM_VENDOR_ID 0

+struct vdpasim_mm_work {
+ struct kthread_work work;
+ struct vdpasim *vdpasim;
+ struct mm_struct *mm_to_bind;
+ int ret;
+};
+
+static void vdpasim_mm_work_fn(struct kthread_work *work)
+{
+ struct vdpasim_mm_work *mm_work =
+ container_of(work, struct vdpasim_mm_work, work);
+ struct vdpasim *vdpasim = mm_work->vdpasim;
+
+ mm_work->ret = 0;
+
+ //TODO: should we attach the cgroup of the mm owner?
+ vdpasim->mm_bound = mm_work->mm_to_bind;
+}
+
+static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
+ struct vdpasim_mm_work *mm_work)
+{
+ struct kthread_work *work = &mm_work->work;
+
+ kthread_init_work(work, vdpasim_mm_work_fn);
+ kthread_queue_work(vdpasim->worker, work);
+
+ kthread_flush_work(work);
+}
+
static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
{
return container_of(vdpa, struct vdpasim, vdpa);
@@ -59,8 +93,10 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
{
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
uint16_t last_avail_idx = vq->vring.last_avail_idx;
+ bool va_enabled = use_va && vdpasim->mm_bound;

- vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
+ vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
+ va_enabled,
(struct vring_desc *)(uintptr_t)vq->desc_addr,
(struct vring_avail *)
(uintptr_t)vq->driver_addr,
@@ -130,8 +166,20 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops;
static void vdpasim_work_fn(struct kthread_work *work)
{
struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
+ struct mm_struct *mm = vdpasim->mm_bound;
+
+ if (mm) {
+ if (!mmget_not_zero(mm))
+ return;
+ kthread_use_mm(mm);
+ }

vdpasim->dev_attr.work_fn(vdpasim);
+
+ if (mm) {
+ kthread_unuse_mm(mm);
+ mmput(mm);
+ }
}

struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
@@ -162,7 +210,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
vdpa = __vdpa_alloc_device(NULL, ops,
dev_attr->ngroups, dev_attr->nas,
dev_attr->alloc_size,
- dev_attr->name, false);
+ dev_attr->name, use_va);
if (IS_ERR(vdpa)) {
ret = PTR_ERR(vdpa);
goto err_alloc;
@@ -582,6 +630,30 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
return ret;
}

+static int vdpasim_bind_mm(struct vdpa_device *vdpa, struct mm_struct *mm)
+{
+ struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+ struct vdpasim_mm_work mm_work;
+
+ mm_work.vdpasim = vdpasim;
+ mm_work.mm_to_bind = mm;
+
+ vdpasim_worker_change_mm_sync(vdpasim, &mm_work);
+
+ return mm_work.ret;
+}
+
+static void vdpasim_unbind_mm(struct vdpa_device *vdpa)
+{
+ struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+ struct vdpasim_mm_work mm_work;
+
+ mm_work.vdpasim = vdpasim;
+ mm_work.mm_to_bind = NULL;
+
+ vdpasim_worker_change_mm_sync(vdpasim, &mm_work);
+}
+
static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,
u64 iova, u64 size,
u64 pa, u32 perm, void *opaque)
@@ -678,6 +750,8 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
.set_group_asid = vdpasim_set_group_asid,
.dma_map = vdpasim_dma_map,
.dma_unmap = vdpasim_dma_unmap,
+ .bind_mm = vdpasim_bind_mm,
+ .unbind_mm = vdpasim_unbind_mm,
.free = vdpasim_free,
};

@@ -712,6 +786,8 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
.get_iova_range = vdpasim_get_iova_range,
.set_group_asid = vdpasim_set_group_asid,
.set_map = vdpasim_set_map,
+ .bind_mm = vdpasim_bind_mm,
+ .unbind_mm = vdpasim_unbind_mm,
.free = vdpasim_free,
};

--
2.39.2
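
The mm handling in vdpasim_work_fn() above is the canonical pattern for
temporarily borrowing a bound address space from a kthread (sketch):

struct mm_struct *mm = vdpasim->mm_bound;

if (mm) {
	if (!mmget_not_zero(mm))   /* owner may already be exiting */
		return;
	kthread_use_mm(mm);        /* uaccess now targets that mm */
}

/* ... dereference user VAs via the usual uaccess helpers ... */

if (mm) {
	kthread_unuse_mm(mm);
	mmput(mm);                 /* drop the pin as soon as possible */
}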


2023-03-21 15:49:24

by Stefano Garzarella

Subject: [PATCH v3 7/8] vdpa_sim: replace the spinlock with a mutex to protect the state

The spinlock we use to protect the state of the simulator is sometimes
held for a long time (for example, when devices handle requests).

It also prevents us from calling functions that might sleep (such as
kthread_flush_work() in the next patch), forcing us to release and
retake the lock around such calls.

For these reasons, let's replace the spinlock with a mutex, which gives
us more flexibility.

Suggested-by: Jason Wang <[email protected]>
Acked-by: Jason Wang <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
---
drivers/vdpa/vdpa_sim/vdpa_sim.h | 4 ++--
drivers/vdpa/vdpa_sim/vdpa_sim.c | 34 ++++++++++++++--------------
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 4 ++--
drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 4 ++--
4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index ce83f9130a5d..4774292fba8c 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -60,8 +60,8 @@ struct vdpasim {
struct kthread_worker *worker;
struct kthread_work work;
struct vdpasim_dev_attr dev_attr;
- /* spinlock to synchronize virtqueue state */
- spinlock_t lock;
+ /* mutex to synchronize virtqueue state */
+ struct mutex mutex;
/* virtio config according to device type */
void *config;
struct vhost_iotlb *iommu;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 1cfa56c52e5a..ab4cfb82c237 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -178,7 +178,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
if (IS_ERR(vdpasim->worker))
goto err_iommu;

- spin_lock_init(&vdpasim->lock);
+ mutex_init(&vdpasim->mutex);
spin_lock_init(&vdpasim->iommu_lock);

dev->dma_mask = &dev->coherent_dma_mask;
@@ -286,13 +286,13 @@ static void vdpasim_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
bool old_ready;

- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
old_ready = vq->ready;
vq->ready = ready;
if (vq->ready && !old_ready) {
vdpasim_queue_ready(vdpasim, idx);
}
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);
}

static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
@@ -310,9 +310,9 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx,
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
struct vringh *vrh = &vq->vring;

- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
vrh->last_avail_idx = state->split.avail_index;
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);

return 0;
}
@@ -409,9 +409,9 @@ static u8 vdpasim_get_status(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
u8 status;

- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
status = vdpasim->status;
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);

return status;
}
@@ -420,19 +420,19 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
vdpasim->status = status;
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);
}

static int vdpasim_reset(struct vdpa_device *vdpa)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
vdpasim->status = 0;
vdpasim_do_reset(vdpasim);
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);

return 0;
}
@@ -441,9 +441,9 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
vdpasim->running = false;
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);

return 0;
}
@@ -453,7 +453,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
int i;

- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
vdpasim->running = true;

if (vdpasim->pending_kick) {
@@ -464,7 +464,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
vdpasim->pending_kick = false;
}

- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);

return 0;
}
@@ -536,14 +536,14 @@ static int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,

iommu = &vdpasim->iommu[asid];

- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);

for (i = 0; i < vdpasim->dev_attr.nvqs; i++)
if (vdpasim_get_vq_group(vdpa, i) == group)
vringh_set_iotlb(&vdpasim->vqs[i].vring, iommu,
&vdpasim->iommu_lock);

- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);

return 0;
}
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index eb4897c8541e..568119e1553f 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -290,7 +290,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
bool reschedule = false;
int i;

- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);

if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
goto out;
@@ -321,7 +321,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
}
}
out:
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);

if (reschedule)
vdpasim_schedule_work(vdpasim);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
index e61a9ecbfafe..7ab434592bfe 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
@@ -201,7 +201,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
u64 rx_drops = 0, rx_overruns = 0, rx_errors = 0, tx_errors = 0;
int err;

- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);

if (!vdpasim->running)
goto out;
@@ -264,7 +264,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
}

out:
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);

u64_stats_update_begin(&net->tx_stats.syncp);
net->tx_stats.pkts += tx_pkts;
--
2.39.2
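
The constraint being lifted, in short: sleepable functions such as
kthread_flush_work() must not be called while a spinlock is held, whereas a
mutex is itself a sleeping lock, so the pattern below becomes legal (sketch):

spin_lock(&lock);
kthread_flush_work(&work);   /* BUG: might sleep in atomic context */
spin_unlock(&lock);

mutex_lock(&mutex);
kthread_flush_work(&work);   /* fine: sleeping while holding a mutex */
mutex_unlock(&mutex);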


2023-03-22 10:38:47

by Fabio M. De Francesco

Subject: Re: [PATCH v3 3/8] vringh: replace kmap_atomic() with kmap_local_page()

On Tuesday 21 March 2023 16:42:23 CET Stefano Garzarella wrote:
> kmap_atomic() is deprecated in favor of kmap_local_page() since commit
> f3ba3c710ac5 ("mm/highmem: Provide kmap_local*").
>
> With kmap_local_page() the mappings are per thread, CPU local, can take
> page-faults, and can be called from any context (including interrupts).
> Furthermore, the tasks can be preempted and, when they are scheduled to
> run again, the kernel virtual addresses are restored and still valid.
>
> kmap_atomic() is implemented like a kmap_local_page() which also disables
> page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> otherwise it only disables migration).
>
> The code within the mappings/un-mappings in getu16_iotlb() and
> putu16_iotlb() doesn't depend on the above-mentioned side effects of
> kmap_atomic(), so a mere replacement of the old API with the new one
> is all that is required (i.e., there is no need to explicitly add calls
> to pagefault_disable() and/or preempt_disable()).
>
> This commit reuses a "boiler plate" commit message from Fabio, who has
> already done this change in several places.
>

FWIW, I can confirm that the conversions here are safe...

Reviewed-by: Fabio M. De Francesco <[email protected]>

Thanks,

Fabio

P.S.: I had to send this message again because my former contained HTML parts
and so it was rejected by the mailing lists. I don't yet know how HTML crept
into my text.

> Cc: "Fabio M. De Francesco" <[email protected]>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>
> Notes:
> v3:
> - credited Fabio for the commit message
> - added reference to the commit that deprecated kmap_atomic() [Jason]
> v2:
> - added this patch since checkpatch.pl complained about deprecation
> of kmap_atomic() touched by next patch
>
> drivers/vhost/vringh.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index a1e27da54481..0ba3ef809e48 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh
> *vrh, if (ret < 0)
> return ret;
>
> - kaddr = kmap_atomic(iov.bv_page);
> + kaddr = kmap_local_page(iov.bv_page);
> from = kaddr + iov.bv_offset;
> *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> - kunmap_atomic(kaddr);
> + kunmap_local(kaddr);
>
> return 0;
> }
> @@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh
> *vrh, if (ret < 0)
> return ret;
>
> - kaddr = kmap_atomic(iov.bv_page);
> + kaddr = kmap_local_page(iov.bv_page);
> to = kaddr + iov.bv_offset;
> WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> - kunmap_atomic(kaddr);
> + kunmap_local(kaddr);
>
> return 0;
> }
> --
> 2.39.2




2023-03-23 03:03:55

by Jason Wang

Subject: Re: [PATCH v3 1/8] vdpa: add bind_mm/unbind_mm callbacks

On Tue, Mar 21, 2023 at 11:42 PM Stefano Garzarella <[email protected]> wrote:
>
> These new optional callbacks are used to bind/unbind the device
> to/from a specific address space, so the vDPA framework can use VA
> when they are implemented.
>
> Suggested-by: Jason Wang <[email protected]>
> Signed-off-by: Stefano Garzarella <[email protected]>

Acked-by: Jason Wang <[email protected]>

Thanks


2023-03-23 03:19:31

by Jason Wang

Subject: Re: [PATCH v3 3/8] vringh: replace kmap_atomic() with kmap_local_page()

On Tue, Mar 21, 2023 at 11:42 PM Stefano Garzarella <[email protected]> wrote:
>
> kmap_atomic() is deprecated in favor of kmap_local_page() since commit
> f3ba3c710ac5 ("mm/highmem: Provide kmap_local*").
>
> With kmap_local_page() the mappings are per thread, CPU local, can take
> page-faults, and can be called from any context (including interrupts).
> Furthermore, the tasks can be preempted and, when they are scheduled to
> run again, the kernel virtual addresses are restored and still valid.
>
> kmap_atomic() is implemented like a kmap_local_page() which also disables
> page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> otherwise it only disables migration).
>
> The code within the mappings/un-mappings in getu16_iotlb() and
> putu16_iotlb() doesn't depend on the above-mentioned side effects of
> kmap_atomic(), so a mere replacement of the old API with the new one
> is all that is required (i.e., there is no need to explicitly add calls
> to pagefault_disable() and/or preempt_disable()).
>
> This commit reuses a "boiler plate" commit message from Fabio, who has
> already done this change in several places.
>
> Cc: "Fabio M. De Francesco" <[email protected]>
> Signed-off-by: Stefano Garzarella <[email protected]>

Acked-by: Jason Wang <[email protected]>

Thanks


2023-03-23 03:28:20

by Jason Wang

Subject: Re: [PATCH v3 2/8] vhost-vdpa: use bind_mm/unbind_mm device callbacks

On Tue, Mar 21, 2023 at 11:42 PM Stefano Garzarella <[email protected]> wrote:
>
> When the user calls the VHOST_SET_OWNER ioctl and the vDPA device
> has `use_va` set to true, let's call the bind_mm callback.
> In this way we can bind the device to the user address space
> and directly use the user VA.
>
> The unbind_mm callback is called during the release after
> stopping the device.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>
> Notes:
> v3:
> - added `case VHOST_SET_OWNER` in vhost_vdpa_unlocked_ioctl() [Jason]
> v2:
> - call the new unbind_mm callback during the release [Jason]
> - avoid to call bind_mm callback after the reset, since the device
> is not detaching it now during the reset
>
> drivers/vhost/vdpa.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 7be9d9d8f01c..20250c3418b2 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
> return vdpa_reset(vdpa);
> }
>
> +static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
> +{
> + struct vdpa_device *vdpa = v->vdpa;
> + const struct vdpa_config_ops *ops = vdpa->config;
> +
> + if (!vdpa->use_va || !ops->bind_mm)
> + return 0;
> +
> + return ops->bind_mm(vdpa, v->vdev.mm);
> +}
> +
> +static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
> +{
> + struct vdpa_device *vdpa = v->vdpa;
> + const struct vdpa_config_ops *ops = vdpa->config;
> +
> + if (!vdpa->use_va || !ops->unbind_mm)
> + return;
> +
> + ops->unbind_mm(vdpa);
> +}
> +
> static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
> {
> struct vdpa_device *vdpa = v->vdpa;
> @@ -709,6 +731,14 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> case VHOST_VDPA_RESUME:
> r = vhost_vdpa_resume(v);
> break;
> + case VHOST_SET_OWNER:
> + r = vhost_dev_set_owner(d);

Nit:

I'd stick to the current way of passing the cmd, argp to
vhost_dev_ioctl() and introduce a new switch after the
vhost_dev_ioctl().

In this way, we are immune to any possible changes in how the vhost
core deals with VHOST_SET_OWNER.

Others look good.

Thanks


2023-03-23 03:43:33

by Jason Wang

Subject: Re: [PATCH v3 4/8] vringh: support VA with iotlb

On Tue, Mar 21, 2023 at 11:43 PM Stefano Garzarella <[email protected]> wrote:
>
> vDPA supports the possibility to use user VA in the iotlb messages.
> So, let's add support for user VA in vringh to use it in the vDPA
> simulators.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>
> Notes:
> v3:
> - refactored avoiding code duplication [Eugenio]
> v2:
> - replace kmap_atomic() with kmap_local_page() [see previous patch]
> - fix cast warnings when build with W=1 C=1
>
> include/linux/vringh.h | 5 +-
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
> drivers/vhost/vringh.c | 153 +++++++++++++++++++++++-------
> 4 files changed, 127 insertions(+), 37 deletions(-)
>
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> index 1991a02c6431..d39b9f2dcba0 100644
> --- a/include/linux/vringh.h
> +++ b/include/linux/vringh.h
> @@ -32,6 +32,9 @@ struct vringh {
> /* Can we get away with weak barriers? */
> bool weak_barriers;
>
> + /* Use user's VA */
> + bool use_va;
> +
> /* Last available index we saw (ie. where we're up to). */
> u16 last_avail_idx;
>
> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
> spinlock_t *iotlb_lock);
>
> int vringh_init_iotlb(struct vringh *vrh, u64 features,
> - unsigned int num, bool weak_barriers,
> + unsigned int num, bool weak_barriers, bool use_va,
> struct vring_desc *desc,
> struct vring_avail *avail,
> struct vring_used *used);
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 520646ae7fa0..dfd0e000217b 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>
> if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
> err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
> - MLX5_CVQ_MAX_ENT, false,
> + MLX5_CVQ_MAX_ENT, false, false,

To avoid those changes, would it be better to introduce
vringh_init_iotlb_va(), so vringh_init_iotlb() can stick to PA?

> (struct vring_desc *)(uintptr_t)cvq->desc_addr,
> (struct vring_avail *)(uintptr_t)cvq->driver_addr,
> (struct vring_used *)(uintptr_t)cvq->device_addr);
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index eea23c630f7c..47cdf2a1f5b8 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> uint16_t last_avail_idx = vq->vring.last_avail_idx;
>
> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
> (struct vring_desc *)(uintptr_t)vq->desc_addr,
> (struct vring_avail *)
> (uintptr_t)vq->driver_addr,
> @@ -92,7 +92,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
> vq->cb = NULL;
> vq->private = NULL;
> vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
> - VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
> + VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>
> vq->vring.notify = NULL;
> }
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 0ba3ef809e48..72c88519329a 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1094,10 +1094,18 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>
> #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>
> +struct iotlb_vec {
> + union {
> + struct iovec *iovec;
> + struct bio_vec *bvec;
> + } iov;
> + size_t count;
> + bool is_iovec;

I wonder if this is needed (if vringh is passed to every iotlb_vec helper).

> +};
> +
> static int iotlb_translate(const struct vringh *vrh,
> u64 addr, u64 len, u64 *translated,
> - struct bio_vec iov[],
> - int iov_size, u32 perm)
> + struct iotlb_vec *ivec, u32 perm)
> {
> struct vhost_iotlb_map *map;
> struct vhost_iotlb *iotlb = vrh->iotlb;
> @@ -1107,9 +1115,9 @@ static int iotlb_translate(const struct vringh *vrh,
> spin_lock(vrh->iotlb_lock);
>
> while (len > s) {
> - u64 size, pa, pfn;
> + u64 size;
>
> - if (unlikely(ret >= iov_size)) {
> + if (unlikely(ret >= ivec->count)) {
> ret = -ENOBUFS;
> break;
> }
> @@ -1124,10 +1132,22 @@ static int iotlb_translate(const struct vringh *vrh,
> }
>
> size = map->size - addr + map->start;
> - pa = map->addr + addr - map->start;
> - pfn = pa >> PAGE_SHIFT;
> - bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
> - pa & (PAGE_SIZE - 1));
> + if (ivec->is_iovec) {
> + struct iovec *iovec = ivec->iov.iovec;
> +
> + iovec[ret].iov_len = min(len - s, size);
> + iovec[ret].iov_base = (void __user *)(unsigned long)
> + (map->addr + addr - map->start);

map->addr - map->start to avoid overflow?

> + } else {
> + u64 pa = map->addr + addr - map->start;
> + u64 pfn = pa >> PAGE_SHIFT;
> + struct bio_vec *bvec = ivec->iov.bvec;
> +
> + bvec_set_page(&bvec[ret], pfn_to_page(pfn),
> + min(len - s, size),
> + pa & (PAGE_SIZE - 1));
> + }
> +
> s += size;
> addr += size;
> ++ret;
> @@ -1141,26 +1161,42 @@ static int iotlb_translate(const struct vringh *vrh,
> return ret;
> }
>
> +#define IOTLB_IOV_SIZE 16
> +
> static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
> void *src, size_t len)
> {
> + struct iotlb_vec ivec;
> + union {
> + struct iovec iovec[IOTLB_IOV_SIZE];
> + struct bio_vec bvec[IOTLB_IOV_SIZE];
> + } iov;
> u64 total_translated = 0;
>
> + ivec.iov.iovec = iov.iovec;

ivec.iov = iov?

Others look good.

Thanks

> + ivec.count = IOTLB_IOV_SIZE;
> + ivec.is_iovec = vrh->use_va;
> +
> while (total_translated < len) {
> - struct bio_vec iov[16];
> struct iov_iter iter;
> u64 translated;
> int ret;
>
> ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
> len - total_translated, &translated,
> - iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
> + &ivec, VHOST_MAP_RO);
> if (ret == -ENOBUFS)
> - ret = ARRAY_SIZE(iov);
> + ret = IOTLB_IOV_SIZE;
> else if (ret < 0)
> return ret;
>
> - iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
> + if (ivec.is_iovec) {
> + iov_iter_init(&iter, ITER_SOURCE, ivec.iov.iovec, ret,
> + translated);
> + } else {
> + iov_iter_bvec(&iter, ITER_SOURCE, ivec.iov.bvec, ret,
> + translated);
> + }
>
> ret = copy_from_iter(dst, translated, &iter);
> if (ret < 0)
> @@ -1177,23 +1213,37 @@ static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
> static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
> void *src, size_t len)
> {
> + struct iotlb_vec ivec;
> + union {
> + struct iovec iovec[IOTLB_IOV_SIZE];
> + struct bio_vec bvec[IOTLB_IOV_SIZE];
> + } iov;
> u64 total_translated = 0;
>
> + ivec.iov.iovec = iov.iovec;
> + ivec.count = IOTLB_IOV_SIZE;
> + ivec.is_iovec = vrh->use_va;
> +
> while (total_translated < len) {
> - struct bio_vec iov[16];
> struct iov_iter iter;
> u64 translated;
> int ret;
>
> ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
> len - total_translated, &translated,
> - iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
> + &ivec, VHOST_MAP_WO);
> if (ret == -ENOBUFS)
> - ret = ARRAY_SIZE(iov);
> + ret = IOTLB_IOV_SIZE;
> else if (ret < 0)
> return ret;
>
> - iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
> + if (ivec.is_iovec) {
> + iov_iter_init(&iter, ITER_DEST, ivec.iov.iovec, ret,
> + translated);
> + } else {
> + iov_iter_bvec(&iter, ITER_DEST, ivec.iov.bvec, ret,
> + translated);
> + }
>
> ret = copy_to_iter(src, translated, &iter);
> if (ret < 0)
> @@ -1210,20 +1260,37 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
> static inline int getu16_iotlb(const struct vringh *vrh,
> u16 *val, const __virtio16 *p)
> {
> - struct bio_vec iov;
> - void *kaddr, *from;
> + struct iotlb_vec ivec;
> + union {
> + struct iovec iovec[1];
> + struct bio_vec bvec[1];
> + } iov;
> + __virtio16 tmp;
> int ret;
>
> + ivec.iov.iovec = iov.iovec;
> + ivec.count = 1;
> + ivec.is_iovec = vrh->use_va;
> +
> /* Atomic read is needed for getu16 */
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> - &iov, 1, VHOST_MAP_RO);
> + ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
> + NULL, &ivec, VHOST_MAP_RO);
> if (ret < 0)
> return ret;
>
> - kaddr = kmap_local_page(iov.bv_page);
> - from = kaddr + iov.bv_offset;
> - *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> - kunmap_local(kaddr);
> + if (ivec.is_iovec) {
> + ret = __get_user(tmp, (__virtio16 __user *)ivec.iov.iovec[0].iov_base);
> + if (ret)
> + return ret;
> + } else {
> + void *kaddr = kmap_local_page(ivec.iov.bvec[0].bv_page);
> + void *from = kaddr + ivec.iov.bvec[0].bv_offset;
> +
> + tmp = READ_ONCE(*(__virtio16 *)from);
> + kunmap_local(kaddr);
> + }
> +
> + *val = vringh16_to_cpu(vrh, tmp);
>
> return 0;
> }
> @@ -1231,20 +1298,37 @@ static inline int getu16_iotlb(const struct vringh *vrh,
> static inline int putu16_iotlb(const struct vringh *vrh,
> __virtio16 *p, u16 val)
> {
> - struct bio_vec iov;
> - void *kaddr, *to;
> + struct iotlb_vec ivec;
> + union {
> + struct iovec iovec;
> + struct bio_vec bvec;
> + } iov;
> + __virtio16 tmp;
> int ret;
>
> + ivec.iov.iovec = &iov.iovec;
> + ivec.count = 1;
> + ivec.is_iovec = vrh->use_va;
> +
> /* Atomic write is needed for putu16 */
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> - &iov, 1, VHOST_MAP_WO);
> + ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
> + NULL, &ivec, VHOST_MAP_WO);
> if (ret < 0)
> return ret;
>
> - kaddr = kmap_local_page(iov.bv_page);
> - to = kaddr + iov.bv_offset;
> - WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> - kunmap_local(kaddr);
> + tmp = cpu_to_vringh16(vrh, val);
> +
> + if (ivec.is_iovec) {
> + ret = __put_user(tmp, (__virtio16 __user *)ivec.iov.iovec[0].iov_base);
> + if (ret)
> + return ret;
> + } else {
> + void *kaddr = kmap_local_page(ivec.iov.bvec[0].bv_page);
> + void *to = kaddr + ivec.iov.bvec[0].bv_offset;
> +
> + WRITE_ONCE(*(__virtio16 *)to, tmp);
> + kunmap_local(kaddr);
> + }
>
> return 0;
> }
> @@ -1306,6 +1390,7 @@ static inline int putused_iotlb(const struct vringh *vrh,
> * @features: the feature bits for this ring.
> * @num: the number of elements.
> * @weak_barriers: true if we only need memory barriers, not I/O.
> + * @use_va: true if IOTLB contains user VA
> * @desc: the userspace descriptor pointer.
> * @avail: the userspace avail pointer.
> * @used: the userspace used pointer.
> @@ -1313,11 +1398,13 @@ static inline int putused_iotlb(const struct vringh *vrh,
> * Returns an error if num is invalid.
> */
> int vringh_init_iotlb(struct vringh *vrh, u64 features,
> - unsigned int num, bool weak_barriers,
> + unsigned int num, bool weak_barriers, bool use_va,
> struct vring_desc *desc,
> struct vring_avail *avail,
> struct vring_used *used)
> {
> + vrh->use_va = use_va;
> +
> return vringh_init_kern(vrh, features, num, weak_barriers,
> desc, avail, used);
> }
> --
> 2.39.2
>

2023-03-23 03:48:35

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] vdpa_sim: add support for user VA

On Tue, Mar 21, 2023 at 11:48 PM Stefano Garzarella <[email protected]> wrote:
>
> The new "use_va" module parameter (default: true) is used in
> vdpa_alloc_device() to inform the vDPA framework that the device
> supports VA.
>
> vringh is initialized to use VA only when "use_va" is true and the
> user's mm has been bound. So, only when the bus supports user VA
> (e.g. vhost-vdpa).
>
> vdpasim_mm_work_fn work is used to serialize the binding to a new
> address space when the .bind_mm callback is invoked, and unbinding
> when the .unbind_mm callback is invoked.
>
> Call mmget_not_zero()/kthread_use_mm() inside the worker function
> to pin the address space only as long as needed, following the
> documentation of mmget() in include/linux/sched/mm.h:
>
> * Never use this function to pin this address space for an
> * unbounded/indefinite amount of time.

I wonder if everything would be simplified if we just allow the parent
to advertise whether or not it requires the address space.

Then when vhost-vDPA probes the device it can simply advertise
use_work as true so vhost core can use get_task_mm() in this case?

Thanks

>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>
> Notes:
> v3:
> - called mmget_not_zero() before kthread_use_mm() [Jason]
> As the documentation of mmget() in include/linux/sched/mm.h says:
>
> * Never use this function to pin this address space for an
> * unbounded/indefinite amount of time.
>
> I moved mmget_not_zero/kthread_use_mm inside the worker function,
> this way we pin the address space only as long as needed.
> This is similar to what vfio_iommu_type1_dma_rw_chunk() does in
> drivers/vfio/vfio_iommu_type1.c
> - simplified the mm bind/unbind [Jason]
> - renamed vdpasim_worker_change_mm_sync() [Jason]
> - fix commit message (s/default: false/default: true)
> v2:
> - `use_va` set to true by default [Eugenio]
> - supported the new unbind_mm callback [Jason]
> - removed the unbind_mm call in vdpasim_do_reset() [Jason]
> - avoided releasing the lock while calling kthread_flush_work() since we
> are now using a mutex to protect the device state
>
> drivers/vdpa/vdpa_sim/vdpa_sim.h | 1 +
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 80 +++++++++++++++++++++++++++++++-
> 2 files changed, 79 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index 4774292fba8c..3a42887d05d9 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -59,6 +59,7 @@ struct vdpasim {
> struct vdpasim_virtqueue *vqs;
> struct kthread_worker *worker;
> struct kthread_work work;
> + struct mm_struct *mm_bound;
> struct vdpasim_dev_attr dev_attr;
> /* mutex to synchronize virtqueue state */
> struct mutex mutex;
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index ab4cfb82c237..23c891cdcd54 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -35,10 +35,44 @@ module_param(max_iotlb_entries, int, 0444);
> MODULE_PARM_DESC(max_iotlb_entries,
> "Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");
>
> +static bool use_va = true;
> +module_param(use_va, bool, 0444);
> +MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
> +
> #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
> #define VDPASIM_QUEUE_MAX 256
> #define VDPASIM_VENDOR_ID 0
>
> +struct vdpasim_mm_work {
> + struct kthread_work work;
> + struct vdpasim *vdpasim;
> + struct mm_struct *mm_to_bind;
> + int ret;
> +};
> +
> +static void vdpasim_mm_work_fn(struct kthread_work *work)
> +{
> + struct vdpasim_mm_work *mm_work =
> + container_of(work, struct vdpasim_mm_work, work);
> + struct vdpasim *vdpasim = mm_work->vdpasim;
> +
> + mm_work->ret = 0;
> +
> + //TODO: should we attach the cgroup of the mm owner?
> + vdpasim->mm_bound = mm_work->mm_to_bind;
> +}
> +
> +static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
> + struct vdpasim_mm_work *mm_work)
> +{
> + struct kthread_work *work = &mm_work->work;
> +
> + kthread_init_work(work, vdpasim_mm_work_fn);
> + kthread_queue_work(vdpasim->worker, work);
> +
> + kthread_flush_work(work);
> +}
> +
> static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
> {
> return container_of(vdpa, struct vdpasim, vdpa);
> @@ -59,8 +93,10 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
> {
> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> uint16_t last_avail_idx = vq->vring.last_avail_idx;
> + bool va_enabled = use_va && vdpasim->mm_bound;
>
> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
> + va_enabled,
> (struct vring_desc *)(uintptr_t)vq->desc_addr,
> (struct vring_avail *)
> (uintptr_t)vq->driver_addr,
> @@ -130,8 +166,20 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops;
> static void vdpasim_work_fn(struct kthread_work *work)
> {
> struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> + struct mm_struct *mm = vdpasim->mm_bound;
> +
> + if (mm) {
> + if (!mmget_not_zero(mm))
> + return;
> + kthread_use_mm(mm);
> + }
>
> vdpasim->dev_attr.work_fn(vdpasim);
> +
> + if (mm) {
> + kthread_unuse_mm(mm);
> + mmput(mm);
> + }
> }
>
> struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> @@ -162,7 +210,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> vdpa = __vdpa_alloc_device(NULL, ops,
> dev_attr->ngroups, dev_attr->nas,
> dev_attr->alloc_size,
> - dev_attr->name, false);
> + dev_attr->name, use_va);
> if (IS_ERR(vdpa)) {
> ret = PTR_ERR(vdpa);
> goto err_alloc;
> @@ -582,6 +630,30 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
> return ret;
> }
>
> +static int vdpasim_bind_mm(struct vdpa_device *vdpa, struct mm_struct *mm)
> +{
> + struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> + struct vdpasim_mm_work mm_work;
> +
> + mm_work.vdpasim = vdpasim;
> + mm_work.mm_to_bind = mm;
> +
> + vdpasim_worker_change_mm_sync(vdpasim, &mm_work);
> +
> + return mm_work.ret;
> +}
> +
> +static void vdpasim_unbind_mm(struct vdpa_device *vdpa)
> +{
> + struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> + struct vdpasim_mm_work mm_work;
> +
> + mm_work.vdpasim = vdpasim;
> + mm_work.mm_to_bind = NULL;
> +
> + vdpasim_worker_change_mm_sync(vdpasim, &mm_work);
> +}
> +
> static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,
> u64 iova, u64 size,
> u64 pa, u32 perm, void *opaque)
> @@ -678,6 +750,8 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
> .set_group_asid = vdpasim_set_group_asid,
> .dma_map = vdpasim_dma_map,
> .dma_unmap = vdpasim_dma_unmap,
> + .bind_mm = vdpasim_bind_mm,
> + .unbind_mm = vdpasim_unbind_mm,
> .free = vdpasim_free,
> };
>
> @@ -712,6 +786,8 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
> .get_iova_range = vdpasim_get_iova_range,
> .set_group_asid = vdpasim_set_group_asid,
> .set_map = vdpasim_set_map,
> + .bind_mm = vdpasim_bind_mm,
> + .unbind_mm = vdpasim_unbind_mm,
> .free = vdpasim_free,
> };
>
> --
> 2.39.2
>

2023-03-23 08:19:54

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] vringh: support VA with iotlb

On Tue, Mar 21, 2023 at 4:43 PM Stefano Garzarella <[email protected]> wrote:
>
> vDPA supports the possibility to use user VA in the iotlb messages.
> So, let's add support for user VA in vringh to use it in the vDPA
> simulators.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>
> Notes:
> v3:
> - refactored avoiding code duplication [Eugenio]
> v2:
> - replace kmap_atomic() with kmap_local_page() [see previous patch]
> - fix cast warnings when build with W=1 C=1
>
> include/linux/vringh.h | 5 +-
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
> drivers/vhost/vringh.c | 153 +++++++++++++++++++++++-------
> 4 files changed, 127 insertions(+), 37 deletions(-)
>
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> index 1991a02c6431..d39b9f2dcba0 100644
> --- a/include/linux/vringh.h
> +++ b/include/linux/vringh.h
> @@ -32,6 +32,9 @@ struct vringh {
> /* Can we get away with weak barriers? */
> bool weak_barriers;
>
> + /* Use user's VA */
> + bool use_va;
> +
> /* Last available index we saw (ie. where we're up to). */
> u16 last_avail_idx;
>
> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
> spinlock_t *iotlb_lock);
>
> int vringh_init_iotlb(struct vringh *vrh, u64 features,
> - unsigned int num, bool weak_barriers,
> + unsigned int num, bool weak_barriers, bool use_va,
> struct vring_desc *desc,
> struct vring_avail *avail,
> struct vring_used *used);
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 520646ae7fa0..dfd0e000217b 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>
> if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
> err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
> - MLX5_CVQ_MAX_ENT, false,
> + MLX5_CVQ_MAX_ENT, false, false,
> (struct vring_desc *)(uintptr_t)cvq->desc_addr,
> (struct vring_avail *)(uintptr_t)cvq->driver_addr,
> (struct vring_used *)(uintptr_t)cvq->device_addr);
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index eea23c630f7c..47cdf2a1f5b8 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> uint16_t last_avail_idx = vq->vring.last_avail_idx;
>
> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
> (struct vring_desc *)(uintptr_t)vq->desc_addr,
> (struct vring_avail *)
> (uintptr_t)vq->driver_addr,
> @@ -92,7 +92,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
> vq->cb = NULL;
> vq->private = NULL;
> vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
> - VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
> + VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>
> vq->vring.notify = NULL;
> }
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 0ba3ef809e48..72c88519329a 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1094,10 +1094,18 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>
> #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>
> +struct iotlb_vec {
> + union {
> + struct iovec *iovec;
> + struct bio_vec *bvec;
> + } iov;
> + size_t count;
> + bool is_iovec;
> +};
> +
> static int iotlb_translate(const struct vringh *vrh,
> u64 addr, u64 len, u64 *translated,
> - struct bio_vec iov[],
> - int iov_size, u32 perm)
> + struct iotlb_vec *ivec, u32 perm)
> {
> struct vhost_iotlb_map *map;
> struct vhost_iotlb *iotlb = vrh->iotlb;
> @@ -1107,9 +1115,9 @@ static int iotlb_translate(const struct vringh *vrh,
> spin_lock(vrh->iotlb_lock);
>
> while (len > s) {
> - u64 size, pa, pfn;
> + u64 size;
>
> - if (unlikely(ret >= iov_size)) {
> + if (unlikely(ret >= ivec->count)) {
> ret = -ENOBUFS;
> break;
> }
> @@ -1124,10 +1132,22 @@ static int iotlb_translate(const struct vringh *vrh,
> }
>
> size = map->size - addr + map->start;
> - pa = map->addr + addr - map->start;
> - pfn = pa >> PAGE_SHIFT;
> - bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
> - pa & (PAGE_SIZE - 1));
> + if (ivec->is_iovec) {
> + struct iovec *iovec = ivec->iov.iovec;
> +
> + iovec[ret].iov_len = min(len - s, size);
> + iovec[ret].iov_base = (void __user *)(unsigned long)

s/unsigned long/uintptr_t ?



> + (map->addr + addr - map->start);
> + } else {
> + u64 pa = map->addr + addr - map->start;
> + u64 pfn = pa >> PAGE_SHIFT;
> + struct bio_vec *bvec = ivec->iov.bvec;
> +
> + bvec_set_page(&bvec[ret], pfn_to_page(pfn),
> + min(len - s, size),
> + pa & (PAGE_SIZE - 1));
> + }
> +
> s += size;
> addr += size;
> ++ret;
> @@ -1141,26 +1161,42 @@ static int iotlb_translate(const struct vringh *vrh,
> return ret;
> }
>
> +#define IOTLB_IOV_SIZE 16

I'm fine with defining it here, but maybe it is better to isolate the
change in a previous patch or reuse another well-known macro?

The rest looks good, and I agree with Jason's comments, so even if the
macro declaration is not moved:

Acked-by: Eugenio Pérez <[email protected]>

> +
> static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
> void *src, size_t len)
> {
> + struct iotlb_vec ivec;
> + union {
> + struct iovec iovec[IOTLB_IOV_SIZE];
> + struct bio_vec bvec[IOTLB_IOV_SIZE];
> + } iov;
> u64 total_translated = 0;
>
> + ivec.iov.iovec = iov.iovec;
> + ivec.count = IOTLB_IOV_SIZE;
> + ivec.is_iovec = vrh->use_va;
> +
> while (total_translated < len) {
> - struct bio_vec iov[16];
> struct iov_iter iter;
> u64 translated;
> int ret;
>
> ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
> len - total_translated, &translated,
> - iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
> + &ivec, VHOST_MAP_RO);
> if (ret == -ENOBUFS)
> - ret = ARRAY_SIZE(iov);
> + ret = IOTLB_IOV_SIZE;
> else if (ret < 0)
> return ret;
>
> - iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
> + if (ivec.is_iovec) {
> + iov_iter_init(&iter, ITER_SOURCE, ivec.iov.iovec, ret,
> + translated);
> + } else {
> + iov_iter_bvec(&iter, ITER_SOURCE, ivec.iov.bvec, ret,
> + translated);
> + }
>
> ret = copy_from_iter(dst, translated, &iter);
> if (ret < 0)
> @@ -1177,23 +1213,37 @@ static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
> static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
> void *src, size_t len)
> {
> + struct iotlb_vec ivec;
> + union {
> + struct iovec iovec[IOTLB_IOV_SIZE];
> + struct bio_vec bvec[IOTLB_IOV_SIZE];
> + } iov;
> u64 total_translated = 0;
>
> + ivec.iov.iovec = iov.iovec;
> + ivec.count = IOTLB_IOV_SIZE;
> + ivec.is_iovec = vrh->use_va;
> +
> while (total_translated < len) {
> - struct bio_vec iov[16];
> struct iov_iter iter;
> u64 translated;
> int ret;
>
> ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
> len - total_translated, &translated,
> - iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
> + &ivec, VHOST_MAP_WO);
> if (ret == -ENOBUFS)
> - ret = ARRAY_SIZE(iov);
> + ret = IOTLB_IOV_SIZE;
> else if (ret < 0)
> return ret;
>
> - iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
> + if (ivec.is_iovec) {
> + iov_iter_init(&iter, ITER_DEST, ivec.iov.iovec, ret,
> + translated);
> + } else {
> + iov_iter_bvec(&iter, ITER_DEST, ivec.iov.bvec, ret,
> + translated);
> + }
>
> ret = copy_to_iter(src, translated, &iter);
> if (ret < 0)
> @@ -1210,20 +1260,37 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
> static inline int getu16_iotlb(const struct vringh *vrh,
> u16 *val, const __virtio16 *p)
> {
> - struct bio_vec iov;
> - void *kaddr, *from;
> + struct iotlb_vec ivec;
> + union {
> + struct iovec iovec[1];
> + struct bio_vec bvec[1];
> + } iov;
> + __virtio16 tmp;
> int ret;
>
> + ivec.iov.iovec = iov.iovec;
> + ivec.count = 1;
> + ivec.is_iovec = vrh->use_va;
> +
> /* Atomic read is needed for getu16 */
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> - &iov, 1, VHOST_MAP_RO);
> + ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
> + NULL, &ivec, VHOST_MAP_RO);
> if (ret < 0)
> return ret;
>
> - kaddr = kmap_local_page(iov.bv_page);
> - from = kaddr + iov.bv_offset;
> - *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> - kunmap_local(kaddr);
> + if (ivec.is_iovec) {
> + ret = __get_user(tmp, (__virtio16 __user *)ivec.iov.iovec[0].iov_base);
> + if (ret)
> + return ret;
> + } else {
> + void *kaddr = kmap_local_page(ivec.iov.bvec[0].bv_page);
> + void *from = kaddr + ivec.iov.bvec[0].bv_offset;
> +
> + tmp = READ_ONCE(*(__virtio16 *)from);
> + kunmap_local(kaddr);
> + }
> +
> + *val = vringh16_to_cpu(vrh, tmp);
>
> return 0;
> }
> @@ -1231,20 +1298,37 @@ static inline int getu16_iotlb(const struct vringh *vrh,
> static inline int putu16_iotlb(const struct vringh *vrh,
> __virtio16 *p, u16 val)
> {
> - struct bio_vec iov;
> - void *kaddr, *to;
> + struct iotlb_vec ivec;
> + union {
> + struct iovec iovec;
> + struct bio_vec bvec;
> + } iov;
> + __virtio16 tmp;
> int ret;
>
> + ivec.iov.iovec = &iov.iovec;
> + ivec.count = 1;
> + ivec.is_iovec = vrh->use_va;
> +
> /* Atomic write is needed for putu16 */
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> - &iov, 1, VHOST_MAP_WO);
> + ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
> + NULL, &ivec, VHOST_MAP_WO);
> if (ret < 0)
> return ret;
>
> - kaddr = kmap_local_page(iov.bv_page);
> - to = kaddr + iov.bv_offset;
> - WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> - kunmap_local(kaddr);
> + tmp = cpu_to_vringh16(vrh, val);
> +
> + if (ivec.is_iovec) {
> + ret = __put_user(tmp, (__virtio16 __user *)ivec.iov.iovec[0].iov_base);
> + if (ret)
> + return ret;
> + } else {
> + void *kaddr = kmap_local_page(ivec.iov.bvec[0].bv_page);
> + void *to = kaddr + ivec.iov.bvec[0].bv_offset;
> +
> + WRITE_ONCE(*(__virtio16 *)to, tmp);
> + kunmap_local(kaddr);
> + }
>
> return 0;
> }
> @@ -1306,6 +1390,7 @@ static inline int putused_iotlb(const struct vringh *vrh,
> * @features: the feature bits for this ring.
> * @num: the number of elements.
> * @weak_barriers: true if we only need memory barriers, not I/O.
> + * @use_va: true if IOTLB contains user VA
> * @desc: the userspace descriptor pointer.
> * @avail: the userspace avail pointer.
> * @used: the userspace used pointer.
> @@ -1313,11 +1398,13 @@ static inline int putused_iotlb(const struct vringh *vrh,
> * Returns an error if num is invalid.
> */
> int vringh_init_iotlb(struct vringh *vrh, u64 features,
> - unsigned int num, bool weak_barriers,
> + unsigned int num, bool weak_barriers, bool use_va,
> struct vring_desc *desc,
> struct vring_avail *avail,
> struct vring_used *used)
> {
> + vrh->use_va = use_va;
> +
> return vringh_init_kern(vrh, features, num, weak_barriers,
> desc, avail, used);
> }
> --
> 2.39.2
>

2023-03-23 09:41:52

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v3 2/8] vhost-vdpa: use bind_mm/unbind_mm device callbacks

On Thu, Mar 23, 2023 at 11:01:39AM +0800, Jason Wang wrote:
>On Tue, Mar 21, 2023 at 11:42 PM Stefano Garzarella <[email protected]> wrote:
>>
>> When the user calls the VHOST_SET_OWNER ioctl and the vDPA device
>> has `use_va` set to true, let's call the bind_mm callback.
>> In this way we can bind the device to the user address space
>> and directly use the user VA.
>>
>> The unbind_mm callback is called during the release after
>> stopping the device.
>>
>> Signed-off-by: Stefano Garzarella <[email protected]>
>> ---
>>
>> Notes:
>> v3:
>> - added `case VHOST_SET_OWNER` in vhost_vdpa_unlocked_ioctl() [Jason]
>> v2:
>> - call the new unbind_mm callback during the release [Jason]
>> - avoid calling the bind_mm callback after the reset, since the device
>> is not detaching it now during the reset
>>
>> drivers/vhost/vdpa.c | 31 +++++++++++++++++++++++++++++++
>> 1 file changed, 31 insertions(+)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index 7be9d9d8f01c..20250c3418b2 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
>> return vdpa_reset(vdpa);
>> }
>>
>> +static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
>> +{
>> + struct vdpa_device *vdpa = v->vdpa;
>> + const struct vdpa_config_ops *ops = vdpa->config;
>> +
>> + if (!vdpa->use_va || !ops->bind_mm)
>> + return 0;
>> +
>> + return ops->bind_mm(vdpa, v->vdev.mm);
>> +}
>> +
>> +static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
>> +{
>> + struct vdpa_device *vdpa = v->vdpa;
>> + const struct vdpa_config_ops *ops = vdpa->config;
>> +
>> + if (!vdpa->use_va || !ops->unbind_mm)
>> + return;
>> +
>> + ops->unbind_mm(vdpa);
>> +}
>> +
>> static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
>> {
>> struct vdpa_device *vdpa = v->vdpa;
>> @@ -709,6 +731,14 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>> case VHOST_VDPA_RESUME:
>> r = vhost_vdpa_resume(v);
>> break;
>> + case VHOST_SET_OWNER:
>> + r = vhost_dev_set_owner(d);
>
>Nit:
>
>I'd stick to the current way of passing the cmd, argp to
>vhost_dev_ioctl() and introduce a new switch after the
>vhost_dev_ioctl().
>
>In this way, we are immune to any possible changes of dealing with
>VHOST_SET_OWNER in vhost core.

Good point, I'll change in v4.
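
For reference, the shape being suggested would be roughly the following
(just a sketch: vhost_vdpa_bind_mm() is from this series, the rest
mirrors the existing ioctl path, and the actual v4 code may differ):

	default:
		r = vhost_dev_ioctl(&v->vdev, cmd, argp);
		if (r == -ENOIOCTLCMD)
			r = vhost_vdpa_vring_ioctl(v, cmd, argp);
		break;
	}

	/* new switch after vhost_dev_ioctl(): any change in how the vhost
	 * core deals with VHOST_SET_OWNER is picked up automatically */
	switch (cmd) {
	case VHOST_SET_OWNER:
		if (!r)
			r = vhost_vdpa_bind_mm(v);
		break;
	}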

>
>Others look good.

Thanks,
Stefano

2023-03-23 09:54:25

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] vdpa_sim: add support for user VA

On Thu, Mar 23, 2023 at 11:42:07AM +0800, Jason Wang wrote:
>On Tue, Mar 21, 2023 at 11:48 PM Stefano Garzarella <[email protected]> wrote:
>>
>> The new "use_va" module parameter (default: true) is used in
>> vdpa_alloc_device() to inform the vDPA framework that the device
>> supports VA.
>>
>> vringh is initialized to use VA only when "use_va" is true and the
>> user's mm has been bound. So, only when the bus supports user VA
>> (e.g. vhost-vdpa).
>>
>> vdpasim_mm_work_fn work is used to serialize the binding to a new
>> address space when the .bind_mm callback is invoked, and unbinding
>> when the .unbind_mm callback is invoked.
>>
>> Call mmget_not_zero()/kthread_use_mm() inside the worker function
>> to pin the address space only as long as needed, following the
>> documentation of mmget() in include/linux/sched/mm.h:
>>
>> * Never use this function to pin this address space for an
>> * unbounded/indefinite amount of time.
>
>I wonder if everything would be simplified if we just allow the parent
>to advertise whether or not it requires the address space.
>
>Then when vhost-vDPA probes the device it can simply advertise
>use_work as true so vhost core can use get_task_mm() in this case?

IIUC, setting user_worker to true also creates the kthread in the vhost
core (but we can add another variable to avoid this).

My biggest concern is the comment in include/linux/sched/mm.h.
get_task_mm() uses mmget(), but in the documentation they advise against
pinning the address space indefinitely, so I preferred to keep
mmgrab() in the vhost core and call mmget_not_zero() in the worker
only while it is running.
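
In code form, the split is roughly this (a condensed sketch of
existing kernel APIs, not code from this series):

	/* vhost core, when the owner is set: take a reference on the
	 * mm_struct itself (mmgrab), which does NOT pin the address space */
	dev->mm = current->mm;
	mmgrab(dev->mm);

	/* device worker, per work item: pin the address space only while
	 * the work is actually running */
	struct mm_struct *mm = dev->mm;

	if (mmget_not_zero(mm)) {
		kthread_use_mm(mm);
		/* ... access guest buffers through the user VA ... */
		kthread_unuse_mm(mm);
		mmput(mm);
	}

	/* vhost core, on release: drop the mm_struct reference */
	mmdrop(dev->mm);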

In the future the mm may be used differently by the parent if it is
somehow supported by the IOMMU, so I would leave it to the parent to
handle this.

Thanks,
Stefano

2023-03-23 10:44:26

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] vringh: support VA with iotlb

On Thu, Mar 23, 2023 at 11:36:28AM +0800, Jason Wang wrote:
>On Tue, Mar 21, 2023 at 11:43 PM Stefano Garzarella <[email protected]> wrote:
>>
>> vDPA supports the possibility to use user VA in the iotlb messages.
>> So, let's add support for user VA in vringh to use it in the vDPA
>> simulators.
>>
>> Signed-off-by: Stefano Garzarella <[email protected]>
>> ---
>>
>> Notes:
>> v3:
>> - refactored avoiding code duplication [Eugenio]
>> v2:
>> - replace kmap_atomic() with kmap_local_page() [see previous patch]
>> - fix cast warnings when build with W=1 C=1
>>
>> include/linux/vringh.h | 5 +-
>> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>> drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
>> drivers/vhost/vringh.c | 153 +++++++++++++++++++++++-------
>> 4 files changed, 127 insertions(+), 37 deletions(-)
>>
>> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
>> index 1991a02c6431..d39b9f2dcba0 100644
>> --- a/include/linux/vringh.h
>> +++ b/include/linux/vringh.h
>> @@ -32,6 +32,9 @@ struct vringh {
>> /* Can we get away with weak barriers? */
>> bool weak_barriers;
>>
>> + /* Use user's VA */
>> + bool use_va;
>> +
>> /* Last available index we saw (ie. where we're up to). */
>> u16 last_avail_idx;
>>
>> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
>> spinlock_t *iotlb_lock);
>>
>> int vringh_init_iotlb(struct vringh *vrh, u64 features,
>> - unsigned int num, bool weak_barriers,
>> + unsigned int num, bool weak_barriers, bool use_va,
>> struct vring_desc *desc,
>> struct vring_avail *avail,
>> struct vring_used *used);
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index 520646ae7fa0..dfd0e000217b 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>>
>> if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
>> err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
>> - MLX5_CVQ_MAX_ENT, false,
>> + MLX5_CVQ_MAX_ENT, false, false,
>
>To avoid those changes, would it be better to introduce
>vringh_init_iotlb_va() so vringh_init_iotlb() can stick to pa.

Okay, I don't have a strong opinion, I'll add vringh_init_iotlb_va().

>
>> (struct vring_desc *)(uintptr_t)cvq->desc_addr,
>> (struct vring_avail *)(uintptr_t)cvq->driver_addr,
>> (struct vring_used *)(uintptr_t)cvq->device_addr);
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> index eea23c630f7c..47cdf2a1f5b8 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>> uint16_t last_avail_idx = vq->vring.last_avail_idx;
>>
>> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
>> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
>> (struct vring_desc *)(uintptr_t)vq->desc_addr,
>> (struct vring_avail *)
>> (uintptr_t)vq->driver_addr,
>> @@ -92,7 +92,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
>> vq->cb = NULL;
>> vq->private = NULL;
>> vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
>> - VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
>> + VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>>
>> vq->vring.notify = NULL;
>> }
>> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>> index 0ba3ef809e48..72c88519329a 100644
>> --- a/drivers/vhost/vringh.c
>> +++ b/drivers/vhost/vringh.c
>> @@ -1094,10 +1094,18 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>>
>> #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>>
>> +struct iotlb_vec {
>> + union {
>> + struct iovec *iovec;
>> + struct bio_vec *bvec;
>> + } iov;
>> + size_t count;
>> + bool is_iovec;
>
>I wonder if this is needed (if vringh is passed to every iotlb_vec helper).

Yep, I can use vringh->use_va.
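
Concretely, the struct could lose the flag and each helper could test
the vringh it already receives (sketch, not the posted patch):

	struct iotlb_vec {
		union {
			struct iovec *iovec;
			struct bio_vec *bvec;
		} iov;
		size_t count;
	};

	/* in iotlb_translate() and the getu16/putu16 helpers: */
	if (vrh->use_va) {
		/* fill/consume ivec->iov.iovec */
	} else {
		/* fill/consume ivec->iov.bvec */
	}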

>
>> +};
>> +
>> static int iotlb_translate(const struct vringh *vrh,
>> u64 addr, u64 len, u64 *translated,
>> - struct bio_vec iov[],
>> - int iov_size, u32 perm)
>> + struct iotlb_vec *ivec, u32 perm)
>> {
>> struct vhost_iotlb_map *map;
>> struct vhost_iotlb *iotlb = vrh->iotlb;
>> @@ -1107,9 +1115,9 @@ static int iotlb_translate(const struct vringh *vrh,
>> spin_lock(vrh->iotlb_lock);
>>
>> while (len > s) {
>> - u64 size, pa, pfn;
>> + u64 size;
>>
>> - if (unlikely(ret >= iov_size)) {
>> + if (unlikely(ret >= ivec->count)) {
>> ret = -ENOBUFS;
>> break;
>> }
>> @@ -1124,10 +1132,22 @@ static int iotlb_translate(const struct vringh *vrh,
>> }
>>
>> size = map->size - addr + map->start;
>> - pa = map->addr + addr - map->start;
>> - pfn = pa >> PAGE_SHIFT;
>> - bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
>> - pa & (PAGE_SIZE - 1));
>> + if (ivec->is_iovec) {
>> + struct iovec *iovec = ivec->iov.iovec;
>> +
>> + iovec[ret].iov_len = min(len - s, size);
>> + iovec[ret].iov_base = (void __user *)(unsigned long)
>> + (map->addr + addr - map->start);
>
>map->addr - map->start to avoid overflow?

Right, it was pre-existing, but I'll fix it since I'm here (also in the
else branch).
And since it's duplicate code, I'll declare it outside!
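
Inside the translation loop that would look roughly like this (a
sketch; io_addr would be a u64 declared next to size, and it also folds
in the uintptr_t cast Eugenio suggested):

	size = map->size - addr + map->start;
	/* subtract map->start before adding addr, so the intermediate
	 * value cannot overflow the way map->addr + addr could */
	io_addr = map->addr - map->start + addr;

	if (vrh->use_va) {
		struct iovec *iovec = ivec->iov.iovec;

		iovec[ret].iov_len = min(len - s, size);
		iovec[ret].iov_base = (void __user *)(uintptr_t)io_addr;
	} else {
		u64 pfn = io_addr >> PAGE_SHIFT;
		struct bio_vec *bvec = ivec->iov.bvec;

		bvec_set_page(&bvec[ret], pfn_to_page(pfn),
			      min(len - s, size),
			      io_addr & (PAGE_SIZE - 1));
	}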

>
>> + } else {
>> + u64 pa = map->addr + addr - map->start;
>> + u64 pfn = pa >> PAGE_SHIFT;
>> + struct bio_vec *bvec = ivec->iov.bvec;
>> +
>> + bvec_set_page(&bvec[ret], pfn_to_page(pfn),
>> + min(len - s, size),
>> + pa & (PAGE_SIZE - 1));
>> + }
>> +
>> s += size;
>> addr += size;
>> ++ret;
>> @@ -1141,26 +1161,42 @@ static int iotlb_translate(const struct vringh *vrh,
>> return ret;
>> }
>>
>> +#define IOTLB_IOV_SIZE 16
>> +
>> static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
>> void *src, size_t len)
>> {
>> + struct iotlb_vec ivec;
>> + union {
>> + struct iovec iovec[IOTLB_IOV_SIZE];
>> + struct bio_vec bvec[IOTLB_IOV_SIZE];
>> + } iov;
>> u64 total_translated = 0;
>>
>> + ivec.iov.iovec = iov.iovec;
>
>ivec.iov = iov?

I tried, but they are both anonymous unions, so I cannot assign them.

Also, I need this inner union to allocate space on the stack to hold
both arrays, while the union in iotlb_vec carries both pointers.

I don't know whether to add a third field (e.g. `void *raw`) to both and
assign it.
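
For comparison, the `void *raw` variant could look like this
(hypothetical):

	struct iotlb_vec {
		union {
			struct iovec *iovec;
			struct bio_vec *bvec;
			void *raw;
		} iov;
		size_t count;
	};

	/* caller keeps the on-stack union for storage */
	union {
		struct iovec iovec[IOTLB_IOV_SIZE];
		struct bio_vec bvec[IOTLB_IOV_SIZE];
	} iov;
	struct iotlb_vec ivec = {
		.iov.raw = &iov,	/* one assignment covers both layouts */
		.count = IOTLB_IOV_SIZE,
	};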

>
>Others look good.

Thanks,
Stefano

2023-03-23 10:55:55

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] vringh: support VA with iotlb

On Thu, Mar 23, 2023 at 09:09:14AM +0100, Eugenio Perez Martin wrote:
>On Tue, Mar 21, 2023 at 4:43 PM Stefano Garzarella <[email protected]> wrote:
>>
>> vDPA supports the possibility to use user VA in the iotlb messages.
>> So, let's add support for user VA in vringh to use it in the vDPA
>> simulators.
>>
>> Signed-off-by: Stefano Garzarella <[email protected]>
>> ---
>>
>> Notes:
>> v3:
>> - refactored avoiding code duplication [Eugenio]
>> v2:
>> - replace kmap_atomic() with kmap_local_page() [see previous patch]
>> - fix cast warnings when build with W=1 C=1
>>
>> include/linux/vringh.h | 5 +-
>> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>> drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
>> drivers/vhost/vringh.c | 153 +++++++++++++++++++++++-------
>> 4 files changed, 127 insertions(+), 37 deletions(-)
>>
>> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
>> index 1991a02c6431..d39b9f2dcba0 100644
>> --- a/include/linux/vringh.h
>> +++ b/include/linux/vringh.h
>> @@ -32,6 +32,9 @@ struct vringh {
>> /* Can we get away with weak barriers? */
>> bool weak_barriers;
>>
>> + /* Use user's VA */
>> + bool use_va;
>> +
>> /* Last available index we saw (ie. where we're up to). */
>> u16 last_avail_idx;
>>
>> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
>> spinlock_t *iotlb_lock);
>>
>> int vringh_init_iotlb(struct vringh *vrh, u64 features,
>> - unsigned int num, bool weak_barriers,
>> + unsigned int num, bool weak_barriers, bool use_va,
>> struct vring_desc *desc,
>> struct vring_avail *avail,
>> struct vring_used *used);
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index 520646ae7fa0..dfd0e000217b 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>>
>> if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
>> err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
>> - MLX5_CVQ_MAX_ENT, false,
>> + MLX5_CVQ_MAX_ENT, false, false,
>> (struct vring_desc *)(uintptr_t)cvq->desc_addr,
>> (struct vring_avail *)(uintptr_t)cvq->driver_addr,
>> (struct vring_used *)(uintptr_t)cvq->device_addr);
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> index eea23c630f7c..47cdf2a1f5b8 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>> uint16_t last_avail_idx = vq->vring.last_avail_idx;
>>
>> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
>> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
>> (struct vring_desc *)(uintptr_t)vq->desc_addr,
>> (struct vring_avail *)
>> (uintptr_t)vq->driver_addr,
>> @@ -92,7 +92,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
>> vq->cb = NULL;
>> vq->private = NULL;
>> vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
>> - VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
>> + VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>>
>> vq->vring.notify = NULL;
>> }
>> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>> index 0ba3ef809e48..72c88519329a 100644
>> --- a/drivers/vhost/vringh.c
>> +++ b/drivers/vhost/vringh.c
>> @@ -1094,10 +1094,18 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>>
>> #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>>
>> +struct iotlb_vec {
>> + union {
>> + struct iovec *iovec;
>> + struct bio_vec *bvec;
>> + } iov;
>> + size_t count;
>> + bool is_iovec;
>> +};
>> +
>> static int iotlb_translate(const struct vringh *vrh,
>> u64 addr, u64 len, u64 *translated,
>> - struct bio_vec iov[],
>> - int iov_size, u32 perm)
>> + struct iotlb_vec *ivec, u32 perm)
>> {
>> struct vhost_iotlb_map *map;
>> struct vhost_iotlb *iotlb = vrh->iotlb;
>> @@ -1107,9 +1115,9 @@ static int iotlb_translate(const struct vringh *vrh,
>> spin_lock(vrh->iotlb_lock);
>>
>> while (len > s) {
>> - u64 size, pa, pfn;
>> + u64 size;
>>
>> - if (unlikely(ret >= iov_size)) {
>> + if (unlikely(ret >= ivec->count)) {
>> ret = -ENOBUFS;
>> break;
>> }
>> @@ -1124,10 +1132,22 @@ static int iotlb_translate(const struct vringh *vrh,
>> }
>>
>> size = map->size - addr + map->start;
>> - pa = map->addr + addr - map->start;
>> - pfn = pa >> PAGE_SHIFT;
>> - bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
>> - pa & (PAGE_SIZE - 1));
>> + if (ivec->is_iovec) {
>> + struct iovec *iovec = ivec->iov.iovec;
>> +
>> + iovec[ret].iov_len = min(len - s, size);
>> + iovec[ret].iov_base = (void __user *)(unsigned long)
>
>s/unsigned long/uintptr_t ?
>

yep, good catch!

As I wrote to Jason, I think I'll take it out of the if and just declare
an uintptr_t variable, since I'm using it also in the else branch.

>
>
>> + (map->addr + addr - map->start);
>> + } else {
>> + u64 pa = map->addr + addr - map->start;
>> + u64 pfn = pa >> PAGE_SHIFT;
>> + struct bio_vec *bvec = ivec->iov.bvec;
>> +
>> + bvec_set_page(&bvec[ret], pfn_to_page(pfn),
>> + min(len - s, size),
>> + pa & (PAGE_SIZE - 1));
>> + }
>> +
>> s += size;
>> addr += size;
>> ++ret;
>> @@ -1141,26 +1161,42 @@ static int iotlb_translate(const struct vringh *vrh,
>> return ret;
>> }
>>
>> +#define IOTLB_IOV_SIZE 16
>
>I'm fine with defining it here, but maybe it is better to isolate the
>change in a previous patch or reuse another well-known macro?

Yep, good point!

Do you have any well-known macro to suggest?

>
>The rest looks good, and I agree with Jason's comments, so even if the
>macro declaration is not moved:
>
>Acked-by: Eugenio Pérez <[email protected]>

Thanks,
Stefano

2023-03-23 11:47:20

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] vdpa_sim: add support for user VA

On Thu, Mar 23, 2023 at 10:50:06AM +0100, Stefano Garzarella wrote:
> On Thu, Mar 23, 2023 at 11:42:07AM +0800, Jason Wang wrote:
> > On Tue, Mar 21, 2023 at 11:48 PM Stefano Garzarella <[email protected]> wrote:
> > >
> > > The new "use_va" module parameter (default: true) is used in
> > > vdpa_alloc_device() to inform the vDPA framework that the device
> > > supports VA.
> > >
> > > vringh is initialized to use VA only when "use_va" is true and the
> > > user's mm has been bound. So, only when the bus supports user VA
> > > (e.g. vhost-vdpa).
> > >
> > > vdpasim_mm_work_fn work is used to serialize the binding to a new
> > > address space when the .bind_mm callback is invoked, and unbinding
> > > when the .unbind_mm callback is invoked.
> > >
> > > Call mmget_not_zero()/kthread_use_mm() inside the worker function
> > > to pin the address space only as long as needed, following the
> > > documentation of mmget() in include/linux/sched/mm.h:
> > >
> > > * Never use this function to pin this address space for an
> > > * unbounded/indefinite amount of time.
> >
> > I wonder if everything would be simplified if we just allow the parent
> > to advertise whether or not it requires the address space.
> >
> > Then when vhost-vDPA probes the device it can simply advertise
> > use_work as true so vhost core can use get_task_mm() in this case?
>
> IIUC, setting user_worker to true also creates the kthread in the vhost
> core (but we can add another variable to avoid this).
>
> My biggest concern is the comment in include/linux/sched/mm.h.
> get_task_mm() uses mmget(), but in the documentation they advise against
> pinning the address space indefinitely, so I preferred to keep
> mmgrab() in the vhost core and call mmget_not_zero() in the worker
> only while it is running.
>
> In the future the mm may be used differently by the parent if it is
> somehow supported by the IOMMU, so I would leave it to the parent to
> handle this.
>
> Thanks,
> Stefano

I think iommufd is supposed to handle all this detail, yes.

2023-03-23 14:47:55

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] vringh: support VA with iotlb

On Thu, Mar 23, 2023 at 11:46 AM Stefano Garzarella <[email protected]> wrote:
>
> On Thu, Mar 23, 2023 at 09:09:14AM +0100, Eugenio Perez Martin wrote:
> >On Tue, Mar 21, 2023 at 4:43 PM Stefano Garzarella <[email protected]> wrote:
> >>
> >> vDPA supports the possibility to use user VA in the iotlb messages.
> >> So, let's add support for user VA in vringh to use it in the vDPA
> >> simulators.
> >>
> >> Signed-off-by: Stefano Garzarella <[email protected]>
> >> ---
> >>
> >> Notes:
> >> v3:
> >> - refactored avoiding code duplication [Eugenio]
> >> v2:
> >> - replace kmap_atomic() with kmap_local_page() [see previous patch]
> >> - fix cast warnings when build with W=1 C=1
> >>
> >> include/linux/vringh.h | 5 +-
> >> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
> >> drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
> >> drivers/vhost/vringh.c | 153 +++++++++++++++++++++++-------
> >> 4 files changed, 127 insertions(+), 37 deletions(-)
> >>
> >> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> >> index 1991a02c6431..d39b9f2dcba0 100644
> >> --- a/include/linux/vringh.h
> >> +++ b/include/linux/vringh.h
> >> @@ -32,6 +32,9 @@ struct vringh {
> >> /* Can we get away with weak barriers? */
> >> bool weak_barriers;
> >>
> >> + /* Use user's VA */
> >> + bool use_va;
> >> +
> >> /* Last available index we saw (ie. where we're up to). */
> >> u16 last_avail_idx;
> >>
> >> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
> >> spinlock_t *iotlb_lock);
> >>
> >> int vringh_init_iotlb(struct vringh *vrh, u64 features,
> >> - unsigned int num, bool weak_barriers,
> >> + unsigned int num, bool weak_barriers, bool use_va,
> >> struct vring_desc *desc,
> >> struct vring_avail *avail,
> >> struct vring_used *used);
> >> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >> index 520646ae7fa0..dfd0e000217b 100644
> >> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
> >>
> >> if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
> >> err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
> >> - MLX5_CVQ_MAX_ENT, false,
> >> + MLX5_CVQ_MAX_ENT, false, false,
> >> (struct vring_desc *)(uintptr_t)cvq->desc_addr,
> >> (struct vring_avail *)(uintptr_t)cvq->driver_addr,
> >> (struct vring_used *)(uintptr_t)cvq->device_addr);
> >> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> >> index eea23c630f7c..47cdf2a1f5b8 100644
> >> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> >> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> >> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
> >> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> >> uint16_t last_avail_idx = vq->vring.last_avail_idx;
> >>
> >> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
> >> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
> >> (struct vring_desc *)(uintptr_t)vq->desc_addr,
> >> (struct vring_avail *)
> >> (uintptr_t)vq->driver_addr,
> >> @@ -92,7 +92,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
> >> vq->cb = NULL;
> >> vq->private = NULL;
> >> vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
> >> - VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
> >> + VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
> >>
> >> vq->vring.notify = NULL;
> >> }
> >> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> >> index 0ba3ef809e48..72c88519329a 100644
> >> --- a/drivers/vhost/vringh.c
> >> +++ b/drivers/vhost/vringh.c
> >> @@ -1094,10 +1094,18 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
> >>
> >> #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
> >>
> >> +struct iotlb_vec {
> >> + union {
> >> + struct iovec *iovec;
> >> + struct bio_vec *bvec;
> >> + } iov;
> >> + size_t count;
> >> + bool is_iovec;
> >> +};
> >> +
> >> static int iotlb_translate(const struct vringh *vrh,
> >> u64 addr, u64 len, u64 *translated,
> >> - struct bio_vec iov[],
> >> - int iov_size, u32 perm)
> >> + struct iotlb_vec *ivec, u32 perm)
> >> {
> >> struct vhost_iotlb_map *map;
> >> struct vhost_iotlb *iotlb = vrh->iotlb;
> >> @@ -1107,9 +1115,9 @@ static int iotlb_translate(const struct vringh *vrh,
> >> spin_lock(vrh->iotlb_lock);
> >>
> >> while (len > s) {
> >> - u64 size, pa, pfn;
> >> + u64 size;
> >>
> >> - if (unlikely(ret >= iov_size)) {
> >> + if (unlikely(ret >= ivec->count)) {
> >> ret = -ENOBUFS;
> >> break;
> >> }
> >> @@ -1124,10 +1132,22 @@ static int iotlb_translate(const struct vringh *vrh,
> >> }
> >>
> >> size = map->size - addr + map->start;
> >> - pa = map->addr + addr - map->start;
> >> - pfn = pa >> PAGE_SHIFT;
> >> - bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
> >> - pa & (PAGE_SIZE - 1));
> >> + if (ivec->is_iovec) {
> >> + struct iovec *iovec = ivec->iov.iovec;
> >> +
> >> + iovec[ret].iov_len = min(len - s, size);
> >> + iovec[ret].iov_base = (void __user *)(unsigned long)
> >
> >s/unsigned long/uintptr_t ?
> >
>
> yep, good catch!
>
> As I wrote to Jason, I think I'll take it out of the if and just declare
> an uintptr_t variable, since I'm using it also in the else branch.
>
> >
> >
> >> + (map->addr + addr - map->start);
> >> + } else {
> >> + u64 pa = map->addr + addr - map->start;
> >> + u64 pfn = pa >> PAGE_SHIFT;
> >> + struct bio_vec *bvec = ivec->iov.bvec;
> >> +
> >> + bvec_set_page(&bvec[ret], pfn_to_page(pfn),
> >> + min(len - s, size),
> >> + pa & (PAGE_SIZE - 1));
> >> + }
> >> +
> >> s += size;
> >> addr += size;
> >> ++ret;
> >> @@ -1141,26 +1161,42 @@ static int iotlb_translate(const struct vringh *vrh,
> >> return ret;
> >> }
> >>
> >> +#define IOTLB_IOV_SIZE 16
> >
> >I'm fine with defining it here, but maybe it is better to isolate the
> >change in a previous patch or reuse another well-known macro?
>
> Yep, good point!
>
> Do you have any well known macro to suggest?
>

Not really, 16 seems like a convenience value here actually :). Maybe
replace _SIZE with _STRIDE or similar?

I keep the Acked-by even if the final name is IOTLB_IOV_SIZE though.

Thanks!

2023-03-24 02:56:13

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] vdpa_sim: add support for user VA

On Thu, Mar 23, 2023 at 5:50 PM Stefano Garzarella <[email protected]> wrote:
>
> On Thu, Mar 23, 2023 at 11:42:07AM +0800, Jason Wang wrote:
> >On Tue, Mar 21, 2023 at 11:48 PM Stefano Garzarella <[email protected]> wrote:
> >>
> >> The new "use_va" module parameter (default: true) is used in
> >> vdpa_alloc_device() to inform the vDPA framework that the device
> >> supports VA.
> >>
> >> vringh is initialized to use VA only when "use_va" is true and the
> >> user's mm has been bound. So, only when the bus supports user VA
> >> (e.g. vhost-vdpa).
> >>
> >> vdpasim_mm_work_fn work is used to serialize the binding to a new
> >> address space when the .bind_mm callback is invoked, and unbinding
> >> when the .unbind_mm callback is invoked.
> >>
> >> Call mmget_not_zero()/kthread_use_mm() inside the worker function
> >> to pin the address space only as long as needed, following the
> >> documentation of mmget() in include/linux/sched/mm.h:
> >>
> >> * Never use this function to pin this address space for an
> >> * unbounded/indefinite amount of time.
> >
> >I wonder if everything would be simplified if we just allow the parent
> >to advertise whether or not it requires the address space.
> >
> >Then when vhost-vDPA probes the device it can simply advertise
> >use_work as true so vhost core can use get_task_mm() in this case?
>
> IIUC, setting user_worker to true also creates the kthread in the vhost
> core (but we can add another variable to avoid this).
>
> My biggest concern is the comment in include/linux/sched/mm.h.
> get_task_mm() uses mmget(), but in the documentation they advise against
> pinning the address space indefinitely, so I preferred to keep
> mmgrab() in the vhost core and call mmget_not_zero() in the worker
> only while it is running.

Ok.

>
> In the future the mm may be used differently by the parent if it is
> somehow supported by the IOMMU, so I would leave it to the parent to
> handle this.

This should be possible; I was told by Intel that their IOMMU can
access the process page table for shared virtual memory.

Thanks

>
> Thanks,
> Stefano
>

2023-03-24 04:08:55

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] vdpa_sim: add support for user VA


On 2023/3/21 23:48, Stefano Garzarella wrote:
> The new "use_va" module parameter (default: true) is used in
> vdpa_alloc_device() to inform the vDPA framework that the device
> supports VA.
>
> vringh is initialized to use VA only when "use_va" is true and the
> user's mm has been bound. So, only when the bus supports user VA
> (e.g. vhost-vdpa).
>
> vdpasim_mm_work_fn work is used to serialize the binding to a new
> address space when the .bind_mm callback is invoked, and unbinding
> when the .unbind_mm callback is invoked.
>
> Call mmget_not_zero()/kthread_use_mm() inside the worker function
> to pin the address space only as long as needed, following the
> documentation of mmget() in include/linux/sched/mm.h:
>
> * Never use this function to pin this address space for an
> * unbounded/indefinite amount of time.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>
> Notes:
> v3:
> - called mmget_not_zero() before kthread_use_mm() [Jason]
> As the documentation of mmget() in include/linux/sched/mm.h says:
>
> * Never use this function to pin this address space for an
> * unbounded/indefinite amount of time.
>
> I moved mmget_not_zero/kthread_use_mm inside the worker function,
> this way we pin the address space only as long as needed.
> This is similar to what vfio_iommu_type1_dma_rw_chunk() does in
> drivers/vfio/vfio_iommu_type1.c
> - simplified the mm bind/unbind [Jason]
> - renamed vdpasim_worker_change_mm_sync() [Jason]
> - fix commit message (s/default: false/default: true)
> v2:
> - `use_va` set to true by default [Eugenio]
> - supported the new unbind_mm callback [Jason]
> - removed the unbind_mm call in vdpasim_do_reset() [Jason]
> - avoided releasing the lock while calling kthread_flush_work() since we
> are now using a mutex to protect the device state
>
> drivers/vdpa/vdpa_sim/vdpa_sim.h | 1 +
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 80 +++++++++++++++++++++++++++++++-
> 2 files changed, 79 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index 4774292fba8c..3a42887d05d9 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -59,6 +59,7 @@ struct vdpasim {
> struct vdpasim_virtqueue *vqs;
> struct kthread_worker *worker;
> struct kthread_work work;
> + struct mm_struct *mm_bound;
> struct vdpasim_dev_attr dev_attr;
> /* mutex to synchronize virtqueue state */
> struct mutex mutex;
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index ab4cfb82c237..23c891cdcd54 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -35,10 +35,44 @@ module_param(max_iotlb_entries, int, 0444);
> MODULE_PARM_DESC(max_iotlb_entries,
> "Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");
>
> +static bool use_va = true;
> +module_param(use_va, bool, 0444);
> +MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
> +
> #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
> #define VDPASIM_QUEUE_MAX 256
> #define VDPASIM_VENDOR_ID 0
>
> +struct vdpasim_mm_work {
> + struct kthread_work work;
> + struct vdpasim *vdpasim;
> + struct mm_struct *mm_to_bind;
> + int ret;
> +};
> +
> +static void vdpasim_mm_work_fn(struct kthread_work *work)
> +{
> + struct vdpasim_mm_work *mm_work =
> + container_of(work, struct vdpasim_mm_work, work);
> + struct vdpasim *vdpasim = mm_work->vdpasim;
> +
> + mm_work->ret = 0;
> +
> + //TODO: should we attach the cgroup of the mm owner?
> + vdpasim->mm_bound = mm_work->mm_to_bind;
> +}
> +
> +static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
> + struct vdpasim_mm_work *mm_work)
> +{
> + struct kthread_work *work = &mm_work->work;
> +
> + kthread_init_work(work, vdpasim_mm_work_fn);
> + kthread_queue_work(vdpasim->worker, work);
> +
> + kthread_flush_work(work);
> +}
> +
> static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
> {
> return container_of(vdpa, struct vdpasim, vdpa);
> @@ -59,8 +93,10 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
> {
> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> uint16_t last_avail_idx = vq->vring.last_avail_idx;
> + bool va_enabled = use_va && vdpasim->mm_bound;
>
> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
> + va_enabled,
> (struct vring_desc *)(uintptr_t)vq->desc_addr,
> (struct vring_avail *)
> (uintptr_t)vq->driver_addr,
> @@ -130,8 +166,20 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops;
> static void vdpasim_work_fn(struct kthread_work *work)
> {
> struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> + struct mm_struct *mm = vdpasim->mm_bound;
> +
> + if (mm) {
> + if (!mmget_not_zero(mm))
> + return;


Do we need to check use_va here?
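
That is, something like this, to keep the guard and the cleanup
symmetric (hypothetical; as posted, mm_bound should only be non-NULL
when use_va is true, so this would be a defensive check):

	struct mm_struct *mm = vdpasim->mm_bound;
	bool use_mm = use_va && mm;

	if (use_mm) {
		if (!mmget_not_zero(mm))
			return;
		kthread_use_mm(mm);
	}

	vdpasim->dev_attr.work_fn(vdpasim);

	if (use_mm) {
		kthread_unuse_mm(mm);
		mmput(mm);
	}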

Other than this

Acked-by: Jason Wang <[email protected]>

Thanks


> + kthread_use_mm(mm);
> + }
>
> vdpasim->dev_attr.work_fn(vdpasim);
> +
> + if (mm) {
> + kthread_unuse_mm(mm);
> + mmput(mm);
> + }
> }
>
> struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> @@ -162,7 +210,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> vdpa = __vdpa_alloc_device(NULL, ops,
> dev_attr->ngroups, dev_attr->nas,
> dev_attr->alloc_size,
> - dev_attr->name, false);
> + dev_attr->name, use_va);
> if (IS_ERR(vdpa)) {
> ret = PTR_ERR(vdpa);
> goto err_alloc;
> @@ -582,6 +630,30 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
> return ret;
> }
>
> +static int vdpasim_bind_mm(struct vdpa_device *vdpa, struct mm_struct *mm)
> +{
> + struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> + struct vdpasim_mm_work mm_work;
> +
> + mm_work.vdpasim = vdpasim;
> + mm_work.mm_to_bind = mm;
> +
> + vdpasim_worker_change_mm_sync(vdpasim, &mm_work);
> +
> + return mm_work.ret;
> +}
> +
> +static void vdpasim_unbind_mm(struct vdpa_device *vdpa)
> +{
> + struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> + struct vdpasim_mm_work mm_work;
> +
> + mm_work.vdpasim = vdpasim;
> + mm_work.mm_to_bind = NULL;
> +
> + vdpasim_worker_change_mm_sync(vdpasim, &mm_work);
> +}
> +
> static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,
> u64 iova, u64 size,
> u64 pa, u32 perm, void *opaque)
> @@ -678,6 +750,8 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
> .set_group_asid = vdpasim_set_group_asid,
> .dma_map = vdpasim_dma_map,
> .dma_unmap = vdpasim_dma_unmap,
> + .bind_mm = vdpasim_bind_mm,
> + .unbind_mm = vdpasim_unbind_mm,
> .free = vdpasim_free,
> };
>
> @@ -712,6 +786,8 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
> .get_iova_range = vdpasim_get_iova_range,
> .set_group_asid = vdpasim_set_group_asid,
> .set_map = vdpasim_set_map,
> + .bind_mm = vdpasim_bind_mm,
> + .unbind_mm = vdpasim_unbind_mm,
> .free = vdpasim_free,
> };
>

2023-03-24 14:47:38

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] vringh: support VA with iotlb

On Thu, Mar 23, 2023 at 03:43:34PM +0100, Eugenio Perez Martin wrote:
>On Thu, Mar 23, 2023 at 11:46 AM Stefano Garzarella <[email protected]> wrote:
>>
>> On Thu, Mar 23, 2023 at 09:09:14AM +0100, Eugenio Perez Martin wrote:
>> >On Tue, Mar 21, 2023 at 4:43 PM Stefano Garzarella <[email protected]> wrote:
>> >>
>> >> vDPA supports the possibility to use user VA in the iotlb messages.
>> >> So, let's add support for user VA in vringh to use it in the vDPA
>> >> simulators.
>> >>
>> >> Signed-off-by: Stefano Garzarella <[email protected]>
>> >> ---
>> >>
>> >> Notes:
>> >> v3:
>> >> - refactored avoiding code duplication [Eugenio]
>> >> v2:
>> >> - replace kmap_atomic() with kmap_local_page() [see previous patch]
>> >> - fix cast warnings when build with W=1 C=1
>> >>
>> >> include/linux/vringh.h | 5 +-
>> >> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>> >> drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
>> >> drivers/vhost/vringh.c | 153 +++++++++++++++++++++++-------
>> >> 4 files changed, 127 insertions(+), 37 deletions(-)
>> >>
>> >> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
>> >> index 1991a02c6431..d39b9f2dcba0 100644
>> >> --- a/include/linux/vringh.h
>> >> +++ b/include/linux/vringh.h
>> >> @@ -32,6 +32,9 @@ struct vringh {
>> >> /* Can we get away with weak barriers? */
>> >> bool weak_barriers;
>> >>
>> >> + /* Use user's VA */
>> >> + bool use_va;
>> >> +
>> >> /* Last available index we saw (ie. where we're up to). */
>> >> u16 last_avail_idx;
>> >>
>> >> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
>> >> spinlock_t *iotlb_lock);
>> >>
>> >> int vringh_init_iotlb(struct vringh *vrh, u64 features,
>> >> - unsigned int num, bool weak_barriers,
>> >> + unsigned int num, bool weak_barriers, bool use_va,
>> >> struct vring_desc *desc,
>> >> struct vring_avail *avail,
>> >> struct vring_used *used);
>> >> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> >> index 520646ae7fa0..dfd0e000217b 100644
>> >> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> >> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> >> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>> >>
>> >> if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
>> >> err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
>> >> - MLX5_CVQ_MAX_ENT, false,
>> >> + MLX5_CVQ_MAX_ENT, false, false,
>> >> (struct vring_desc *)(uintptr_t)cvq->desc_addr,
>> >> (struct vring_avail *)(uintptr_t)cvq->driver_addr,
>> >> (struct vring_used *)(uintptr_t)cvq->device_addr);
>> >> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> >> index eea23c630f7c..47cdf2a1f5b8 100644
>> >> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> >> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> >> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>> >> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>> >> uint16_t last_avail_idx = vq->vring.last_avail_idx;
>> >>
>> >> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
>> >> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
>> >> (struct vring_desc *)(uintptr_t)vq->desc_addr,
>> >> (struct vring_avail *)
>> >> (uintptr_t)vq->driver_addr,
>> >> @@ -92,7 +92,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
>> >> vq->cb = NULL;
>> >> vq->private = NULL;
>> >> vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
>> >> - VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
>> >> + VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>> >>
>> >> vq->vring.notify = NULL;
>> >> }
>> >> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>> >> index 0ba3ef809e48..72c88519329a 100644
>> >> --- a/drivers/vhost/vringh.c
>> >> +++ b/drivers/vhost/vringh.c
>> >> @@ -1094,10 +1094,18 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>> >>
>> >> #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>> >>
>> >> +struct iotlb_vec {
>> >> + union {
>> >> + struct iovec *iovec;
>> >> + struct bio_vec *bvec;
>> >> + } iov;
>> >> + size_t count;
>> >> + bool is_iovec;
>> >> +};
>> >> +
>> >> static int iotlb_translate(const struct vringh *vrh,
>> >> u64 addr, u64 len, u64 *translated,
>> >> - struct bio_vec iov[],
>> >> - int iov_size, u32 perm)
>> >> + struct iotlb_vec *ivec, u32 perm)
>> >> {
>> >> struct vhost_iotlb_map *map;
>> >> struct vhost_iotlb *iotlb = vrh->iotlb;
>> >> @@ -1107,9 +1115,9 @@ static int iotlb_translate(const struct vringh *vrh,
>> >> spin_lock(vrh->iotlb_lock);
>> >>
>> >> while (len > s) {
>> >> - u64 size, pa, pfn;
>> >> + u64 size;
>> >>
>> >> - if (unlikely(ret >= iov_size)) {
>> >> + if (unlikely(ret >= ivec->count)) {
>> >> ret = -ENOBUFS;
>> >> break;
>> >> }
>> >> @@ -1124,10 +1132,22 @@ static int iotlb_translate(const struct vringh *vrh,
>> >> }
>> >>
>> >> size = map->size - addr + map->start;
>> >> - pa = map->addr + addr - map->start;
>> >> - pfn = pa >> PAGE_SHIFT;
>> >> - bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
>> >> - pa & (PAGE_SIZE - 1));
>> >> + if (ivec->is_iovec) {
>> >> + struct iovec *iovec = ivec->iov.iovec;
>> >> +
>> >> + iovec[ret].iov_len = min(len - s, size);
>> >> + iovec[ret].iov_base = (void __user *)(unsigned long)
>> >
>> >s/unsigned long/uintptr_t ?
>> >
>>
>> yep, good catch!
>>
>> As I wrote to Jason, I think I'll take it out of the if and just declare
>> a uintptr_t variable, since I'm also using it in the else branch.
>>
>> >
>> >
>> >> + (map->addr + addr - map->start);
>> >> + } else {
>> >> + u64 pa = map->addr + addr - map->start;
>> >> + u64 pfn = pa >> PAGE_SHIFT;
>> >> + struct bio_vec *bvec = ivec->iov.bvec;
>> >> +
>> >> + bvec_set_page(&bvec[ret], pfn_to_page(pfn),
>> >> + min(len - s, size),
>> >> + pa & (PAGE_SIZE - 1));
>> >> + }
>> >> +
>> >> s += size;
>> >> addr += size;
>> >> ++ret;
>> >> @@ -1141,26 +1161,42 @@ static int iotlb_translate(const struct vringh *vrh,
>> >> return ret;
>> >> }
>> >>
>> >> +#define IOTLB_IOV_SIZE 16
>> >
>I'm fine with defining it here, but maybe it is better to isolate the
>change in a previous patch or reuse another well-known macro?
>>
>> Yep, good point!
>>
>> Do you have any well-known macro to suggest?
>>
>
>Not really, 16 seems like a convenience value here actually :). Maybe
>replace _SIZE with _STRIDE or similar?

Ack, I will add IOTLB_IOV_STRIDE in a preparation patch before this
one.

>
>I'll keep the Acked-by even if the final name is IOTLB_IOV_SIZE though.

Thanks,
I changed this patch a bit following Jason's and your suggestions.

I'd like an explicit Acked-by on the next version if it is okay with
you.

Thanks,
Stefano
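
For reference, a sketch of the refactor described above, with the
translated address hoisted into a single variable shared by both
branches and the macro renamed (illustrative only; the final form and
names belong to the next version of the series):

#define IOTLB_IOV_STRIDE 16

/* Excerpt of the loop body in iotlb_translate(), assuming io_addr is
 * declared as uintptr_t at the top of the loop:
 */
	size = map->size - addr + map->start;
	io_addr = map->addr + addr - map->start;

	if (ivec->is_iovec) {
		struct iovec *iovec = ivec->iov.iovec;

		iovec[ret].iov_len = min(len - s, size);
		iovec[ret].iov_base = (void __user *)io_addr;
	} else {
		struct bio_vec *bvec = ivec->iov.bvec;
		u64 pfn = io_addr >> PAGE_SHIFT;

		bvec_set_page(&bvec[ret], pfn_to_page(pfn),
			      min(len - s, size),
			      io_addr & (PAGE_SIZE - 1));
	}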

2023-03-24 14:48:42

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] vdpa_sim: add support for user VA

On Fri, Mar 24, 2023 at 10:54:39AM +0800, Jason Wang wrote:
>On Thu, Mar 23, 2023 at 5:50 PM Stefano Garzarella <[email protected]> wrote:
>>
>> On Thu, Mar 23, 2023 at 11:42:07AM +0800, Jason Wang wrote:
>> >On Tue, Mar 21, 2023 at 11:48 PM Stefano Garzarella <[email protected]> wrote:
>> >>
>> >> The new "use_va" module parameter (default: true) is used in
>> >> vdpa_alloc_device() to inform the vDPA framework that the device
>> >> supports VA.
>> >>
>> >> vringh is initialized to use VA only when "use_va" is true and the
>> >> user's mm has been bound. So, only when the bus supports user VA
>> >> (e.g. vhost-vdpa).
>> >>
>> >> vdpasim_mm_work_fn work is used to serialize the binding to a new
>> >> address space when the .bind_mm callback is invoked, and unbinding
>> >> when the .unbind_mm callback is invoked.
>> >>
>> >> Call mmget_not_zero()/kthread_use_mm() inside the worker function
>> >> to pin the address space only as long as needed, following the
>> >> documentation of mmget() in include/linux/sched/mm.h:
>> >>
>> >> * Never use this function to pin this address space for an
>> >> * unbounded/indefinite amount of time.
>> >
>> >I wonder if everything would be simplified if we just allow the parent
>> >to advertise whether or not it requires the address space.
>> >
>> >Then when vhost-vDPA probes the device it can simply advertise
>> >use_work as true so vhost core can use get_task_mm() in this case?
>>
>> IIUC, setting user_worker to true also creates the kthread in the vhost
>> core (but we can add another variable to avoid this).
>>
>> My biggest concern is the comment in include/linux/sched/mm.h.
>> get_task_mm() uses mmget(), but in the documentation they advise against
>> pinning the address space indefinitely, so I preferred to keep
>> mmgrab() in the vhost core and call mmget_not_zero() in the worker
>> only while it is running.
>
>Ok.
>
>>
>> In the future the mm may be used differently by the parent if it is
>> somehow supported by the IOMMU, so I would leave it to the parent to
>> handle this.
>
>This should be possible; I was told by Intel that their IOMMU can
>access the process page table for shared virtual memory.

Cool, we should investigate this. Do you have any pointers to their
documentation?

Thanks,
Stefano
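
As an aside, the distinction being discussed is between pinning the
mm_struct itself and pinning the address space behind it. A simplified
sketch of the intended lifecycle (condensed across the vhost core and
vdpa_sim, so not actual patch code):

	/* bind time: take a reference on the mm_struct only; the
	 * address space can still be torn down, so guest memory is
	 * not pinned while the device is idle
	 */
	mmgrab(mm);

	/* worker, per work item: pin the address space only while
	 * the datapath actually runs
	 */
	if (mmget_not_zero(mm)) {	/* fails once the owner has exited */
		kthread_use_mm(mm);
		/* ... access the virtqueues through user VAs ... */
		kthread_unuse_mm(mm);
		mmput(mm);
	}

	/* unbind time: drop the mm_struct reference */
	mmdrop(mm);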

2023-03-24 14:53:21

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] vdpa_sim: add support for user VA

On Fri, Mar 24, 2023 at 11:49:32AM +0800, Jason Wang wrote:
>
>On 2023/3/21 23:48, Stefano Garzarella wrote:
>>The new "use_va" module parameter (default: true) is used in
>>vdpa_alloc_device() to inform the vDPA framework that the device
>>supports VA.
>>
>>vringh is initialized to use VA only when "use_va" is true and the
>>user's mm has been bound. So, only when the bus supports user VA
>>(e.g. vhost-vdpa).
>>
>>vdpasim_mm_work_fn work is used to serialize the binding to a new
>>address space when the .bind_mm callback is invoked, and unbinding
>>when the .unbind_mm callback is invoked.
>>
>>Call mmget_not_zero()/kthread_use_mm() inside the worker function
>>to pin the address space only as long as needed, following the
>>documentation of mmget() in include/linux/sched/mm.h:
>>
>> * Never use this function to pin this address space for an
>> * unbounded/indefinite amount of time.
>>
>>Signed-off-by: Stefano Garzarella <[email protected]>
>>---
>>
>>Notes:
>> v3:
>> - called mmget_not_zero() before kthread_use_mm() [Jason]
>> As the documentation of mmget() in include/linux/sched/mm.h says:
>> * Never use this function to pin this address space for an
>> * unbounded/indefinite amount of time.
>> I moved mmget_not_zero/kthread_use_mm inside the worker function,
>> this way we pin the address space only as long as needed.
>> This is similar to what vfio_iommu_type1_dma_rw_chunk() does in
>> drivers/vfio/vfio_iommu_type1.c
>> - simplified the mm bind/unbind [Jason]
>> - renamed vdpasim_worker_change_mm_sync() [Jason]
>> - fix commit message (s/default: false/default: true)
>> v2:
>> - `use_va` set to true by default [Eugenio]
>> - supported the new unbind_mm callback [Jason]
>> - removed the unbind_mm call in vdpasim_do_reset() [Jason]
>> - avoided to release the lock while call kthread_flush_work() since we
>> are now using a mutex to protect the device state
>>
>> drivers/vdpa/vdpa_sim/vdpa_sim.h | 1 +
>> drivers/vdpa/vdpa_sim/vdpa_sim.c | 80 +++++++++++++++++++++++++++++++-
>> 2 files changed, 79 insertions(+), 2 deletions(-)
>>
>>diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
>>index 4774292fba8c..3a42887d05d9 100644
>>--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
>>+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
>>@@ -59,6 +59,7 @@ struct vdpasim {
>> struct vdpasim_virtqueue *vqs;
>> struct kthread_worker *worker;
>> struct kthread_work work;
>>+ struct mm_struct *mm_bound;
>> struct vdpasim_dev_attr dev_attr;
>> /* mutex to synchronize virtqueue state */
>> struct mutex mutex;
>>diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>index ab4cfb82c237..23c891cdcd54 100644
>>--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>@@ -35,10 +35,44 @@ module_param(max_iotlb_entries, int, 0444);
>> MODULE_PARM_DESC(max_iotlb_entries,
>> "Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");
>>
>>+static bool use_va = true;
>>+module_param(use_va, bool, 0444);
>>+MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
>>+
>> #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
>> #define VDPASIM_QUEUE_MAX 256
>> #define VDPASIM_VENDOR_ID 0
>>
>>+struct vdpasim_mm_work {
>>+ struct kthread_work work;
>>+ struct vdpasim *vdpasim;
>>+ struct mm_struct *mm_to_bind;
>>+ int ret;
>>+};
>>+
>>+static void vdpasim_mm_work_fn(struct kthread_work *work)
>>+{
>>+ struct vdpasim_mm_work *mm_work =
>>+ container_of(work, struct vdpasim_mm_work, work);
>>+ struct vdpasim *vdpasim = mm_work->vdpasim;
>>+
>>+ mm_work->ret = 0;
>>+
>>+ //TODO: should we attach the cgroup of the mm owner?
>>+ vdpasim->mm_bound = mm_work->mm_to_bind;
>>+}
>>+
>>+static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
>>+ struct vdpasim_mm_work *mm_work)
>>+{
>>+ struct kthread_work *work = &mm_work->work;
>>+
>>+ kthread_init_work(work, vdpasim_mm_work_fn);
>>+ kthread_queue_work(vdpasim->worker, work);
>>+
>>+ kthread_flush_work(work);
>>+}
>>+
>> static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
>> {
>> return container_of(vdpa, struct vdpasim, vdpa);
>>@@ -59,8 +93,10 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>> {
>> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>> uint16_t last_avail_idx = vq->vring.last_avail_idx;
>>+ bool va_enabled = use_va && vdpasim->mm_bound;
>>
>>- vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
>>+ vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
>>+ va_enabled,
>> (struct vring_desc *)(uintptr_t)vq->desc_addr,
>> (struct vring_avail *)
>> (uintptr_t)vq->driver_addr,
>>@@ -130,8 +166,20 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops;
>> static void vdpasim_work_fn(struct kthread_work *work)
>> {
>> struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>>+ struct mm_struct *mm = vdpasim->mm_bound;
>>+
>>+ if (mm) {
>>+ if (!mmget_not_zero(mm))
>>+ return;
>
>
>Do we need to check use_va here?

Yep, right!

>
>Other than this
>
>Acked-by: Jason Wang <[email protected]>

Thanks for the reviews,
Stefano
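
A side note on why the on-stack vdpasim_mm_work in this patch is safe:
kthread_flush_work() does not return until vdpasim_mm_work_fn() has
finished, so the work item (and its ret field) cannot go out of scope
while the worker is still using it. Queuing the update on the same
kthread_worker that runs the datapath also serializes it against
vdpasim_work_fn(), so mm_bound never changes in the middle of a work
item. Condensed, with a hypothetical helper name:

static int vdpasim_change_mm_sync(struct vdpasim *vdpasim,
				  struct mm_struct *new_mm)
{
	struct vdpasim_mm_work mm_work = {
		.vdpasim = vdpasim,
		.mm_to_bind = new_mm,
	};

	kthread_init_work(&mm_work.work, vdpasim_mm_work_fn);
	kthread_queue_work(vdpasim->worker, &mm_work.work);
	/* blocks until the work has run, so stack allocation is safe */
	kthread_flush_work(&mm_work.work);

	return mm_work.ret;
}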

2023-03-27 03:15:07

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] vdpa_sim: add support for user VA

On Fri, Mar 24, 2023 at 10:43 PM Stefano Garzarella <[email protected]> wrote:
>
> On Fri, Mar 24, 2023 at 10:54:39AM +0800, Jason Wang wrote:
> >On Thu, Mar 23, 2023 at 5:50 PM Stefano Garzarella <[email protected]> wrote:
> >>
> >> On Thu, Mar 23, 2023 at 11:42:07AM +0800, Jason Wang wrote:
> >> >On Tue, Mar 21, 2023 at 11:48 PM Stefano Garzarella <[email protected]> wrote:
> >> >>
> >> >> The new "use_va" module parameter (default: true) is used in
> >> >> vdpa_alloc_device() to inform the vDPA framework that the device
> >> >> supports VA.
> >> >>
> >> >> vringh is initialized to use VA only when "use_va" is true and the
> >> >> user's mm has been bound. So, only when the bus supports user VA
> >> >> (e.g. vhost-vdpa).
> >> >>
> >> >> vdpasim_mm_work_fn work is used to serialize the binding to a new
> >> >> address space when the .bind_mm callback is invoked, and unbinding
> >> >> when the .unbind_mm callback is invoked.
> >> >>
> >> >> Call mmget_not_zero()/kthread_use_mm() inside the worker function
> >> >> to pin the address space only as long as needed, following the
> >> >> documentation of mmget() in include/linux/sched/mm.h:
> >> >>
> >> >> * Never use this function to pin this address space for an
> >> >> * unbounded/indefinite amount of time.
> >> >
> >> >I wonder if everything would be simplified if we just allow the parent
> >> >to advertise whether or not it requires the address space.
> >> >
> >> >Then when vhost-vDPA probes the device it can simply advertise
> >> >use_work as true so vhost core can use get_task_mm() in this case?
> >>
> >> IIUC, setting user_worker to true also creates the kthread in the vhost
> >> core (but we can add another variable to avoid this).
> >>
> >> My biggest concern is the comment in include/linux/sched/mm.h.
> >> get_task_mm() uses mmget(), but in the documentation they advise against
> >> pinning the address space indefinitely, so I preferred to keep
> >> mmgrab() in the vhost core and call mmget_not_zero() in the worker
> >> only while it is running.
> >
> >Ok.
> >
> >>
> >> In the future the mm may be used differently by the parent if it is
> >> somehow supported by the IOMMU, so I would leave it to the parent to
> >> handle this.
> >
> >This should be possible; I was told by Intel that their IOMMU can
> >access the process page table for shared virtual memory.
>
> Cool, we should investigate this. Do you have any pointers to their
> documentation?

The vtd-spec I think.

Thanks

>
> Thanks,
> Stefano
>