v2:
- rebased on Linus' tree, commit ae3419fbac84 ("vc_screen: don't clobber
return value in vcs_read")
- removed `struct task_struct *owner` param (unused for now, maybe
useful to support cgroups) [Jason]
- added the unbind_mm callback [Jason]
- called the new unbind_mm callback during the release [Jason]
- avoided calling the bind_mm callback after the reset, since the device
  does not detach it during the reset
- added a new patch to replace kmap_atomic() with kmap_local_page(), since
  checkpatch.pl complained about the deprecation of kmap_atomic() touched
  by a patch in this series
- fixed cast warnings when building with W=1 C=1
- added a new patch to replace the spinlock with a mutex [Jason]
- `use_va` set to true by default [Eugenio]
- supported the new unbind_mm callback [Jason]
- removed the unbind_mm call in vdpasim_do_reset() [Jason]
- avoided releasing the lock while calling kthread_flush_work(), since
  we are now using a mutex to protect the device state
RFC v1: https://lore.kernel.org/lkml/[email protected]/
This series adds support for the use of user virtual addresses in the
vDPA simulator devices.
The main reason for this change is to lift the pinning of all guest memory,
which is especially unnecessary for virtio devices implemented in software.
The next step would be to generalize the code in vdpa-sim to allow the
implementation of in-kernel software devices, similar to vhost, but using
vDPA so that we can reuse the same software stack (e.g. in QEMU) for both
HW and SW devices.
For example, we have never merged vhost-blk, and lately there has been
renewed interest in it. So it would be nice to do it directly with vDPA,
reusing the same code in the VMM for both HW and SW vDPA block devices.
The main problem (addressed by this series) was the pinning of all guest
memory, which prevented the overcommit of guest memory.
Thanks,
Stefano
Stefano Garzarella (8):
vdpa: add bind_mm/unbind_mm callbacks
vhost-vdpa: use bind_mm/unbind_mm device callbacks
vringh: replace kmap_atomic() with kmap_local_page()
vringh: support VA with iotlb
vdpa_sim: make devices agnostic for work management
vdpa_sim: use kthread worker
vdpa_sim: replace the spinlock with a mutex to protect the state
vdpa_sim: add support for user VA
drivers/vdpa/vdpa_sim/vdpa_sim.h | 11 +-
include/linux/vdpa.h | 10 ++
include/linux/vringh.h | 5 +-
drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 160 ++++++++++++++---
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 10 +-
drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 10 +-
drivers/vhost/vdpa.c | 30 ++++
drivers/vhost/vringh.c | 247 +++++++++++++++++++++------
9 files changed, 395 insertions(+), 90 deletions(-)
--
2.39.2
These new optional callbacks are used to bind/unbind the device to/from
a specific address space, so the vDPA framework can use VA when they
are implemented.
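For reference, a parent driver that wants to support user VA would
implement them roughly like this (minimal sketch, not part of this
series; all foo_* names are made up for illustration):

struct foo_dev {
	struct vdpa_device vdpa;
	struct mm_struct *mm;
};

static int foo_bind_mm(struct vdpa_device *vdev, struct mm_struct *mm)
{
	struct foo_dev *foo = container_of(vdev, struct foo_dev, vdpa);

	/* remember the address space; from now on addresses are user VA */
	foo->mm = mm;
	return 0;
}

static void foo_unbind_mm(struct vdpa_device *vdev)
{
	struct foo_dev *foo = container_of(vdev, struct foo_dev, vdpa);

	foo->mm = NULL;
}

static const struct vdpa_config_ops foo_config_ops = {
	/* ... mandatory callbacks ... */
	.bind_mm   = foo_bind_mm,
	.unbind_mm = foo_unbind_mm,
};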
Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
---
Notes:
v2:
- removed `struct task_struct *owner` param (unused for now, maybe
useful to support cgroups) [Jason]
- added the unbind_mm callback [Jason]
include/linux/vdpa.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 43f59ef10cc9..369c21394284 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -290,6 +290,14 @@ struct vdpa_map_file {
* @vdev: vdpa device
* @idx: virtqueue index
* Returns pointer to structure device or error (NULL)
+ * @bind_mm: Bind the device to a specific address space
+ * so the vDPA framework can use VA when this
+ * callback is implemented. (optional)
+ * @vdev: vdpa device
+ * @mm: address space to bind
+ * @unbind_mm: Unbind the device from the address space
+ * bound using the bind_mm callback. (optional)
+ * @vdev: vdpa device
* @free: Free resources that belongs to vDPA (optional)
* @vdev: vdpa device
*/
@@ -351,6 +359,8 @@ struct vdpa_config_ops {
int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
unsigned int asid);
struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
+ int (*bind_mm)(struct vdpa_device *vdev, struct mm_struct *mm);
+ void (*unbind_mm)(struct vdpa_device *vdev);
/* Free device resources */
void (*free)(struct vdpa_device *vdev);
--
2.39.2
When the user calls the VHOST_SET_OWNER ioctl and the vDPA device
has `use_va` set to true, let's call the bind_mm callback.
In this way we can bind the device to the user address space
and directly use the user VA.
The unbind_mm callback is called during release, after
stopping the device.
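From userspace nothing changes in the ioctl sequence: the binding is
just a side effect of VHOST_SET_OWNER. A minimal sketch (error handling
omitted, the device path is only an example):

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

int main(void)
{
	int fd = open("/dev/vhost-vdpa-0", O_RDWR);

	if (fd < 0)
		return 1;

	/*
	 * If the parent device has use_va set and implements bind_mm,
	 * this also binds the device to the caller's address space.
	 */
	ioctl(fd, VHOST_SET_OWNER);

	/* ... VHOST_SET_FEATURES, VHOST_SET_VRING_*, VHOST_IOTLB_* ... */

	/* release stops the device and then calls unbind_mm */
	close(fd);
	return 0;
}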
Signed-off-by: Stefano Garzarella <[email protected]>
---
Notes:
v2:
- called the new unbind_mm callback during the release [Jason]
- avoided calling the bind_mm callback after the reset, since the device
  does not detach it during the reset
drivers/vhost/vdpa.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index dc12dbd5b43b..1ab89fccd825 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
return vdpa_reset(vdpa);
}
+static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ if (!vdpa->use_va || !ops->bind_mm)
+ return 0;
+
+ return ops->bind_mm(vdpa, v->vdev.mm);
+}
+
+static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ if (!vdpa->use_va || !ops->unbind_mm)
+ return;
+
+ ops->unbind_mm(vdpa);
+}
+
static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
{
struct vdpa_device *vdpa = v->vdpa;
@@ -711,6 +733,13 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
break;
default:
r = vhost_dev_ioctl(&v->vdev, cmd, argp);
+ if (!r && cmd == VHOST_SET_OWNER) {
+ r = vhost_vdpa_bind_mm(v);
+ if (r) {
+ vhost_dev_reset_owner(&v->vdev, NULL);
+ break;
+ }
+ }
if (r == -ENOIOCTLCMD)
r = vhost_vdpa_vring_ioctl(v, cmd, argp);
break;
@@ -1285,6 +1314,7 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep)
vhost_vdpa_clean_irq(v);
vhost_vdpa_reset(v);
vhost_dev_stop(&v->vdev);
+ vhost_vdpa_unbind_mm(v);
vhost_vdpa_free_domain(v);
vhost_vdpa_config_put(v);
vhost_vdpa_cleanup(v);
--
2.39.2
kmap_atomic() is deprecated in favor of kmap_local_page().
With kmap_local_page() the mappings are per thread, CPU local, can take
page-faults, and can be called from any context (including interrupts).
Furthermore, the tasks can be preempted and, when they are scheduled to
run again, the kernel virtual addresses are restored and still valid.
kmap_atomic() is implemented like a kmap_local_page() which also disables
page-faults and preemption (the latter only for !PREEMPT_RT kernels,
otherwise it only disables migration).
The code within the mappings/un-mappings in getu16_iotlb() and
putu16_iotlb() doesn't depend on the above-mentioned side effects of
kmap_atomic(), so a mere replacement of the old API with the new one
is all that is required (i.e., there is no need to explicitly add calls
to pagefault_disable() and/or preempt_disable()).
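The conversion is purely mechanical; the general pattern is the
following (generic sketch, not vringh-specific):

#include <linux/highmem.h>

/* read a 16-bit value from a page through a thread-local mapping */
static u16 read_u16_from_page(struct page *page, unsigned int offset)
{
	void *kaddr;
	u16 val;

	kaddr = kmap_local_page(page);	/* was: kmap_atomic(page) */
	val = *(u16 *)(kaddr + offset);
	kunmap_local(kaddr);		/* was: kunmap_atomic(kaddr) */

	return val;
}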
Signed-off-by: Stefano Garzarella <[email protected]>
---
Notes:
v2:
- added this patch since checkpatch.pl complained about the deprecation
  of kmap_atomic() touched by the next patch
drivers/vhost/vringh.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index a1e27da54481..0ba3ef809e48 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh *vrh,
if (ret < 0)
return ret;
- kaddr = kmap_atomic(iov.bv_page);
+ kaddr = kmap_local_page(iov.bv_page);
from = kaddr + iov.bv_offset;
*val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
- kunmap_atomic(kaddr);
+ kunmap_local(kaddr);
return 0;
}
@@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh *vrh,
if (ret < 0)
return ret;
- kaddr = kmap_atomic(iov.bv_page);
+ kaddr = kmap_local_page(iov.bv_page);
to = kaddr + iov.bv_offset;
WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
- kunmap_atomic(kaddr);
+ kunmap_local(kaddr);
return 0;
}
--
2.39.2
vDPA supports the possibility of using user VA in the iotlb messages.
So, let's add support for user VA in vringh, to be used by the vDPA
simulators.
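A vringh user whose IOTLB contains user VA would then initialize the
ring roughly as follows (sketch; everything except the vringh_* API is
a placeholder):

static int setup_user_va_vring(struct vringh *vrh, struct vhost_iotlb *iotlb,
			       spinlock_t *iotlb_lock, u64 features,
			       unsigned int num, u64 desc_addr,
			       u64 driver_addr, u64 device_addr)
{
	int err;

	err = vringh_init_iotlb(vrh, features, num,
				true,	/* weak_barriers */
				true,	/* use_va: IOTLB contains user VA */
				(struct vring_desc *)(uintptr_t)desc_addr,
				(struct vring_avail *)(uintptr_t)driver_addr,
				(struct vring_used *)(uintptr_t)device_addr);
	if (err)
		return err;

	vringh_set_iotlb(vrh, iotlb, iotlb_lock);
	return 0;
}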
Signed-off-by: Stefano Garzarella <[email protected]>
---
Notes:
v2:
- replaced kmap_atomic() with kmap_local_page() [see previous patch]
- fixed cast warnings when building with W=1 C=1
include/linux/vringh.h | 5 +-
drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
drivers/vhost/vringh.c | 247 ++++++++++++++++++++++++------
4 files changed, 205 insertions(+), 53 deletions(-)
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 1991a02c6431..d39b9f2dcba0 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -32,6 +32,9 @@ struct vringh {
/* Can we get away with weak barriers? */
bool weak_barriers;
+ /* Use user's VA */
+ bool use_va;
+
/* Last available index we saw (ie. where we're up to). */
u16 last_avail_idx;
@@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
spinlock_t *iotlb_lock);
int vringh_init_iotlb(struct vringh *vrh, u64 features,
- unsigned int num, bool weak_barriers,
+ unsigned int num, bool weak_barriers, bool use_va,
struct vring_desc *desc,
struct vring_avail *avail,
struct vring_used *used);
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 3a0e721aef05..babc8dd171a6 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
- MLX5_CVQ_MAX_ENT, false,
+ MLX5_CVQ_MAX_ENT, false, false,
(struct vring_desc *)(uintptr_t)cvq->desc_addr,
(struct vring_avail *)(uintptr_t)cvq->driver_addr,
(struct vring_used *)(uintptr_t)cvq->device_addr);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 6a0a65814626..481eb156658b 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
uint16_t last_avail_idx = vq->vring.last_avail_idx;
- vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
+ vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
(struct vring_desc *)(uintptr_t)vq->desc_addr,
(struct vring_avail *)
(uintptr_t)vq->driver_addr,
@@ -81,7 +81,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
vq->cb = NULL;
vq->private = NULL;
vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
- VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
+ VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
vq->vring.notify = NULL;
}
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 0ba3ef809e48..61c79cea44ca 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1094,15 +1094,99 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
#if IS_REACHABLE(CONFIG_VHOST_IOTLB)
-static int iotlb_translate(const struct vringh *vrh,
- u64 addr, u64 len, u64 *translated,
- struct bio_vec iov[],
- int iov_size, u32 perm)
+static int iotlb_translate_va(const struct vringh *vrh,
+ u64 addr, u64 len, u64 *translated,
+ struct iovec iov[],
+ int iov_size, u32 perm)
{
struct vhost_iotlb_map *map;
struct vhost_iotlb *iotlb = vrh->iotlb;
+ u64 s = 0, last = addr + len - 1;
int ret = 0;
+
+ spin_lock(vrh->iotlb_lock);
+
+ while (len > s) {
+ u64 size;
+
+ if (unlikely(ret >= iov_size)) {
+ ret = -ENOBUFS;
+ break;
+ }
+
+ map = vhost_iotlb_itree_first(iotlb, addr, last);
+ if (!map || map->start > addr) {
+ ret = -EINVAL;
+ break;
+ } else if (!(map->perm & perm)) {
+ ret = -EPERM;
+ break;
+ }
+
+ size = map->size - addr + map->start;
+ iov[ret].iov_len = min(len - s, size);
+ iov[ret].iov_base = (void __user *)(unsigned long)
+ (map->addr + addr - map->start);
+ s += size;
+ addr += size;
+ ++ret;
+ }
+
+ spin_unlock(vrh->iotlb_lock);
+
+ if (translated)
+ *translated = min(len, s);
+
+ return ret;
+}
+
+static inline int copy_from_va(const struct vringh *vrh, void *dst, void *src,
+ u64 len, u64 *translated)
+{
+ struct iovec iov[16];
+ struct iov_iter iter;
+ int ret;
+
+ ret = iotlb_translate_va(vrh, (u64)(uintptr_t)src, len, translated, iov,
+ ARRAY_SIZE(iov), VHOST_MAP_RO);
+ if (ret == -ENOBUFS)
+ ret = ARRAY_SIZE(iov);
+ else if (ret < 0)
+ return ret;
+
+ iov_iter_init(&iter, ITER_SOURCE, iov, ret, *translated);
+
+ return copy_from_iter(dst, *translated, &iter);
+}
+
+static inline int copy_to_va(const struct vringh *vrh, void *dst, void *src,
+ u64 len, u64 *translated)
+{
+ struct iovec iov[16];
+ struct iov_iter iter;
+ int ret;
+
+ ret = iotlb_translate_va(vrh, (u64)(uintptr_t)dst, len, translated, iov,
+ ARRAY_SIZE(iov), VHOST_MAP_WO);
+ if (ret == -ENOBUFS)
+ ret = ARRAY_SIZE(iov);
+ else if (ret < 0)
+ return ret;
+
+ iov_iter_init(&iter, ITER_DEST, iov, ret, *translated);
+
+ return copy_to_iter(src, *translated, &iter);
+}
+
+static int iotlb_translate_pa(const struct vringh *vrh,
+ u64 addr, u64 len, u64 *translated,
+ struct bio_vec iov[],
+ int iov_size, u32 perm)
+{
+ struct vhost_iotlb_map *map;
+ struct vhost_iotlb *iotlb = vrh->iotlb;
u64 s = 0, last = addr + len - 1;
+ int ret = 0;
spin_lock(vrh->iotlb_lock);
@@ -1141,28 +1225,61 @@ static int iotlb_translate(const struct vringh *vrh,
return ret;
}
+static inline int copy_from_pa(const struct vringh *vrh, void *dst, void *src,
+ u64 len, u64 *translated)
+{
+ struct bio_vec iov[16];
+ struct iov_iter iter;
+ int ret;
+
+ ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)src, len, translated, iov,
+ ARRAY_SIZE(iov), VHOST_MAP_RO);
+ if (ret == -ENOBUFS)
+ ret = ARRAY_SIZE(iov);
+ else if (ret < 0)
+ return ret;
+
+ iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, *translated);
+
+ return copy_from_iter(dst, *translated, &iter);
+}
+
+static inline int copy_to_pa(const struct vringh *vrh, void *dst, void *src,
+ u64 len, u64 *translated)
+{
+ struct bio_vec iov[16];
+ struct iov_iter iter;
+ int ret;
+
+ ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)dst, len, translated, iov,
+ ARRAY_SIZE(iov), VHOST_MAP_WO);
+ if (ret == -ENOBUFS)
+ ret = ARRAY_SIZE(iov);
+ else if (ret < 0)
+ return ret;
+
+ iov_iter_bvec(&iter, ITER_DEST, iov, ret, *translated);
+
+ return copy_to_iter(src, *translated, &iter);
+}
+
static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
void *src, size_t len)
{
u64 total_translated = 0;
while (total_translated < len) {
- struct bio_vec iov[16];
- struct iov_iter iter;
u64 translated;
int ret;
- ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
- len - total_translated, &translated,
- iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
- if (ret == -ENOBUFS)
- ret = ARRAY_SIZE(iov);
- else if (ret < 0)
- return ret;
-
- iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
+ if (vrh->use_va) {
+ ret = copy_from_va(vrh, dst, src,
+ len - total_translated, &translated);
+ } else {
+ ret = copy_from_pa(vrh, dst, src,
+ len - total_translated, &translated);
+ }
- ret = copy_from_iter(dst, translated, &iter);
if (ret < 0)
return ret;
@@ -1180,22 +1297,17 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
u64 total_translated = 0;
while (total_translated < len) {
- struct bio_vec iov[16];
- struct iov_iter iter;
u64 translated;
int ret;
- ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
- len - total_translated, &translated,
- iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
- if (ret == -ENOBUFS)
- ret = ARRAY_SIZE(iov);
- else if (ret < 0)
- return ret;
-
- iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
+ if (vrh->use_va) {
+ ret = copy_to_va(vrh, dst, src,
+ len - total_translated, &translated);
+ } else {
+ ret = copy_to_pa(vrh, dst, src,
+ len - total_translated, &translated);
+ }
- ret = copy_to_iter(src, translated, &iter);
if (ret < 0)
return ret;
@@ -1210,20 +1322,37 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
static inline int getu16_iotlb(const struct vringh *vrh,
u16 *val, const __virtio16 *p)
{
- struct bio_vec iov;
- void *kaddr, *from;
int ret;
/* Atomic read is needed for getu16 */
- ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
- &iov, 1, VHOST_MAP_RO);
- if (ret < 0)
- return ret;
+ if (vrh->use_va) {
+ struct iovec iov;
+ __virtio16 tmp;
+
+ ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
+ NULL, &iov, 1, VHOST_MAP_RO);
+ if (ret < 0)
+ return ret;
- kaddr = kmap_local_page(iov.bv_page);
- from = kaddr + iov.bv_offset;
- *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
- kunmap_local(kaddr);
+ ret = __get_user(tmp, (__virtio16 __user *)iov.iov_base);
+ if (ret)
+ return ret;
+
+ *val = vringh16_to_cpu(vrh, tmp);
+ } else {
+ struct bio_vec iov;
+ void *kaddr, *from;
+
+ ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p),
+ NULL, &iov, 1, VHOST_MAP_RO);
+ if (ret < 0)
+ return ret;
+
+ kaddr = kmap_local_page(iov.bv_page);
+ from = kaddr + iov.bv_offset;
+ *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
+ kunmap_local(kaddr);
+ }
return 0;
}
@@ -1231,20 +1360,37 @@ static inline int getu16_iotlb(const struct vringh *vrh,
static inline int putu16_iotlb(const struct vringh *vrh,
__virtio16 *p, u16 val)
{
- struct bio_vec iov;
- void *kaddr, *to;
int ret;
/* Atomic write is needed for putu16 */
- ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
- &iov, 1, VHOST_MAP_WO);
- if (ret < 0)
- return ret;
+ if (vrh->use_va) {
+ struct iovec iov;
+ __virtio16 tmp;
- kaddr = kmap_local_page(iov.bv_page);
- to = kaddr + iov.bv_offset;
- WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
- kunmap_local(kaddr);
+ ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
+ NULL, &iov, 1, VHOST_MAP_RO);
+ if (ret < 0)
+ return ret;
+
+ tmp = cpu_to_vringh16(vrh, val);
+
+ ret = __put_user(tmp, (__virtio16 __user *)iov.iov_base);
+ if (ret)
+ return ret;
+ } else {
+ struct bio_vec iov;
+ void *kaddr, *to;
+
+ ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
+ &iov, 1, VHOST_MAP_WO);
+ if (ret < 0)
+ return ret;
+
+ kaddr = kmap_local_page(iov.bv_page);
+ to = kaddr + iov.bv_offset;
+ WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
+ kunmap_local(kaddr);
+ }
return 0;
}
@@ -1306,6 +1452,7 @@ static inline int putused_iotlb(const struct vringh *vrh,
* @features: the feature bits for this ring.
* @num: the number of elements.
* @weak_barriers: true if we only need memory barriers, not I/O.
+ * @use_va: true if IOTLB contains user VA
* @desc: the userpace descriptor pointer.
* @avail: the userpace avail pointer.
* @used: the userpace used pointer.
@@ -1313,11 +1460,13 @@ static inline int putused_iotlb(const struct vringh *vrh,
* Returns an error if num is invalid.
*/
int vringh_init_iotlb(struct vringh *vrh, u64 features,
- unsigned int num, bool weak_barriers,
+ unsigned int num, bool weak_barriers, bool use_va,
struct vring_desc *desc,
struct vring_avail *avail,
struct vring_used *used)
{
+ vrh->use_va = use_va;
+
return vringh_init_kern(vrh, features, num, weak_barriers,
desc, avail, used);
}
--
2.39.2
Let's move work management inside the vdpa_sim core.
This way we can easily change how we manage the work, without
having to change the devices each time.
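After this change a device only fills dev_attr.work_fn and calls
vdpasim_schedule_work(); it no longer touches the work item directly.
A minimal sketch (the foo names are placeholders):

static void vdpasim_foo_work(struct vdpasim *vdpasim)
{
	bool reschedule = false;

	/* ... handle the virtqueues, set reschedule if there is more to do ... */

	if (reschedule)
		vdpasim_schedule_work(vdpasim);
}

static int vdpasim_foo_dev_add(const struct vdpa_dev_set_config *config)
{
	struct vdpasim_dev_attr dev_attr = {};
	struct vdpasim *simdev;

	dev_attr.work_fn = vdpasim_foo_work; /* wrapped by the core's own work */
	/* ... id, nvqs, config callbacks, etc. ... */

	simdev = vdpasim_create(&dev_attr, config);
	return PTR_ERR_OR_ZERO(simdev);
}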
Signed-off-by: Stefano Garzarella <[email protected]>
---
drivers/vdpa/vdpa_sim/vdpa_sim.h | 3 ++-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 +++++++++++++++--
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 6 ++----
drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 6 ++----
4 files changed, 21 insertions(+), 11 deletions(-)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 144858636c10..acee20faaf6a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -45,7 +45,7 @@ struct vdpasim_dev_attr {
u32 ngroups;
u32 nas;
- work_func_t work_fn;
+ void (*work_fn)(struct vdpasim *vdpasim);
void (*get_config)(struct vdpasim *vdpasim, void *config);
void (*set_config)(struct vdpasim *vdpasim, const void *config);
int (*get_stats)(struct vdpasim *vdpasim, u16 idx,
@@ -78,6 +78,7 @@ struct vdpasim {
struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *attr,
const struct vdpa_dev_set_config *config);
+void vdpasim_schedule_work(struct vdpasim *vdpasim);
/* TODO: cross-endian support */
static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 481eb156658b..a6ee830efc38 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -116,6 +116,13 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
static const struct vdpa_config_ops vdpasim_config_ops;
static const struct vdpa_config_ops vdpasim_batch_config_ops;
+static void vdpasim_work_fn(struct work_struct *work)
+{
+ struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
+
+ vdpasim->dev_attr.work_fn(vdpasim);
+}
+
struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
const struct vdpa_dev_set_config *config)
{
@@ -152,7 +159,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
vdpasim = vdpa_to_sim(vdpa);
vdpasim->dev_attr = *dev_attr;
- INIT_WORK(&vdpasim->work, dev_attr->work_fn);
+ INIT_WORK(&vdpasim->work, vdpasim_work_fn);
spin_lock_init(&vdpasim->lock);
spin_lock_init(&vdpasim->iommu_lock);
@@ -203,6 +210,12 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
}
EXPORT_SYMBOL_GPL(vdpasim_create);
+void vdpasim_schedule_work(struct vdpasim *vdpasim)
+{
+ schedule_work(&vdpasim->work);
+}
+EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
+
static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
u64 desc_area, u64 driver_area,
u64 device_area)
@@ -237,7 +250,7 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
}
if (vq->ready)
- schedule_work(&vdpasim->work);
+ vdpasim_schedule_work(vdpasim);
}
static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index 5117959bed8a..eb4897c8541e 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -11,7 +11,6 @@
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
-#include <linux/sched.h>
#include <linux/blkdev.h>
#include <linux/vringh.h>
#include <linux/vdpa.h>
@@ -286,9 +285,8 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
return handled;
}
-static void vdpasim_blk_work(struct work_struct *work)
+static void vdpasim_blk_work(struct vdpasim *vdpasim)
{
- struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
bool reschedule = false;
int i;
@@ -326,7 +324,7 @@ static void vdpasim_blk_work(struct work_struct *work)
spin_unlock(&vdpasim->lock);
if (reschedule)
- schedule_work(&vdpasim->work);
+ vdpasim_schedule_work(vdpasim);
}
static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
index 862f405362de..e61a9ecbfafe 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
@@ -11,7 +11,6 @@
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
-#include <linux/sched.h>
#include <linux/etherdevice.h>
#include <linux/vringh.h>
#include <linux/vdpa.h>
@@ -192,9 +191,8 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
u64_stats_update_end(&net->cq_stats.syncp);
}
-static void vdpasim_net_work(struct work_struct *work)
+static void vdpasim_net_work(struct vdpasim *vdpasim)
{
- struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
struct vdpasim_net *net = sim_to_net(vdpasim);
@@ -260,7 +258,7 @@ static void vdpasim_net_work(struct work_struct *work)
vdpasim_net_complete(rxq, write);
if (tx_pkts > 4) {
- schedule_work(&vdpasim->work);
+ vdpasim_schedule_work(vdpasim);
goto out;
}
}
--
2.39.2
Let's use our own kthread to run device jobs.
This allows us more flexibility; in particular, we can attach the kthread
to the user address space when vDPA uses the user's VA.
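For reference, the kthread_worker API closely mirrors the workqueue
one; the lifecycle we rely on here is roughly (generic sketch, not
vdpa_sim-specific):

#include <linux/kthread.h>

static void my_work_fn(struct kthread_work *work)
{
	/* use container_of(work, ...) to reach the owning object */
}

static int my_worker_example(void)
{
	struct kthread_worker *worker;
	struct kthread_work work;

	kthread_init_work(&work, my_work_fn);	/* like INIT_WORK() */
	worker = kthread_create_worker(0, "my-worker");
	if (IS_ERR(worker))
		return PTR_ERR(worker);

	kthread_queue_work(worker, &work);	/* like schedule_work() */

	kthread_cancel_work_sync(&work);	/* like cancel_work_sync() */
	kthread_destroy_worker(worker);		/* stops and frees the kthread */
	return 0;
}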
Signed-off-by: Stefano Garzarella <[email protected]>
---
drivers/vdpa/vdpa_sim/vdpa_sim.h | 3 ++-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 ++++++++++++-----
2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index acee20faaf6a..ce83f9130a5d 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -57,7 +57,8 @@ struct vdpasim_dev_attr {
struct vdpasim {
struct vdpa_device vdpa;
struct vdpasim_virtqueue *vqs;
- struct work_struct work;
+ struct kthread_worker *worker;
+ struct kthread_work work;
struct vdpasim_dev_attr dev_attr;
/* spinlock to synchronize virtqueue state */
spinlock_t lock;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a6ee830efc38..6feb29726c2a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -11,8 +11,8 @@
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
+#include <linux/kthread.h>
#include <linux/slab.h>
-#include <linux/sched.h>
#include <linux/dma-map-ops.h>
#include <linux/vringh.h>
#include <linux/vdpa.h>
@@ -116,7 +116,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
static const struct vdpa_config_ops vdpasim_config_ops;
static const struct vdpa_config_ops vdpasim_batch_config_ops;
-static void vdpasim_work_fn(struct work_struct *work)
+static void vdpasim_work_fn(struct kthread_work *work)
{
struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
@@ -159,7 +159,13 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
vdpasim = vdpa_to_sim(vdpa);
vdpasim->dev_attr = *dev_attr;
- INIT_WORK(&vdpasim->work, vdpasim_work_fn);
+
+ kthread_init_work(&vdpasim->work, vdpasim_work_fn);
+ vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
+ dev_attr->name);
+ if (IS_ERR(vdpasim->worker))
+ goto err_iommu;
+
spin_lock_init(&vdpasim->lock);
spin_lock_init(&vdpasim->iommu_lock);
@@ -212,7 +218,7 @@ EXPORT_SYMBOL_GPL(vdpasim_create);
void vdpasim_schedule_work(struct vdpasim *vdpasim)
{
- schedule_work(&vdpasim->work);
+ kthread_queue_work(vdpasim->worker, &vdpasim->work);
}
EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
@@ -612,7 +618,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
int i;
- cancel_work_sync(&vdpasim->work);
+ kthread_cancel_work_sync(&vdpasim->work);
+ kthread_destroy_worker(vdpasim->worker);
for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
--
2.39.2
The spinlock we use to protect the state of the simulator is sometimes
held for a long time (for example, when devices handle requests).
It also prevents us from calling functions that might sleep (such as
kthread_flush_work() in the next patch), forcing us to release and
retake the lock around such calls.
For these reasons, let's replace the spinlock with a mutex that gives
us more flexibility.
Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
---
drivers/vdpa/vdpa_sim/vdpa_sim.h | 4 ++--
drivers/vdpa/vdpa_sim/vdpa_sim.c | 34 ++++++++++++++--------------
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 4 ++--
drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 4 ++--
4 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index ce83f9130a5d..4774292fba8c 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -60,8 +60,8 @@ struct vdpasim {
struct kthread_worker *worker;
struct kthread_work work;
struct vdpasim_dev_attr dev_attr;
- /* spinlock to synchronize virtqueue state */
- spinlock_t lock;
+ /* mutex to synchronize virtqueue state */
+ struct mutex mutex;
/* virtio config according to device type */
void *config;
struct vhost_iotlb *iommu;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 6feb29726c2a..a28103a67ae7 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -166,7 +166,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
if (IS_ERR(vdpasim->worker))
goto err_iommu;
- spin_lock_init(&vdpasim->lock);
+ mutex_init(&vdpasim->mutex);
spin_lock_init(&vdpasim->iommu_lock);
dev = &vdpasim->vdpa.dev;
@@ -275,13 +275,13 @@ static void vdpasim_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
bool old_ready;
- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
old_ready = vq->ready;
vq->ready = ready;
if (vq->ready && !old_ready) {
vdpasim_queue_ready(vdpasim, idx);
}
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);
}
static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
@@ -299,9 +299,9 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx,
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
struct vringh *vrh = &vq->vring;
- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
vrh->last_avail_idx = state->split.avail_index;
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);
return 0;
}
@@ -398,9 +398,9 @@ static u8 vdpasim_get_status(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
u8 status;
- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
status = vdpasim->status;
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);
return status;
}
@@ -409,19 +409,19 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
vdpasim->status = status;
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);
}
static int vdpasim_reset(struct vdpa_device *vdpa)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
vdpasim->status = 0;
vdpasim_do_reset(vdpasim);
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);
return 0;
}
@@ -430,9 +430,9 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
vdpasim->running = false;
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);
return 0;
}
@@ -442,7 +442,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
int i;
- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
vdpasim->running = true;
if (vdpasim->pending_kick) {
@@ -453,7 +453,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
vdpasim->pending_kick = false;
}
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);
return 0;
}
@@ -525,14 +525,14 @@ static int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
iommu = &vdpasim->iommu[asid];
- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
for (i = 0; i < vdpasim->dev_attr.nvqs; i++)
if (vdpasim_get_vq_group(vdpa, i) == group)
vringh_set_iotlb(&vdpasim->vqs[i].vring, iommu,
&vdpasim->iommu_lock);
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);
return 0;
}
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index eb4897c8541e..568119e1553f 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -290,7 +290,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
bool reschedule = false;
int i;
- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
goto out;
@@ -321,7 +321,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
}
}
out:
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);
if (reschedule)
vdpasim_schedule_work(vdpasim);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
index e61a9ecbfafe..7ab434592bfe 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
@@ -201,7 +201,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
u64 rx_drops = 0, rx_overruns = 0, rx_errors = 0, tx_errors = 0;
int err;
- spin_lock(&vdpasim->lock);
+ mutex_lock(&vdpasim->mutex);
if (!vdpasim->running)
goto out;
@@ -264,7 +264,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
}
out:
- spin_unlock(&vdpasim->lock);
+ mutex_unlock(&vdpasim->mutex);
u64_stats_update_begin(&net->tx_stats.syncp);
net->tx_stats.pkts += tx_pkts;
--
2.39.2
The new "use_va" module parameter (default: false) is used in
vdpa_alloc_device() to inform the vDPA framework that the device
supports VA.
vringh is initialized to use VA only when "use_va" is true and the
user's mm has been bound. So, only when the bus supports user VA
(e.g. vhost-vdpa).
vdpasim_mm_work_fn work is used to attach the kthread to the user
address space when the .bind_mm callback is invoked, and to detach
it when the .unbind_mm callback is invoked.
Signed-off-by: Stefano Garzarella <[email protected]>
---
Notes:
v2:
- `use_va` set to true by default [Eugenio]
- supported the new unbind_mm callback [Jason]
- removed the unbind_mm call in vdpasim_do_reset() [Jason]
- avoided releasing the lock while calling kthread_flush_work(), since we
  are now using a mutex to protect the device state
drivers/vdpa/vdpa_sim/vdpa_sim.h | 1 +
drivers/vdpa/vdpa_sim/vdpa_sim.c | 98 +++++++++++++++++++++++++++++++-
2 files changed, 97 insertions(+), 2 deletions(-)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 4774292fba8c..3a42887d05d9 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -59,6 +59,7 @@ struct vdpasim {
struct vdpasim_virtqueue *vqs;
struct kthread_worker *worker;
struct kthread_work work;
+ struct mm_struct *mm_bound;
struct vdpasim_dev_attr dev_attr;
/* mutex to synchronize virtqueue state */
struct mutex mutex;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a28103a67ae7..eda26bc33df5 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -35,10 +35,77 @@ module_param(max_iotlb_entries, int, 0444);
MODULE_PARM_DESC(max_iotlb_entries,
"Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");
+static bool use_va = true;
+module_param(use_va, bool, 0444);
+MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
+
#define VDPASIM_QUEUE_ALIGN PAGE_SIZE
#define VDPASIM_QUEUE_MAX 256
#define VDPASIM_VENDOR_ID 0
+struct vdpasim_mm_work {
+ struct kthread_work work;
+ struct mm_struct *mm;
+ bool bind;
+ int ret;
+};
+
+static void vdpasim_mm_work_fn(struct kthread_work *work)
+{
+ struct vdpasim_mm_work *mm_work =
+ container_of(work, struct vdpasim_mm_work, work);
+
+ mm_work->ret = 0;
+
+ if (mm_work->bind) {
+ kthread_use_mm(mm_work->mm);
+ //TODO: should we attach the cgroup of the mm owner?
+ } else {
+ kthread_unuse_mm(mm_work->mm);
+ }
+}
+
+static void vdpasim_worker_queue_mm(struct vdpasim *vdpasim,
+ struct vdpasim_mm_work *mm_work)
+{
+ struct kthread_work *work = &mm_work->work;
+
+ kthread_init_work(work, vdpasim_mm_work_fn);
+ kthread_queue_work(vdpasim->worker, work);
+
+ kthread_flush_work(work);
+}
+
+static int vdpasim_worker_bind_mm(struct vdpasim *vdpasim,
+ struct mm_struct *new_mm)
+{
+ struct vdpasim_mm_work mm_work;
+
+ mm_work.mm = new_mm;
+ mm_work.bind = true;
+
+ vdpasim_worker_queue_mm(vdpasim, &mm_work);
+
+ if (!mm_work.ret)
+ vdpasim->mm_bound = new_mm;
+
+ return mm_work.ret;
+}
+
+static void vdpasim_worker_unbind_mm(struct vdpasim *vdpasim)
+{
+ struct vdpasim_mm_work mm_work;
+
+ if (!vdpasim->mm_bound)
+ return;
+
+ mm_work.mm = vdpasim->mm_bound;
+ mm_work.bind = false;
+
+ vdpasim_worker_queue_mm(vdpasim, &mm_work);
+
+ vdpasim->mm_bound = NULL;
+}
static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
{
return container_of(vdpa, struct vdpasim, vdpa);
@@ -59,8 +126,10 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
{
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
uint16_t last_avail_idx = vq->vring.last_avail_idx;
+ bool va_enabled = use_va && vdpasim->mm_bound;
- vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
+ vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
+ va_enabled,
(struct vring_desc *)(uintptr_t)vq->desc_addr,
(struct vring_avail *)
(uintptr_t)vq->driver_addr,
@@ -151,7 +220,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
vdpa = __vdpa_alloc_device(NULL, ops,
dev_attr->ngroups, dev_attr->nas,
dev_attr->alloc_size,
- dev_attr->name, false);
+ dev_attr->name, use_va);
if (IS_ERR(vdpa)) {
ret = PTR_ERR(vdpa);
goto err_alloc;
@@ -571,6 +640,27 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
return ret;
}
+static int vdpasim_bind_mm(struct vdpa_device *vdpa, struct mm_struct *mm)
+{
+ struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+ int ret;
+
+ mutex_lock(&vdpasim->mutex);
+ ret = vdpasim_worker_bind_mm(vdpasim, mm);
+ mutex_unlock(&vdpasim->mutex);
+
+ return ret;
+}
+
+static void vdpasim_unbind_mm(struct vdpa_device *vdpa)
+{
+ struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+ mutex_lock(&vdpasim->mutex);
+ vdpasim_worker_unbind_mm(vdpasim);
+ mutex_unlock(&vdpasim->mutex);
+}
+
static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,
u64 iova, u64 size,
u64 pa, u32 perm, void *opaque)
@@ -667,6 +757,8 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
.set_group_asid = vdpasim_set_group_asid,
.dma_map = vdpasim_dma_map,
.dma_unmap = vdpasim_dma_unmap,
+ .bind_mm = vdpasim_bind_mm,
+ .unbind_mm = vdpasim_unbind_mm,
.free = vdpasim_free,
};
@@ -701,6 +793,8 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
.get_iova_range = vdpasim_get_iova_range,
.set_group_asid = vdpasim_set_group_asid,
.set_map = vdpasim_set_map,
+ .bind_mm = vdpasim_bind_mm,
+ .unbind_mm = vdpasim_unbind_mm,
.free = vdpasim_free,
};
--
2.39.2
On Thu, Mar 02, 2023 at 12:34:19PM +0100, Stefano Garzarella wrote:
> Let's use our own kthread to run device jobs.
> This allows us more flexibility, especially we can attach the kthread
> to the user address space when vDPA uses user's VA.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> drivers/vdpa/vdpa_sim/vdpa_sim.h | 3 ++-
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 ++++++++++++-----
> 2 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index acee20faaf6a..ce83f9130a5d 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -57,7 +57,8 @@ struct vdpasim_dev_attr {
> struct vdpasim {
> struct vdpa_device vdpa;
> struct vdpasim_virtqueue *vqs;
> - struct work_struct work;
> + struct kthread_worker *worker;
> + struct kthread_work work;
> struct vdpasim_dev_attr dev_attr;
> /* spinlock to synchronize virtqueue state */
> spinlock_t lock;
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index a6ee830efc38..6feb29726c2a 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -11,8 +11,8 @@
> #include <linux/module.h>
> #include <linux/device.h>
> #include <linux/kernel.h>
> +#include <linux/kthread.h>
> #include <linux/slab.h>
> -#include <linux/sched.h>
> #include <linux/dma-map-ops.h>
> #include <linux/vringh.h>
> #include <linux/vdpa.h>
> @@ -116,7 +116,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
> static const struct vdpa_config_ops vdpasim_config_ops;
> static const struct vdpa_config_ops vdpasim_batch_config_ops;
>
> -static void vdpasim_work_fn(struct work_struct *work)
> +static void vdpasim_work_fn(struct kthread_work *work)
> {
> struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>
> @@ -159,7 +159,13 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>
> vdpasim = vdpa_to_sim(vdpa);
> vdpasim->dev_attr = *dev_attr;
> - INIT_WORK(&vdpasim->work, vdpasim_work_fn);
> +
> + kthread_init_work(&vdpasim->work, vdpasim_work_fn);
> + vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
> + dev_attr->name);
> + if (IS_ERR(vdpasim->worker))
> + goto err_iommu;
Branching to err_iommu will result in a call to put_device(dev)...
> +
> spin_lock_init(&vdpasim->lock);
> spin_lock_init(&vdpasim->iommu_lock);
... but dev is not initialised until the line following this hunk,
which is:
dev = &vdpasim->vdpa.dev;
In order to avoid leaking dev, I _think_ the correct approach
is to move the initialisation of dev above the branch to
err_iommu; perhaps just above the call to kthread_init_work()
would be a good place.
This does move the assignment outside the locks above.
But I _think_ that is ok.
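Something like this, perhaps (completely untested, just to illustrate
the suggested ordering):

	vdpasim = vdpa_to_sim(vdpa);
	vdpasim->dev_attr = *dev_attr;
	dev = &vdpasim->vdpa.dev;	/* moved up so err_iommu sees a valid dev */

	kthread_init_work(&vdpasim->work, vdpasim_work_fn);
	vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
						dev_attr->name);
	if (IS_ERR(vdpasim->worker))
		goto err_iommu;

	spin_lock_init(&vdpasim->lock);
	spin_lock_init(&vdpasim->iommu_lock);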
> @@ -212,7 +218,7 @@ EXPORT_SYMBOL_GPL(vdpasim_create);
>
> void vdpasim_schedule_work(struct vdpasim *vdpasim)
> {
> - schedule_work(&vdpasim->work);
> + kthread_queue_work(vdpasim->worker, &vdpasim->work);
> }
> EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
>
> @@ -612,7 +618,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
> struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> int i;
>
> - cancel_work_sync(&vdpasim->work);
> + kthread_cancel_work_sync(&vdpasim->work);
> + kthread_destroy_worker(vdpasim->worker);
>
> for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
> vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
> --
> 2.39.2
>
On Thu, Mar 02, 2023 at 04:30:04PM +0100, Simon Horman wrote:
>On Thu, Mar 02, 2023 at 12:34:19PM +0100, Stefano Garzarella wrote:
>> Let's use our own kthread to run device jobs.
>> This allows us more flexibility, especially we can attach the kthread
>> to the user address space when vDPA uses user's VA.
>>
>> Signed-off-by: Stefano Garzarella <[email protected]>
>> ---
>> drivers/vdpa/vdpa_sim/vdpa_sim.h | 3 ++-
>> drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 ++++++++++++-----
>> 2 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
>> index acee20faaf6a..ce83f9130a5d 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
>> @@ -57,7 +57,8 @@ struct vdpasim_dev_attr {
>> struct vdpasim {
>> struct vdpa_device vdpa;
>> struct vdpasim_virtqueue *vqs;
>> - struct work_struct work;
>> + struct kthread_worker *worker;
>> + struct kthread_work work;
>> struct vdpasim_dev_attr dev_attr;
>> /* spinlock to synchronize virtqueue state */
>> spinlock_t lock;
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> index a6ee830efc38..6feb29726c2a 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> @@ -11,8 +11,8 @@
>> #include <linux/module.h>
>> #include <linux/device.h>
>> #include <linux/kernel.h>
>> +#include <linux/kthread.h>
>> #include <linux/slab.h>
>> -#include <linux/sched.h>
>> #include <linux/dma-map-ops.h>
>> #include <linux/vringh.h>
>> #include <linux/vdpa.h>
>> @@ -116,7 +116,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
>> static const struct vdpa_config_ops vdpasim_config_ops;
>> static const struct vdpa_config_ops vdpasim_batch_config_ops;
>>
>> -static void vdpasim_work_fn(struct work_struct *work)
>> +static void vdpasim_work_fn(struct kthread_work *work)
>> {
>> struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>>
>> @@ -159,7 +159,13 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>>
>> vdpasim = vdpa_to_sim(vdpa);
>> vdpasim->dev_attr = *dev_attr;
>> - INIT_WORK(&vdpasim->work, vdpasim_work_fn);
>> +
>> + kthread_init_work(&vdpasim->work, vdpasim_work_fn);
>> + vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
>> + dev_attr->name);
>> + if (IS_ERR(vdpasim->worker))
>> + goto err_iommu;
>
>Branching to err_iommu will result in a call to put_device(dev)...
Good catch!
>
>> +
>> spin_lock_init(&vdpasim->lock);
>> spin_lock_init(&vdpasim->iommu_lock);
>
>... but dev is not initialised until the line following this hunk,
>which is:
>
> dev = &vdpasim->vdpa.dev;
>
>In order to avoid leaking dev I _think_ the correct approach
>is to move the initialisation of dev above the branch to
>err_iommu, perhaps above the call to kthread_init_work()
>is a good place.
Yep, I agree. I'll fix in v3.
Thanks,
Stefano
>
>This does move the assignment outside the locks above.
>But I _think_ that is ok.
>
>> @@ -212,7 +218,7 @@ EXPORT_SYMBOL_GPL(vdpasim_create);
>>
>> void vdpasim_schedule_work(struct vdpasim *vdpasim)
>> {
>> - schedule_work(&vdpasim->work);
>> + kthread_queue_work(vdpasim->worker, &vdpasim->work);
>> }
>> EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
>>
>> @@ -612,7 +618,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
>> struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>> int i;
>>
>> - cancel_work_sync(&vdpasim->work);
>> + kthread_cancel_work_sync(&vdpasim->work);
>> + kthread_destroy_worker(vdpasim->worker);
>>
>> for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
>> vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
>> --
>> 2.39.2
>>
>
On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <[email protected]> wrote:
>
> vDPA supports the possibility to use user VA in the iotlb messages.
> So, let's add support for user VA in vringh to use it in the vDPA
> simulators.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>
> Notes:
> v2:
> - replace kmap_atomic() with kmap_local_page() [see previous patch]
> - fix cast warnings when build with W=1 C=1
>
> include/linux/vringh.h | 5 +-
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
> drivers/vhost/vringh.c | 247 ++++++++++++++++++++++++------
> 4 files changed, 205 insertions(+), 53 deletions(-)
>
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> index 1991a02c6431..d39b9f2dcba0 100644
> --- a/include/linux/vringh.h
> +++ b/include/linux/vringh.h
> @@ -32,6 +32,9 @@ struct vringh {
> /* Can we get away with weak barriers? */
> bool weak_barriers;
>
> + /* Use user's VA */
> + bool use_va;
> +
> /* Last available index we saw (ie. where we're up to). */
> u16 last_avail_idx;
>
> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
> spinlock_t *iotlb_lock);
>
> int vringh_init_iotlb(struct vringh *vrh, u64 features,
> - unsigned int num, bool weak_barriers,
> + unsigned int num, bool weak_barriers, bool use_va,
> struct vring_desc *desc,
> struct vring_avail *avail,
> struct vring_used *used);
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 3a0e721aef05..babc8dd171a6 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>
> if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
> err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
> - MLX5_CVQ_MAX_ENT, false,
> + MLX5_CVQ_MAX_ENT, false, false,
> (struct vring_desc *)(uintptr_t)cvq->desc_addr,
> (struct vring_avail *)(uintptr_t)cvq->driver_addr,
> (struct vring_used *)(uintptr_t)cvq->device_addr);
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 6a0a65814626..481eb156658b 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> uint16_t last_avail_idx = vq->vring.last_avail_idx;
>
> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
> (struct vring_desc *)(uintptr_t)vq->desc_addr,
> (struct vring_avail *)
> (uintptr_t)vq->driver_addr,
> @@ -81,7 +81,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
> vq->cb = NULL;
> vq->private = NULL;
> vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
> - VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
> + VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>
> vq->vring.notify = NULL;
> }
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 0ba3ef809e48..61c79cea44ca 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1094,15 +1094,99 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>
> #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>
> -static int iotlb_translate(const struct vringh *vrh,
> - u64 addr, u64 len, u64 *translated,
> - struct bio_vec iov[],
> - int iov_size, u32 perm)
> +static int iotlb_translate_va(const struct vringh *vrh,
> + u64 addr, u64 len, u64 *translated,
> + struct iovec iov[],
> + int iov_size, u32 perm)
> {
> struct vhost_iotlb_map *map;
> struct vhost_iotlb *iotlb = vrh->iotlb;
> + u64 s = 0, last = addr + len - 1;
> int ret = 0;
> +
> + spin_lock(vrh->iotlb_lock);
> +
> + while (len > s) {
> + u64 size;
> +
> + if (unlikely(ret >= iov_size)) {
> + ret = -ENOBUFS;
> + break;
> + }
> +
> + map = vhost_iotlb_itree_first(iotlb, addr, last);
> + if (!map || map->start > addr) {
> + ret = -EINVAL;
> + break;
> + } else if (!(map->perm & perm)) {
> + ret = -EPERM;
> + break;
> + }
> +
> + size = map->size - addr + map->start;
> + iov[ret].iov_len = min(len - s, size);
> + iov[ret].iov_base = (void __user *)(unsigned long)
> + (map->addr + addr - map->start);
> + s += size;
> + addr += size;
> + ++ret;
> + }
> +
> + spin_unlock(vrh->iotlb_lock);
> +
> + if (translated)
> + *translated = min(len, s);
> +
> + return ret;
> +}
It seems to me iotlb_translate_va and iotlb_translate_pa are very
similar; their only difference is that the iov argument is an iovec
instead of a bio_vec. And how to fill it, obviously.
It would be great to merge both functions, differing only by a
conditional on vrh->use_va, or generics, or similar. Or, following
the style of the rest of the vringh code, to provide a callback to fill
the iovec (although I like the conditional more).
However, I cannot think of an easy way to do that without long
macros or type erasure.
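With type erasure it could look more or less like this (rough, untested
sketch); I'm not sure it's actually nicer than the two explicit versions:

static int iotlb_translate(const struct vringh *vrh,
			   u64 addr, u64 len, u64 *translated,
			   void *iov, int iov_size, u32 perm)
{
	struct vhost_iotlb_map *map;
	struct vhost_iotlb *iotlb = vrh->iotlb;
	u64 s = 0, last = addr + len - 1;
	int ret = 0;

	spin_lock(vrh->iotlb_lock);

	while (len > s) {
		u64 size, io_addr, chunk;

		if (unlikely(ret >= iov_size)) {
			ret = -ENOBUFS;
			break;
		}

		map = vhost_iotlb_itree_first(iotlb, addr, last);
		if (!map || map->start > addr) {
			ret = -EINVAL;
			break;
		} else if (!(map->perm & perm)) {
			ret = -EPERM;
			break;
		}

		size = map->size - addr + map->start;
		chunk = min(len - s, size);
		io_addr = map->addr + addr - map->start;

		/* the only part that really differs between VA and PA */
		if (vrh->use_va) {
			struct iovec *uiov = iov;

			uiov[ret].iov_len = chunk;
			uiov[ret].iov_base = (void __user *)(unsigned long)io_addr;
		} else {
			struct bio_vec *biov = iov;

			biov[ret].bv_page = pfn_to_page(io_addr >> PAGE_SHIFT);
			biov[ret].bv_len = chunk;
			biov[ret].bv_offset = io_addr & ~PAGE_MASK;
		}

		s += size;
		addr += size;
		++ret;
	}

	spin_unlock(vrh->iotlb_lock);

	if (translated)
		*translated = min(len, s);

	return ret;
}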
> +
> +static inline int copy_from_va(const struct vringh *vrh, void *dst, void *src,
> + u64 len, u64 *translated)
> +{
> + struct iovec iov[16];
> + struct iov_iter iter;
> + int ret;
> +
> + ret = iotlb_translate_va(vrh, (u64)(uintptr_t)src, len, translated, iov,
> + ARRAY_SIZE(iov), VHOST_MAP_RO);
> + if (ret == -ENOBUFS)
> + ret = ARRAY_SIZE(iov);
> + else if (ret < 0)
> + return ret;
> +
> + iov_iter_init(&iter, ITER_SOURCE, iov, ret, *translated);
> +
> + return copy_from_iter(dst, *translated, &iter);
Maybe a good baby step for DRY is to return the iov_iter in
copy_from/to_va/pa here?
But I'm ok with this version too.
Acked-by: Eugenio Pérez Martin <[email protected]>
> +}
> +
> +static inline int copy_to_va(const struct vringh *vrh, void *dst, void *src,
> + u64 len, u64 *translated)
> +{
> + struct iovec iov[16];
> + struct iov_iter iter;
> + int ret;
> +
> + ret = iotlb_translate_va(vrh, (u64)(uintptr_t)dst, len, translated, iov,
> + ARRAY_SIZE(iov), VHOST_MAP_WO);
> + if (ret == -ENOBUFS)
> + ret = ARRAY_SIZE(iov);
> + else if (ret < 0)
> + return ret;
> +
> + iov_iter_init(&iter, ITER_DEST, iov, ret, *translated);
> +
> + return copy_to_iter(src, *translated, &iter);
> +}
> +
> +static int iotlb_translate_pa(const struct vringh *vrh,
> + u64 addr, u64 len, u64 *translated,
> + struct bio_vec iov[],
> + int iov_size, u32 perm)
> +{
> + struct vhost_iotlb_map *map;
> + struct vhost_iotlb *iotlb = vrh->iotlb;
> u64 s = 0, last = addr + len - 1;
> + int ret = 0;
>
> spin_lock(vrh->iotlb_lock);
>
> @@ -1141,28 +1225,61 @@ static int iotlb_translate(const struct vringh *vrh,
> return ret;
> }
>
> +static inline int copy_from_pa(const struct vringh *vrh, void *dst, void *src,
> + u64 len, u64 *translated)
> +{
> + struct bio_vec iov[16];
> + struct iov_iter iter;
> + int ret;
> +
> + ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)src, len, translated, iov,
> + ARRAY_SIZE(iov), VHOST_MAP_RO);
> + if (ret == -ENOBUFS)
> + ret = ARRAY_SIZE(iov);
> + else if (ret < 0)
> + return ret;
> +
> + iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, *translated);
> +
> + return copy_from_iter(dst, *translated, &iter);
> +}
> +
> +static inline int copy_to_pa(const struct vringh *vrh, void *dst, void *src,
> + u64 len, u64 *translated)
> +{
> + struct bio_vec iov[16];
> + struct iov_iter iter;
> + int ret;
> +
> + ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)dst, len, translated, iov,
> + ARRAY_SIZE(iov), VHOST_MAP_WO);
> + if (ret == -ENOBUFS)
> + ret = ARRAY_SIZE(iov);
> + else if (ret < 0)
> + return ret;
> +
> + iov_iter_bvec(&iter, ITER_DEST, iov, ret, *translated);
> +
> + return copy_to_iter(src, *translated, &iter);
> +}
> +
> static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
> void *src, size_t len)
> {
> u64 total_translated = 0;
>
> while (total_translated < len) {
> - struct bio_vec iov[16];
> - struct iov_iter iter;
> u64 translated;
> int ret;
>
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
> - len - total_translated, &translated,
> - iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
> - if (ret == -ENOBUFS)
> - ret = ARRAY_SIZE(iov);
> - else if (ret < 0)
> - return ret;
> -
> - iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
> + if (vrh->use_va) {
> + ret = copy_from_va(vrh, dst, src,
> + len - total_translated, &translated);
> + } else {
> + ret = copy_from_pa(vrh, dst, src,
> + len - total_translated, &translated);
> + }
>
> - ret = copy_from_iter(dst, translated, &iter);
> if (ret < 0)
> return ret;
>
> @@ -1180,22 +1297,17 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
> u64 total_translated = 0;
>
> while (total_translated < len) {
> - struct bio_vec iov[16];
> - struct iov_iter iter;
> u64 translated;
> int ret;
>
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
> - len - total_translated, &translated,
> - iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
> - if (ret == -ENOBUFS)
> - ret = ARRAY_SIZE(iov);
> - else if (ret < 0)
> - return ret;
> -
> - iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
> + if (vrh->use_va) {
> + ret = copy_to_va(vrh, dst, src,
> + len - total_translated, &translated);
> + } else {
> + ret = copy_to_pa(vrh, dst, src,
> + len - total_translated, &translated);
> + }
>
> - ret = copy_to_iter(src, translated, &iter);
> if (ret < 0)
> return ret;
>
> @@ -1210,20 +1322,37 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
> static inline int getu16_iotlb(const struct vringh *vrh,
> u16 *val, const __virtio16 *p)
> {
> - struct bio_vec iov;
> - void *kaddr, *from;
> int ret;
>
> /* Atomic read is needed for getu16 */
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> - &iov, 1, VHOST_MAP_RO);
> - if (ret < 0)
> - return ret;
> + if (vrh->use_va) {
> + struct iovec iov;
> + __virtio16 tmp;
> +
> + ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
> + NULL, &iov, 1, VHOST_MAP_RO);
> + if (ret < 0)
> + return ret;
>
> - kaddr = kmap_local_page(iov.bv_page);
> - from = kaddr + iov.bv_offset;
> - *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> - kunmap_local(kaddr);
> + ret = __get_user(tmp, (__virtio16 __user *)iov.iov_base);
> + if (ret)
> + return ret;
> +
> + *val = vringh16_to_cpu(vrh, tmp);
> + } else {
> + struct bio_vec iov;
> + void *kaddr, *from;
> +
> + ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p),
> + NULL, &iov, 1, VHOST_MAP_RO);
> + if (ret < 0)
> + return ret;
> +
> + kaddr = kmap_local_page(iov.bv_page);
> + from = kaddr + iov.bv_offset;
> + *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> + kunmap_local(kaddr);
> + }
>
> return 0;
> }
> @@ -1231,20 +1360,37 @@ static inline int getu16_iotlb(const struct vringh *vrh,
> static inline int putu16_iotlb(const struct vringh *vrh,
> __virtio16 *p, u16 val)
> {
> - struct bio_vec iov;
> - void *kaddr, *to;
> int ret;
>
> /* Atomic write is needed for putu16 */
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> - &iov, 1, VHOST_MAP_WO);
> - if (ret < 0)
> - return ret;
> + if (vrh->use_va) {
> + struct iovec iov;
> + __virtio16 tmp;
>
> - kaddr = kmap_local_page(iov.bv_page);
> - to = kaddr + iov.bv_offset;
> - WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> - kunmap_local(kaddr);
> + ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
> +					  NULL, &iov, 1, VHOST_MAP_WO);
> + if (ret < 0)
> + return ret;
> +
> + tmp = cpu_to_vringh16(vrh, val);
> +
> + ret = __put_user(tmp, (__virtio16 __user *)iov.iov_base);
> + if (ret)
> + return ret;
> + } else {
> + struct bio_vec iov;
> + void *kaddr, *to;
> +
> + ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> + &iov, 1, VHOST_MAP_WO);
> + if (ret < 0)
> + return ret;
> +
> + kaddr = kmap_local_page(iov.bv_page);
> + to = kaddr + iov.bv_offset;
> + WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> + kunmap_local(kaddr);
> + }
>
> return 0;
> }
> @@ -1306,6 +1452,7 @@ static inline int putused_iotlb(const struct vringh *vrh,
> * @features: the feature bits for this ring.
> * @num: the number of elements.
> * @weak_barriers: true if we only need memory barriers, not I/O.
> + * @use_va: true if IOTLB contains user VA
> * @desc: the userpace descriptor pointer.
> * @avail: the userpace avail pointer.
> * @used: the userpace used pointer.
> @@ -1313,11 +1460,13 @@ static inline int putused_iotlb(const struct vringh *vrh,
> * Returns an error if num is invalid.
> */
> int vringh_init_iotlb(struct vringh *vrh, u64 features,
> - unsigned int num, bool weak_barriers,
> + unsigned int num, bool weak_barriers, bool use_va,
> struct vring_desc *desc,
> struct vring_avail *avail,
> struct vring_used *used)
> {
> + vrh->use_va = use_va;
> +
> return vringh_init_kern(vrh, features, num, weak_barriers,
> desc, avail, used);
> }
> --
> 2.39.2
>
On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <[email protected]> wrote:
>
> Let's move work management inside the vdpa_sim core.
> This way we can easily change how we manage the works, without
> having to change the devices each time.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
Acked-by: Eugenio Pérez Martin <[email protected]>
> ---
> drivers/vdpa/vdpa_sim/vdpa_sim.h | 3 ++-
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 +++++++++++++++--
> drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 6 ++----
> drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 6 ++----
> 4 files changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index 144858636c10..acee20faaf6a 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -45,7 +45,7 @@ struct vdpasim_dev_attr {
> u32 ngroups;
> u32 nas;
>
> - work_func_t work_fn;
> + void (*work_fn)(struct vdpasim *vdpasim);
> void (*get_config)(struct vdpasim *vdpasim, void *config);
> void (*set_config)(struct vdpasim *vdpasim, const void *config);
> int (*get_stats)(struct vdpasim *vdpasim, u16 idx,
> @@ -78,6 +78,7 @@ struct vdpasim {
>
> struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *attr,
> const struct vdpa_dev_set_config *config);
> +void vdpasim_schedule_work(struct vdpasim *vdpasim);
>
> /* TODO: cross-endian support */
> static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 481eb156658b..a6ee830efc38 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -116,6 +116,13 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
> static const struct vdpa_config_ops vdpasim_config_ops;
> static const struct vdpa_config_ops vdpasim_batch_config_ops;
>
> +static void vdpasim_work_fn(struct work_struct *work)
> +{
> + struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> +
> + vdpasim->dev_attr.work_fn(vdpasim);
> +}
> +
> struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> const struct vdpa_dev_set_config *config)
> {
> @@ -152,7 +159,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>
> vdpasim = vdpa_to_sim(vdpa);
> vdpasim->dev_attr = *dev_attr;
> - INIT_WORK(&vdpasim->work, dev_attr->work_fn);
> + INIT_WORK(&vdpasim->work, vdpasim_work_fn);
> spin_lock_init(&vdpasim->lock);
> spin_lock_init(&vdpasim->iommu_lock);
>
> @@ -203,6 +210,12 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> }
> EXPORT_SYMBOL_GPL(vdpasim_create);
>
> +void vdpasim_schedule_work(struct vdpasim *vdpasim)
> +{
> + schedule_work(&vdpasim->work);
> +}
> +EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
> +
> static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> u64 desc_area, u64 driver_area,
> u64 device_area)
> @@ -237,7 +250,7 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
> }
>
> if (vq->ready)
> - schedule_work(&vdpasim->work);
> + vdpasim_schedule_work(vdpasim);
> }
>
> static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> index 5117959bed8a..eb4897c8541e 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> @@ -11,7 +11,6 @@
> #include <linux/module.h>
> #include <linux/device.h>
> #include <linux/kernel.h>
> -#include <linux/sched.h>
> #include <linux/blkdev.h>
> #include <linux/vringh.h>
> #include <linux/vdpa.h>
> @@ -286,9 +285,8 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
> return handled;
> }
>
> -static void vdpasim_blk_work(struct work_struct *work)
> +static void vdpasim_blk_work(struct vdpasim *vdpasim)
> {
> - struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> bool reschedule = false;
> int i;
>
> @@ -326,7 +324,7 @@ static void vdpasim_blk_work(struct work_struct *work)
> spin_unlock(&vdpasim->lock);
>
> if (reschedule)
> - schedule_work(&vdpasim->work);
> + vdpasim_schedule_work(vdpasim);
> }
>
> static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config)
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> index 862f405362de..e61a9ecbfafe 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> @@ -11,7 +11,6 @@
> #include <linux/module.h>
> #include <linux/device.h>
> #include <linux/kernel.h>
> -#include <linux/sched.h>
> #include <linux/etherdevice.h>
> #include <linux/vringh.h>
> #include <linux/vdpa.h>
> @@ -192,9 +191,8 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
> u64_stats_update_end(&net->cq_stats.syncp);
> }
>
> -static void vdpasim_net_work(struct work_struct *work)
> +static void vdpasim_net_work(struct vdpasim *vdpasim)
> {
> - struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
> struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
> struct vdpasim_net *net = sim_to_net(vdpasim);
> @@ -260,7 +258,7 @@ static void vdpasim_net_work(struct work_struct *work)
> vdpasim_net_complete(rxq, write);
>
> if (tx_pkts > 4) {
> - schedule_work(&vdpasim->work);
> + vdpasim_schedule_work(vdpasim);
> goto out;
> }
> }
> --
> 2.39.2
>
Hi Stefano,
I love your patch! Perhaps something to improve:
[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on linus/master next-20230303]
[cannot apply to v6.2]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Stefano-Garzarella/vdpa-add-bind_mm-unbind_mm-callbacks/20230302-193850
base: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link: https://lore.kernel.org/r/20230302113421.174582-7-sgarzare%40redhat.com
patch subject: [PATCH v2 6/8] vdpa_sim: use kthread worker
config: i386-allyesconfig (https://download.01.org/0day-ci/archive/20230305/[email protected]/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/5b2107457ac0e7b1bb0aa3635ebf13b02e82bb78
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Stefano-Garzarella/vdpa-add-bind_mm-unbind_mm-callbacks/20230302-193850
git checkout 5b2107457ac0e7b1bb0aa3635ebf13b02e82bb78
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/net/wireless/ath/ath10k/ drivers/vdpa/vdpa_sim/ fs/erofs/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All warnings (new ones prefixed by >>):
>> drivers/vdpa/vdpa_sim/vdpa_sim.c:166:6: warning: variable 'dev' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
if (IS_ERR(vdpasim->worker))
^~~~~~~~~~~~~~~~~~~~~~~
drivers/vdpa/vdpa_sim/vdpa_sim.c:213:13: note: uninitialized use occurs here
put_device(dev);
^~~
drivers/vdpa/vdpa_sim/vdpa_sim.c:166:2: note: remove the 'if' if its condition is always false
if (IS_ERR(vdpasim->worker))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/vdpa/vdpa_sim/vdpa_sim.c:132:20: note: initialize the variable 'dev' to silence this warning
struct device *dev;
^
= NULL
1 warning generated.
vim +166 drivers/vdpa/vdpa_sim/vdpa_sim.c
125
126 struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
127 const struct vdpa_dev_set_config *config)
128 {
129 const struct vdpa_config_ops *ops;
130 struct vdpa_device *vdpa;
131 struct vdpasim *vdpasim;
132 struct device *dev;
133 int i, ret = -ENOMEM;
134
135 if (!dev_attr->alloc_size)
136 return ERR_PTR(-EINVAL);
137
138 if (config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) {
139 if (config->device_features &
140 ~dev_attr->supported_features)
141 return ERR_PTR(-EINVAL);
142 dev_attr->supported_features =
143 config->device_features;
144 }
145
146 if (batch_mapping)
147 ops = &vdpasim_batch_config_ops;
148 else
149 ops = &vdpasim_config_ops;
150
151 vdpa = __vdpa_alloc_device(NULL, ops,
152 dev_attr->ngroups, dev_attr->nas,
153 dev_attr->alloc_size,
154 dev_attr->name, false);
155 if (IS_ERR(vdpa)) {
156 ret = PTR_ERR(vdpa);
157 goto err_alloc;
158 }
159
160 vdpasim = vdpa_to_sim(vdpa);
161 vdpasim->dev_attr = *dev_attr;
162
163 kthread_init_work(&vdpasim->work, vdpasim_work_fn);
164 vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
165 dev_attr->name);
> 166 if (IS_ERR(vdpasim->worker))
167 goto err_iommu;
168
169 spin_lock_init(&vdpasim->lock);
170 spin_lock_init(&vdpasim->iommu_lock);
171
172 dev = &vdpasim->vdpa.dev;
173 dev->dma_mask = &dev->coherent_dma_mask;
174 if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
175 goto err_iommu;
176 vdpasim->vdpa.mdev = dev_attr->mgmt_dev;
177
178 vdpasim->config = kzalloc(dev_attr->config_size, GFP_KERNEL);
179 if (!vdpasim->config)
180 goto err_iommu;
181
182 vdpasim->vqs = kcalloc(dev_attr->nvqs, sizeof(struct vdpasim_virtqueue),
183 GFP_KERNEL);
184 if (!vdpasim->vqs)
185 goto err_iommu;
186
187 vdpasim->iommu = kmalloc_array(vdpasim->dev_attr.nas,
188 sizeof(*vdpasim->iommu), GFP_KERNEL);
189 if (!vdpasim->iommu)
190 goto err_iommu;
191
192 vdpasim->iommu_pt = kmalloc_array(vdpasim->dev_attr.nas,
193 sizeof(*vdpasim->iommu_pt), GFP_KERNEL);
194 if (!vdpasim->iommu_pt)
195 goto err_iommu;
196
197 for (i = 0; i < vdpasim->dev_attr.nas; i++)
198 vhost_iotlb_init(&vdpasim->iommu[i], max_iotlb_entries, 0);
199
200 vdpasim->buffer = kvmalloc(dev_attr->buffer_size, GFP_KERNEL);
201 if (!vdpasim->buffer)
202 goto err_iommu;
203
204 for (i = 0; i < dev_attr->nvqs; i++)
205 vringh_set_iotlb(&vdpasim->vqs[i].vring, &vdpasim->iommu[0],
206 &vdpasim->iommu_lock);
207
208 vdpasim->vdpa.dma_dev = dev;
209
210 return vdpasim;
211
212 err_iommu:
213 put_device(dev);
214 err_alloc:
215 return ERR_PTR(ret);
216 }
217 EXPORT_SYMBOL_GPL(vdpasim_create);
218
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
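A possible way to silence this (untested sketch, not necessarily how the
author will fix it) is to assign `dev` before creating the worker and to
propagate the error code, so the err_iommu path never sees an
uninitialized pointer:

	vdpasim = vdpa_to_sim(vdpa);
	vdpasim->dev_attr = *dev_attr;
	dev = &vdpasim->vdpa.dev;

	kthread_init_work(&vdpasim->work, vdpasim_work_fn);
	vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
						dev_attr->name);
	if (IS_ERR(vdpasim->worker)) {
		/* propagate the error instead of leaving ret = -ENOMEM */
		ret = PTR_ERR(vdpasim->worker);
		goto err_iommu;
	}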
On Fri, Mar 03, 2023 at 03:38:57PM +0100, Eugenio Perez Martin wrote:
>On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <[email protected]> wrote:
>>
>> vDPA supports the possibility to use user VA in the iotlb messages.
>> So, let's add support for user VA in vringh to use it in the vDPA
>> simulators.
>>
>> Signed-off-by: Stefano Garzarella <[email protected]>
>> ---
>>
>> Notes:
>> v2:
>> - replace kmap_atomic() with kmap_local_page() [see previous patch]
>> - fix cast warnings when build with W=1 C=1
>>
>> include/linux/vringh.h | 5 +-
>> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>> drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
>> drivers/vhost/vringh.c | 247 ++++++++++++++++++++++++------
>> 4 files changed, 205 insertions(+), 53 deletions(-)
>>
>> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
>> index 1991a02c6431..d39b9f2dcba0 100644
>> --- a/include/linux/vringh.h
>> +++ b/include/linux/vringh.h
>> @@ -32,6 +32,9 @@ struct vringh {
>> /* Can we get away with weak barriers? */
>> bool weak_barriers;
>>
>> + /* Use user's VA */
>> + bool use_va;
>> +
>> /* Last available index we saw (ie. where we're up to). */
>> u16 last_avail_idx;
>>
>> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
>> spinlock_t *iotlb_lock);
>>
>> int vringh_init_iotlb(struct vringh *vrh, u64 features,
>> - unsigned int num, bool weak_barriers,
>> + unsigned int num, bool weak_barriers, bool use_va,
>> struct vring_desc *desc,
>> struct vring_avail *avail,
>> struct vring_used *used);
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index 3a0e721aef05..babc8dd171a6 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>>
>> if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
>> err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
>> - MLX5_CVQ_MAX_ENT, false,
>> + MLX5_CVQ_MAX_ENT, false, false,
>> (struct vring_desc *)(uintptr_t)cvq->desc_addr,
>> (struct vring_avail *)(uintptr_t)cvq->driver_addr,
>> (struct vring_used *)(uintptr_t)cvq->device_addr);
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> index 6a0a65814626..481eb156658b 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>> uint16_t last_avail_idx = vq->vring.last_avail_idx;
>>
>> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
>> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
>> (struct vring_desc *)(uintptr_t)vq->desc_addr,
>> (struct vring_avail *)
>> (uintptr_t)vq->driver_addr,
>> @@ -81,7 +81,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
>> vq->cb = NULL;
>> vq->private = NULL;
>> vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
>> - VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
>> + VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>>
>> vq->vring.notify = NULL;
>> }
>> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>> index 0ba3ef809e48..61c79cea44ca 100644
>> --- a/drivers/vhost/vringh.c
>> +++ b/drivers/vhost/vringh.c
>> @@ -1094,15 +1094,99 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>>
>> #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>>
>> -static int iotlb_translate(const struct vringh *vrh,
>> - u64 addr, u64 len, u64 *translated,
>> - struct bio_vec iov[],
>> - int iov_size, u32 perm)
>> +static int iotlb_translate_va(const struct vringh *vrh,
>> + u64 addr, u64 len, u64 *translated,
>> + struct iovec iov[],
>> + int iov_size, u32 perm)
>> {
>> struct vhost_iotlb_map *map;
>> struct vhost_iotlb *iotlb = vrh->iotlb;
>> + u64 s = 0, last = addr + len - 1;
>> int ret = 0;
>> +
>> + spin_lock(vrh->iotlb_lock);
>> +
>> + while (len > s) {
>> + u64 size;
>> +
>> + if (unlikely(ret >= iov_size)) {
>> + ret = -ENOBUFS;
>> + break;
>> + }
>> +
>> + map = vhost_iotlb_itree_first(iotlb, addr, last);
>> + if (!map || map->start > addr) {
>> + ret = -EINVAL;
>> + break;
>> + } else if (!(map->perm & perm)) {
>> + ret = -EPERM;
>> + break;
>> + }
>> +
>> + size = map->size - addr + map->start;
>> + iov[ret].iov_len = min(len - s, size);
>> + iov[ret].iov_base = (void __user *)(unsigned long)
>> + (map->addr + addr - map->start);
>> + s += size;
>> + addr += size;
>> + ++ret;
>> + }
>> +
>> + spin_unlock(vrh->iotlb_lock);
>> +
>> + if (translated)
>> + *translated = min(len, s);
>> +
>> + return ret;
>> +}
>
>It seems to me iotlb_translate_va and iotlb_translate_pa are very
>similar; their only difference is that the iov argument is an iovec
>instead of a bio_vec. And how to fill it, obviously.
>
>It would be great to merge both functions, only differing with a
>conditional on vrh->use_va, or generics, or similar. Or, if following
>the style of the rest of vringh code, to provide a callback to fill
>iovec (although I like conditional more).
>
>However I cannot think of an easy way to perform that without long
>macros or type erasure.
I agree and I tried, but then I got messed up and let it go.
But maybe with the callback it shouldn't be too messy, I can try it and
see what comes out :-)
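For the archive, a rough and untested sketch of the conditional variant
could look like this (only the part that fills the entry differs, the
rest is the translation loop we already have):

static int iotlb_translate(const struct vringh *vrh,
			   u64 addr, u64 len, u64 *translated,
			   void *iov, int iov_size, u32 perm)
{
	struct vhost_iotlb_map *map;
	struct vhost_iotlb *iotlb = vrh->iotlb;
	u64 s = 0, last = addr + len - 1;
	int ret = 0;

	spin_lock(vrh->iotlb_lock);

	while (len > s) {
		u64 size, io_addr;

		if (unlikely(ret >= iov_size)) {
			ret = -ENOBUFS;
			break;
		}

		map = vhost_iotlb_itree_first(iotlb, addr, last);
		if (!map || map->start > addr) {
			ret = -EINVAL;
			break;
		} else if (!(map->perm & perm)) {
			ret = -EPERM;
			break;
		}

		size = map->size - addr + map->start;
		io_addr = map->addr + addr - map->start;

		if (vrh->use_va) {
			/* user VA: fill an iovec entry */
			struct iovec *uiov = iov;

			uiov[ret].iov_len = min(len - s, size);
			uiov[ret].iov_base = (void __user *)(unsigned long)io_addr;
		} else {
			/* PA: fill a bio_vec entry, as the current code does */
			struct bio_vec *kiov = iov;

			kiov[ret].bv_page = pfn_to_page(io_addr >> PAGE_SHIFT);
			kiov[ret].bv_len = min(len - s, size);
			kiov[ret].bv_offset = io_addr & (PAGE_SIZE - 1);
		}

		s += size;
		addr += size;
		++ret;
	}

	spin_unlock(vrh->iotlb_lock);

	if (translated)
		*translated = min(len, s);

	return ret;
}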
>
>> +
>> +static inline int copy_from_va(const struct vringh *vrh, void *dst, void *src,
>> + u64 len, u64 *translated)
>> +{
>> + struct iovec iov[16];
>> + struct iov_iter iter;
>> + int ret;
>> +
>> + ret = iotlb_translate_va(vrh, (u64)(uintptr_t)src, len, translated, iov,
>> + ARRAY_SIZE(iov), VHOST_MAP_RO);
>> + if (ret == -ENOBUFS)
>> + ret = ARRAY_SIZE(iov);
>> + else if (ret < 0)
>> + return ret;
>> +
>> + iov_iter_init(&iter, ITER_SOURCE, iov, ret, *translated);
>> +
>> + return copy_from_iter(dst, *translated, &iter);
>
>Maybe a good baby step for DRY is to return the iov_iter in
>copy_from/to_va/pa here?
Good point! I'll try it.
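For example (untested sketch, helper name made up), the VA variant could
only build the iov_iter and leave the actual copy to the caller:

static inline int iotlb_iter_from_va(const struct vringh *vrh, void *src,
				     u64 len, u64 *translated,
				     struct iovec iov[], int iov_size,
				     struct iov_iter *iter)
{
	int ret;

	ret = iotlb_translate_va(vrh, (u64)(uintptr_t)src, len, translated,
				 iov, iov_size, VHOST_MAP_RO);
	if (ret == -ENOBUFS)
		ret = iov_size;
	else if (ret < 0)
		return ret;

	iov_iter_init(iter, ITER_SOURCE, iov, ret, *translated);
	return 0;
}

copy_from_iotlb() would then keep a single copy_from_iter() call for both
cases (with a matching iotlb_iter_from_pa() using iov_iter_bvec() for the
PA side), roughly:

	ret = iotlb_iter_from_va(vrh, src, len - total_translated,
				 &translated, iov, ARRAY_SIZE(iov), &iter);
	if (ret < 0)
		return ret;

	ret = copy_from_iter(dst, translated, &iter);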
>
>But I'm ok with this version too.
>
>Acked-by: Eugenio Pérez Martin <[email protected]>
Thanks for the review!
Stefano
On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <[email protected]> wrote:
>
> These new optional callbacks are used to bind/unbind the device to
> a specific address space so the vDPA framework can use VA when
> these callbacks are implemented.
>
> Suggested-by: Jason Wang <[email protected]>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
One thing that came into my mind is that after this commit:
commit 5ce995f313ce56c0c62425c3ddc37c5c50fc33db
Author: Jason Wang <[email protected]>
Date: Fri May 29 16:02:59 2020 +0800
vhost: use mmgrab() instead of mmget() for non worker device
For the device that doesn't use vhost worker and use_mm(), mmget() is
too heavy weight and it may brings troubles for implementing mmap()
support for vDPA device.
We don't hold the address space after this commit, so the userspace
mapping could be invalid if the owner exits?
Thanks
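As a purely illustrative, untested sketch (the mm_bound field and its
setup are made up here), the simulator worker could pin the address
space itself and bail out if the owner already exited:

static void vdpasim_work_fn(struct kthread_work *work)
{
	struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
	struct mm_struct *mm = vdpasim->mm_bound;	/* hypothetical field */

	if (mm) {
		/* vhost-vdpa only mmgrab()s the mm, so take a real
		 * reference here and give up if the owner is gone. */
		if (!mmget_not_zero(mm))
			return;
		kthread_use_mm(mm);
	}

	vdpasim->dev_attr.work_fn(vdpasim);

	if (mm) {
		kthread_unuse_mm(mm);
		mmput(mm);
	}
}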
>
> Notes:
> v2:
> - removed `struct task_struct *owner` param (unused for now, maybe
> useful to support cgroups) [Jason]
> - add unbind_mm callback [Jason]
>
> include/linux/vdpa.h | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 43f59ef10cc9..369c21394284 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -290,6 +290,14 @@ struct vdpa_map_file {
> * @vdev: vdpa device
> * @idx: virtqueue index
> * Returns pointer to structure device or error (NULL)
> + * @bind_mm: Bind the device to a specific address space
> + * so the vDPA framework can use VA when this
> + * callback is implemented. (optional)
> + * @vdev: vdpa device
> + * @mm: address space to bind
> + * @unbind_mm: Unbind the device from the address space
> + * bound using the bind_mm callback. (optional)
> + * @vdev: vdpa device
> * @free: Free resources that belongs to vDPA (optional)
> * @vdev: vdpa device
> */
> @@ -351,6 +359,8 @@ struct vdpa_config_ops {
> int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
> unsigned int asid);
> struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
> + int (*bind_mm)(struct vdpa_device *vdev, struct mm_struct *mm);
> + void (*unbind_mm)(struct vdpa_device *vdev);
>
> /* Free device resources */
> void (*free)(struct vdpa_device *vdev);
> --
> 2.39.2
>
On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <[email protected]> wrote:
>
> When the user calls the VHOST_SET_OWNER ioctl and the vDPA device
> has `use_va` set to true, let's call the bind_mm callback.
> In this way we can bind the device to the user address space
> and directly use the user VA.
>
> The unbind_mm callback is called during the release after
> stopping the device.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>
> Notes:
> v2:
> - call the new unbind_mm callback during the release [Jason]
> - avoid to call bind_mm callback after the reset, since the device
> is not detaching it now during the reset
>
> drivers/vhost/vdpa.c | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index dc12dbd5b43b..1ab89fccd825 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
> return vdpa_reset(vdpa);
> }
>
> +static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
> +{
> + struct vdpa_device *vdpa = v->vdpa;
> + const struct vdpa_config_ops *ops = vdpa->config;
> +
> + if (!vdpa->use_va || !ops->bind_mm)
> + return 0;
> +
> + return ops->bind_mm(vdpa, v->vdev.mm);
> +}
> +
> +static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
> +{
> + struct vdpa_device *vdpa = v->vdpa;
> + const struct vdpa_config_ops *ops = vdpa->config;
> +
> + if (!vdpa->use_va || !ops->unbind_mm)
> + return;
> +
> + ops->unbind_mm(vdpa);
> +}
> +
> static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
> {
> struct vdpa_device *vdpa = v->vdpa;
> @@ -711,6 +733,13 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> break;
> default:
> r = vhost_dev_ioctl(&v->vdev, cmd, argp);
> + if (!r && cmd == VHOST_SET_OWNER) {
> + r = vhost_vdpa_bind_mm(v);
> + if (r) {
> + vhost_dev_reset_owner(&v->vdev, NULL);
> + break;
> + }
> + }
Nit: would it be better to have a new condition/switch branch instead of
putting this under default (as vring_ioctl does)?
Thanks
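For reference, the alternative could look roughly like this (untested
sketch, not the actual patch):

	case VHOST_SET_OWNER:
		r = vhost_dev_ioctl(&v->vdev, cmd, argp);
		if (r)
			break;
		r = vhost_vdpa_bind_mm(v);
		if (r)
			vhost_dev_reset_owner(&v->vdev, NULL);
		break;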
> if (r == -ENOIOCTLCMD)
> r = vhost_vdpa_vring_ioctl(v, cmd, argp);
> break;
> @@ -1285,6 +1314,7 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep)
> vhost_vdpa_clean_irq(v);
> vhost_vdpa_reset(v);
> vhost_dev_stop(&v->vdev);
> + vhost_vdpa_unbind_mm(v);
> vhost_vdpa_free_domain(v);
> vhost_vdpa_config_put(v);
> vhost_vdpa_cleanup(v);
> --
> 2.39.2
>
On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <[email protected]> wrote:
>
> kmap_atomic() is deprecated in favor of kmap_local_page().
It's better to mention the commit or code that introduces this.
>
> With kmap_local_page() the mappings are per thread, CPU local, can take
> page-faults, and can be called from any context (including interrupts).
> Furthermore, the tasks can be preempted and, when they are scheduled to
> run again, the kernel virtual addresses are restored and still valid.
>
> kmap_atomic() is implemented like a kmap_local_page() which also disables
> page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> otherwise it only disables migration).
>
> The code within the mappings/un-mappings in getu16_iotlb() and
> putu16_iotlb() don't depend on the above-mentioned side effects of
> kmap_atomic(),
Note that we used to use a spinlock to protect the simulators (at least
until patch 7, so we probably need to re-order the patches), so I think
this is only valid when:
The vringh IOTLB helpers are not used in atomic context (e.g. under a
spinlock or from interrupt context).
If yes, should we document this? (Or should we introduce a boolean to
say whether an IOTLB variant can be used in an atomic context?)
Thanks
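If we go the documentation route, a minimal addition to the
vringh_init_iotlb() kernel-doc could be something like (illustrative
wording only):

 * Note: when @use_va is true the IOTLB accessors may fault and sleep,
 * so the vringh *_iotlb() helpers must not be called from atomic
 * context (e.g. under a spinlock or from interrupt context).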
> so that a mere replacement of the old API with the new one
> is all that is required (i.e., there is no need to explicitly add calls
> to pagefault_disable() and/or preempt_disable()).
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>
> Notes:
> v2:
> - added this patch since checkpatch.pl complained about deprecation
> of kmap_atomic() touched by next patch
>
> drivers/vhost/vringh.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index a1e27da54481..0ba3ef809e48 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh *vrh,
> if (ret < 0)
> return ret;
>
> - kaddr = kmap_atomic(iov.bv_page);
> + kaddr = kmap_local_page(iov.bv_page);
> from = kaddr + iov.bv_offset;
> *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> - kunmap_atomic(kaddr);
> + kunmap_local(kaddr);
>
> return 0;
> }
> @@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh *vrh,
> if (ret < 0)
> return ret;
>
> - kaddr = kmap_atomic(iov.bv_page);
> + kaddr = kmap_local_page(iov.bv_page);
> to = kaddr + iov.bv_offset;
> WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> - kunmap_atomic(kaddr);
> + kunmap_local(kaddr);
>
> return 0;
> }
> --
> 2.39.2
>
On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <[email protected]> wrote:
>
> vDPA supports the possibility to use user VA in the iotlb messages.
> So, let's add support for user VA in vringh to use it in the vDPA
> simulators.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>
> Notes:
> v2:
> - replace kmap_atomic() with kmap_local_page() [see previous patch]
> - fix cast warnings when build with W=1 C=1
>
> include/linux/vringh.h | 5 +-
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
> drivers/vhost/vringh.c | 247 ++++++++++++++++++++++++------
> 4 files changed, 205 insertions(+), 53 deletions(-)
>
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> index 1991a02c6431..d39b9f2dcba0 100644
> --- a/include/linux/vringh.h
> +++ b/include/linux/vringh.h
> @@ -32,6 +32,9 @@ struct vringh {
> /* Can we get away with weak barriers? */
> bool weak_barriers;
>
> + /* Use user's VA */
> + bool use_va;
> +
> /* Last available index we saw (ie. where we're up to). */
> u16 last_avail_idx;
>
> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
> spinlock_t *iotlb_lock);
>
> int vringh_init_iotlb(struct vringh *vrh, u64 features,
> - unsigned int num, bool weak_barriers,
> + unsigned int num, bool weak_barriers, bool use_va,
> struct vring_desc *desc,
> struct vring_avail *avail,
> struct vring_used *used);
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 3a0e721aef05..babc8dd171a6 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>
> if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
> err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
> - MLX5_CVQ_MAX_ENT, false,
> + MLX5_CVQ_MAX_ENT, false, false,
> (struct vring_desc *)(uintptr_t)cvq->desc_addr,
> (struct vring_avail *)(uintptr_t)cvq->driver_addr,
> (struct vring_used *)(uintptr_t)cvq->device_addr);
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 6a0a65814626..481eb156658b 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> uint16_t last_avail_idx = vq->vring.last_avail_idx;
>
> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
> (struct vring_desc *)(uintptr_t)vq->desc_addr,
> (struct vring_avail *)
> (uintptr_t)vq->driver_addr,
> @@ -81,7 +81,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
> vq->cb = NULL;
> vq->private = NULL;
> vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
> - VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
> + VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>
> vq->vring.notify = NULL;
> }
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 0ba3ef809e48..61c79cea44ca 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1094,15 +1094,99 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>
> #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>
> -static int iotlb_translate(const struct vringh *vrh,
> - u64 addr, u64 len, u64 *translated,
> - struct bio_vec iov[],
> - int iov_size, u32 perm)
> +static int iotlb_translate_va(const struct vringh *vrh,
> + u64 addr, u64 len, u64 *translated,
> + struct iovec iov[],
> + int iov_size, u32 perm)
> {
> struct vhost_iotlb_map *map;
> struct vhost_iotlb *iotlb = vrh->iotlb;
> + u64 s = 0, last = addr + len - 1;
> int ret = 0;
> +
> + spin_lock(vrh->iotlb_lock);
> +
> + while (len > s) {
> + u64 size;
> +
> + if (unlikely(ret >= iov_size)) {
> + ret = -ENOBUFS;
> + break;
> + }
> +
> + map = vhost_iotlb_itree_first(iotlb, addr, last);
> + if (!map || map->start > addr) {
> + ret = -EINVAL;
> + break;
> + } else if (!(map->perm & perm)) {
> + ret = -EPERM;
> + break;
> + }
> +
> + size = map->size - addr + map->start;
> + iov[ret].iov_len = min(len - s, size);
> + iov[ret].iov_base = (void __user *)(unsigned long)
> + (map->addr + addr - map->start);
> + s += size;
> + addr += size;
> + ++ret;
> + }
> +
> + spin_unlock(vrh->iotlb_lock);
> +
> + if (translated)
> + *translated = min(len, s);
> +
> + return ret;
> +}
> +
> +static inline int copy_from_va(const struct vringh *vrh, void *dst, void *src,
> + u64 len, u64 *translated)
> +{
> + struct iovec iov[16];
> + struct iov_iter iter;
> + int ret;
> +
> + ret = iotlb_translate_va(vrh, (u64)(uintptr_t)src, len, translated, iov,
> + ARRAY_SIZE(iov), VHOST_MAP_RO);
> + if (ret == -ENOBUFS)
> + ret = ARRAY_SIZE(iov);
> + else if (ret < 0)
> + return ret;
> +
> + iov_iter_init(&iter, ITER_SOURCE, iov, ret, *translated);
> +
> + return copy_from_iter(dst, *translated, &iter);
> +}
> +
> +static inline int copy_to_va(const struct vringh *vrh, void *dst, void *src,
> + u64 len, u64 *translated)
> +{
> + struct iovec iov[16];
> + struct iov_iter iter;
> + int ret;
> +
> + ret = iotlb_translate_va(vrh, (u64)(uintptr_t)dst, len, translated, iov,
> + ARRAY_SIZE(iov), VHOST_MAP_WO);
> + if (ret == -ENOBUFS)
> + ret = ARRAY_SIZE(iov);
> + else if (ret < 0)
> + return ret;
> +
> + iov_iter_init(&iter, ITER_DEST, iov, ret, *translated);
> +
> + return copy_to_iter(src, *translated, &iter);
> +}
> +
> +static int iotlb_translate_pa(const struct vringh *vrh,
> + u64 addr, u64 len, u64 *translated,
> + struct bio_vec iov[],
> + int iov_size, u32 perm)
> +{
> + struct vhost_iotlb_map *map;
> + struct vhost_iotlb *iotlb = vrh->iotlb;
> u64 s = 0, last = addr + len - 1;
> + int ret = 0;
>
> spin_lock(vrh->iotlb_lock);
>
> @@ -1141,28 +1225,61 @@ static int iotlb_translate(const struct vringh *vrh,
> return ret;
> }
>
> +static inline int copy_from_pa(const struct vringh *vrh, void *dst, void *src,
> + u64 len, u64 *translated)
> +{
> + struct bio_vec iov[16];
> + struct iov_iter iter;
> + int ret;
> +
> + ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)src, len, translated, iov,
> + ARRAY_SIZE(iov), VHOST_MAP_RO);
> + if (ret == -ENOBUFS)
> + ret = ARRAY_SIZE(iov);
> + else if (ret < 0)
> + return ret;
> +
> + iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, *translated);
> +
> + return copy_from_iter(dst, *translated, &iter);
> +}
> +
> +static inline int copy_to_pa(const struct vringh *vrh, void *dst, void *src,
> + u64 len, u64 *translated)
> +{
> + struct bio_vec iov[16];
> + struct iov_iter iter;
> + int ret;
> +
> + ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)dst, len, translated, iov,
> + ARRAY_SIZE(iov), VHOST_MAP_WO);
> + if (ret == -ENOBUFS)
> + ret = ARRAY_SIZE(iov);
> + else if (ret < 0)
> + return ret;
> +
> + iov_iter_bvec(&iter, ITER_DEST, iov, ret, *translated);
> +
> + return copy_to_iter(src, *translated, &iter);
> +}
> +
> static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
> void *src, size_t len)
> {
> u64 total_translated = 0;
>
> while (total_translated < len) {
> - struct bio_vec iov[16];
> - struct iov_iter iter;
> u64 translated;
> int ret;
>
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
> - len - total_translated, &translated,
> - iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
> - if (ret == -ENOBUFS)
> - ret = ARRAY_SIZE(iov);
> - else if (ret < 0)
> - return ret;
> -
> - iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
> + if (vrh->use_va) {
> + ret = copy_from_va(vrh, dst, src,
> + len - total_translated, &translated);
> + } else {
> + ret = copy_from_pa(vrh, dst, src,
> + len - total_translated, &translated);
> + }
>
> - ret = copy_from_iter(dst, translated, &iter);
> if (ret < 0)
> return ret;
>
> @@ -1180,22 +1297,17 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
> u64 total_translated = 0;
>
> while (total_translated < len) {
> - struct bio_vec iov[16];
> - struct iov_iter iter;
> u64 translated;
> int ret;
>
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
> - len - total_translated, &translated,
> - iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
> - if (ret == -ENOBUFS)
> - ret = ARRAY_SIZE(iov);
> - else if (ret < 0)
> - return ret;
> -
> - iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
> + if (vrh->use_va) {
> + ret = copy_to_va(vrh, dst, src,
> + len - total_translated, &translated);
> + } else {
> + ret = copy_to_pa(vrh, dst, src,
> + len - total_translated, &translated);
> + }
>
> - ret = copy_to_iter(src, translated, &iter);
> if (ret < 0)
> return ret;
>
> @@ -1210,20 +1322,37 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
> static inline int getu16_iotlb(const struct vringh *vrh,
> u16 *val, const __virtio16 *p)
> {
> - struct bio_vec iov;
> - void *kaddr, *from;
> int ret;
>
> /* Atomic read is needed for getu16 */
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> - &iov, 1, VHOST_MAP_RO);
> - if (ret < 0)
> - return ret;
> + if (vrh->use_va) {
> + struct iovec iov;
> + __virtio16 tmp;
> +
> + ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
> + NULL, &iov, 1, VHOST_MAP_RO);
> + if (ret < 0)
> + return ret;
Nit: since we have copy_to_va/copy_to_pa variants, let's introduce
getu16_iotlb_va/pa variants?
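Something along these lines, perhaps (untested sketch): the VA variant
would carry the __get_user() path, the PA variant would keep the
kmap_local_page() path, and getu16_iotlb() would just dispatch on
vrh->use_va:

static inline int getu16_iotlb_va(const struct vringh *vrh,
				  u16 *val, const __virtio16 *p)
{
	struct iovec iov;
	__virtio16 tmp;
	int ret;

	ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
				 NULL, &iov, 1, VHOST_MAP_RO);
	if (ret < 0)
		return ret;

	ret = __get_user(tmp, (__virtio16 __user *)iov.iov_base);
	if (ret)
		return ret;

	*val = vringh16_to_cpu(vrh, tmp);
	return 0;
}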
>
> - kaddr = kmap_local_page(iov.bv_page);
> - from = kaddr + iov.bv_offset;
> - *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> - kunmap_local(kaddr);
> + ret = __get_user(tmp, (__virtio16 __user *)iov.iov_base);
> + if (ret)
> + return ret;
> +
> + *val = vringh16_to_cpu(vrh, tmp);
> + } else {
> + struct bio_vec iov;
> + void *kaddr, *from;
> +
> + ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p),
> + NULL, &iov, 1, VHOST_MAP_RO);
> + if (ret < 0)
> + return ret;
> +
> + kaddr = kmap_local_page(iov.bv_page);
If we decide to have a use_va switch, is kmap_local_page() still required here?
Other looks good.
Thanks
> + from = kaddr + iov.bv_offset;
> + *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> + kunmap_local(kaddr);
> + }
>
> return 0;
> }
> @@ -1231,20 +1360,37 @@ static inline int getu16_iotlb(const struct vringh *vrh,
> static inline int putu16_iotlb(const struct vringh *vrh,
> __virtio16 *p, u16 val)
> {
> - struct bio_vec iov;
> - void *kaddr, *to;
> int ret;
>
> /* Atomic write is needed for putu16 */
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> - &iov, 1, VHOST_MAP_WO);
> - if (ret < 0)
> - return ret;
> + if (vrh->use_va) {
> + struct iovec iov;
> + __virtio16 tmp;
>
> - kaddr = kmap_local_page(iov.bv_page);
> - to = kaddr + iov.bv_offset;
> - WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> - kunmap_local(kaddr);
> + ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
> +					  NULL, &iov, 1, VHOST_MAP_WO);
> + if (ret < 0)
> + return ret;
> +
> + tmp = cpu_to_vringh16(vrh, val);
> +
> + ret = __put_user(tmp, (__virtio16 __user *)iov.iov_base);
> + if (ret)
> + return ret;
> + } else {
> + struct bio_vec iov;
> + void *kaddr, *to;
> +
> + ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> + &iov, 1, VHOST_MAP_WO);
> + if (ret < 0)
> + return ret;
> +
> + kaddr = kmap_local_page(iov.bv_page);
> + to = kaddr + iov.bv_offset;
> + WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> + kunmap_local(kaddr);
> + }
>
> return 0;
> }
> @@ -1306,6 +1452,7 @@ static inline int putused_iotlb(const struct vringh *vrh,
> * @features: the feature bits for this ring.
> * @num: the number of elements.
> * @weak_barriers: true if we only need memory barriers, not I/O.
> + * @use_va: true if IOTLB contains user VA
> * @desc: the userpace descriptor pointer.
> * @avail: the userpace avail pointer.
> * @used: the userpace used pointer.
> @@ -1313,11 +1460,13 @@ static inline int putused_iotlb(const struct vringh *vrh,
> * Returns an error if num is invalid.
> */
> int vringh_init_iotlb(struct vringh *vrh, u64 features,
> - unsigned int num, bool weak_barriers,
> + unsigned int num, bool weak_barriers, bool use_va,
> struct vring_desc *desc,
> struct vring_avail *avail,
> struct vring_used *used)
> {
> + vrh->use_va = use_va;
> +
> return vringh_init_kern(vrh, features, num, weak_barriers,
> desc, avail, used);
> }
> --
> 2.39.2
>
On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <[email protected]> wrote:
>
> Let's move work management inside the vdpa_sim core.
> This way we can easily change how we manage the works, without
> having to change the devices each time.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
Acked-by: Jason Wang <[email protected]>
Thanks
> ---
> drivers/vdpa/vdpa_sim/vdpa_sim.h | 3 ++-
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 +++++++++++++++--
> drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 6 ++----
> drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 6 ++----
> 4 files changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index 144858636c10..acee20faaf6a 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -45,7 +45,7 @@ struct vdpasim_dev_attr {
> u32 ngroups;
> u32 nas;
>
> - work_func_t work_fn;
> + void (*work_fn)(struct vdpasim *vdpasim);
> void (*get_config)(struct vdpasim *vdpasim, void *config);
> void (*set_config)(struct vdpasim *vdpasim, const void *config);
> int (*get_stats)(struct vdpasim *vdpasim, u16 idx,
> @@ -78,6 +78,7 @@ struct vdpasim {
>
> struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *attr,
> const struct vdpa_dev_set_config *config);
> +void vdpasim_schedule_work(struct vdpasim *vdpasim);
>
> /* TODO: cross-endian support */
> static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 481eb156658b..a6ee830efc38 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -116,6 +116,13 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
> static const struct vdpa_config_ops vdpasim_config_ops;
> static const struct vdpa_config_ops vdpasim_batch_config_ops;
>
> +static void vdpasim_work_fn(struct work_struct *work)
> +{
> + struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> +
> + vdpasim->dev_attr.work_fn(vdpasim);
> +}
> +
> struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> const struct vdpa_dev_set_config *config)
> {
> @@ -152,7 +159,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>
> vdpasim = vdpa_to_sim(vdpa);
> vdpasim->dev_attr = *dev_attr;
> - INIT_WORK(&vdpasim->work, dev_attr->work_fn);
> + INIT_WORK(&vdpasim->work, vdpasim_work_fn);
> spin_lock_init(&vdpasim->lock);
> spin_lock_init(&vdpasim->iommu_lock);
>
> @@ -203,6 +210,12 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> }
> EXPORT_SYMBOL_GPL(vdpasim_create);
>
> +void vdpasim_schedule_work(struct vdpasim *vdpasim)
> +{
> + schedule_work(&vdpasim->work);
> +}
> +EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
> +
> static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> u64 desc_area, u64 driver_area,
> u64 device_area)
> @@ -237,7 +250,7 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
> }
>
> if (vq->ready)
> - schedule_work(&vdpasim->work);
> + vdpasim_schedule_work(vdpasim);
> }
>
> static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> index 5117959bed8a..eb4897c8541e 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> @@ -11,7 +11,6 @@
> #include <linux/module.h>
> #include <linux/device.h>
> #include <linux/kernel.h>
> -#include <linux/sched.h>
> #include <linux/blkdev.h>
> #include <linux/vringh.h>
> #include <linux/vdpa.h>
> @@ -286,9 +285,8 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
> return handled;
> }
>
> -static void vdpasim_blk_work(struct work_struct *work)
> +static void vdpasim_blk_work(struct vdpasim *vdpasim)
> {
> - struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> bool reschedule = false;
> int i;
>
> @@ -326,7 +324,7 @@ static void vdpasim_blk_work(struct work_struct *work)
> spin_unlock(&vdpasim->lock);
>
> if (reschedule)
> - schedule_work(&vdpasim->work);
> + vdpasim_schedule_work(vdpasim);
> }
>
> static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config)
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> index 862f405362de..e61a9ecbfafe 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> @@ -11,7 +11,6 @@
> #include <linux/module.h>
> #include <linux/device.h>
> #include <linux/kernel.h>
> -#include <linux/sched.h>
> #include <linux/etherdevice.h>
> #include <linux/vringh.h>
> #include <linux/vdpa.h>
> @@ -192,9 +191,8 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
> u64_stats_update_end(&net->cq_stats.syncp);
> }
>
> -static void vdpasim_net_work(struct work_struct *work)
> +static void vdpasim_net_work(struct vdpasim *vdpasim)
> {
> - struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
> struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
> struct vdpasim_net *net = sim_to_net(vdpasim);
> @@ -260,7 +258,7 @@ static void vdpasim_net_work(struct work_struct *work)
> vdpasim_net_complete(rxq, write);
>
> if (tx_pkts > 4) {
> - schedule_work(&vdpasim->work);
> + vdpasim_schedule_work(vdpasim);
> goto out;
> }
> }
> --
> 2.39.2
>
On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <[email protected]> wrote:
>
> The spinlock we use to protect the state of the simulator is sometimes
> held for a long time (for example, when devices handle requests).
>
> This also prevents us from calling functions that might sleep (such as
> kthread_flush_work() in the next patch), and thus having to release
> and retake the lock.
>
> For these reasons, let's replace the spinlock with a mutex that gives
> us more flexibility.
>
> Suggested-by: Jason Wang <[email protected]>
> Signed-off-by: Stefano Garzarella <[email protected]>
Acked-by: Jason Wang <[email protected]>
Thanks
> ---
> drivers/vdpa/vdpa_sim/vdpa_sim.h | 4 ++--
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 34 ++++++++++++++--------------
> drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 4 ++--
> drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 4 ++--
> 4 files changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index ce83f9130a5d..4774292fba8c 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -60,8 +60,8 @@ struct vdpasim {
> struct kthread_worker *worker;
> struct kthread_work work;
> struct vdpasim_dev_attr dev_attr;
> - /* spinlock to synchronize virtqueue state */
> - spinlock_t lock;
> + /* mutex to synchronize virtqueue state */
> + struct mutex mutex;
> /* virtio config according to device type */
> void *config;
> struct vhost_iotlb *iommu;
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 6feb29726c2a..a28103a67ae7 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -166,7 +166,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> if (IS_ERR(vdpasim->worker))
> goto err_iommu;
>
> - spin_lock_init(&vdpasim->lock);
> + mutex_init(&vdpasim->mutex);
> spin_lock_init(&vdpasim->iommu_lock);
>
> dev = &vdpasim->vdpa.dev;
> @@ -275,13 +275,13 @@ static void vdpasim_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> bool old_ready;
>
> - spin_lock(&vdpasim->lock);
> + mutex_lock(&vdpasim->mutex);
> old_ready = vq->ready;
> vq->ready = ready;
> if (vq->ready && !old_ready) {
> vdpasim_queue_ready(vdpasim, idx);
> }
> - spin_unlock(&vdpasim->lock);
> + mutex_unlock(&vdpasim->mutex);
> }
>
> static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
> @@ -299,9 +299,9 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx,
> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> struct vringh *vrh = &vq->vring;
>
> - spin_lock(&vdpasim->lock);
> + mutex_lock(&vdpasim->mutex);
> vrh->last_avail_idx = state->split.avail_index;
> - spin_unlock(&vdpasim->lock);
> + mutex_unlock(&vdpasim->mutex);
>
> return 0;
> }
> @@ -398,9 +398,9 @@ static u8 vdpasim_get_status(struct vdpa_device *vdpa)
> struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> u8 status;
>
> - spin_lock(&vdpasim->lock);
> + mutex_lock(&vdpasim->mutex);
> status = vdpasim->status;
> - spin_unlock(&vdpasim->lock);
> + mutex_unlock(&vdpasim->mutex);
>
> return status;
> }
> @@ -409,19 +409,19 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
> {
> struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>
> - spin_lock(&vdpasim->lock);
> + mutex_lock(&vdpasim->mutex);
> vdpasim->status = status;
> - spin_unlock(&vdpasim->lock);
> + mutex_unlock(&vdpasim->mutex);
> }
>
> static int vdpasim_reset(struct vdpa_device *vdpa)
> {
> struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>
> - spin_lock(&vdpasim->lock);
> + mutex_lock(&vdpasim->mutex);
> vdpasim->status = 0;
> vdpasim_do_reset(vdpasim);
> - spin_unlock(&vdpasim->lock);
> + mutex_unlock(&vdpasim->mutex);
>
> return 0;
> }
> @@ -430,9 +430,9 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
> {
> struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>
> - spin_lock(&vdpasim->lock);
> + mutex_lock(&vdpasim->mutex);
> vdpasim->running = false;
> - spin_unlock(&vdpasim->lock);
> + mutex_unlock(&vdpasim->mutex);
>
> return 0;
> }
> @@ -442,7 +442,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
> struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> int i;
>
> - spin_lock(&vdpasim->lock);
> + mutex_lock(&vdpasim->mutex);
> vdpasim->running = true;
>
> if (vdpasim->pending_kick) {
> @@ -453,7 +453,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
> vdpasim->pending_kick = false;
> }
>
> - spin_unlock(&vdpasim->lock);
> + mutex_unlock(&vdpasim->mutex);
>
> return 0;
> }
> @@ -525,14 +525,14 @@ static int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
>
> iommu = &vdpasim->iommu[asid];
>
> - spin_lock(&vdpasim->lock);
> + mutex_lock(&vdpasim->mutex);
>
> for (i = 0; i < vdpasim->dev_attr.nvqs; i++)
> if (vdpasim_get_vq_group(vdpa, i) == group)
> vringh_set_iotlb(&vdpasim->vqs[i].vring, iommu,
> &vdpasim->iommu_lock);
>
> - spin_unlock(&vdpasim->lock);
> + mutex_unlock(&vdpasim->mutex);
>
> return 0;
> }
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> index eb4897c8541e..568119e1553f 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> @@ -290,7 +290,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
> bool reschedule = false;
> int i;
>
> - spin_lock(&vdpasim->lock);
> + mutex_lock(&vdpasim->mutex);
>
> if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
> goto out;
> @@ -321,7 +321,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
> }
> }
> out:
> - spin_unlock(&vdpasim->lock);
> + mutex_unlock(&vdpasim->mutex);
>
> if (reschedule)
> vdpasim_schedule_work(vdpasim);
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> index e61a9ecbfafe..7ab434592bfe 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> @@ -201,7 +201,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
> u64 rx_drops = 0, rx_overruns = 0, rx_errors = 0, tx_errors = 0;
> int err;
>
> - spin_lock(&vdpasim->lock);
> + mutex_lock(&vdpasim->mutex);
>
> if (!vdpasim->running)
> goto out;
> @@ -264,7 +264,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
> }
>
> out:
> - spin_unlock(&vdpasim->lock);
> + mutex_unlock(&vdpasim->mutex);
>
> u64_stats_update_begin(&net->tx_stats.syncp);
> net->tx_stats.pkts += tx_pkts;
> --
> 2.39.2
>
On Tue, Mar 14, 2023 at 1:29 PM Jason Wang <[email protected]> wrote:
>
> On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <[email protected]> wrote:
> >
> > The spinlock we use to protect the state of the simulator is sometimes
> > held for a long time (for example, when devices handle requests).
> >
> > This also prevents us from calling functions that might sleep (such as
> > kthread_flush_work() in the next patch), and thus having to release
> > and retake the lock.
> >
> > For these reasons, let's replace the spinlock with a mutex that gives
> > us more flexibility.
> >
> > Suggested-by: Jason Wang <[email protected]>
> > Signed-off-by: Stefano Garzarella <[email protected]>
>
> Acked-by: Jason Wang <[email protected]>
>
> Thanks
Btw, though it looks fine, we'd better double-confirm that virtio_vdpa works well.
(I think so, since there are transports that might sleep.)
Thanks
>
> > ---
> > drivers/vdpa/vdpa_sim/vdpa_sim.h | 4 ++--
> > drivers/vdpa/vdpa_sim/vdpa_sim.c | 34 ++++++++++++++--------------
> > drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 4 ++--
> > drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 4 ++--
> > 4 files changed, 23 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> > index ce83f9130a5d..4774292fba8c 100644
> > --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> > @@ -60,8 +60,8 @@ struct vdpasim {
> > struct kthread_worker *worker;
> > struct kthread_work work;
> > struct vdpasim_dev_attr dev_attr;
> > - /* spinlock to synchronize virtqueue state */
> > - spinlock_t lock;
> > + /* mutex to synchronize virtqueue state */
> > + struct mutex mutex;
> > /* virtio config according to device type */
> > void *config;
> > struct vhost_iotlb *iommu;
> > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> > index 6feb29726c2a..a28103a67ae7 100644
> > --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> > @@ -166,7 +166,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> > if (IS_ERR(vdpasim->worker))
> > goto err_iommu;
> >
> > - spin_lock_init(&vdpasim->lock);
> > + mutex_init(&vdpasim->mutex);
> > spin_lock_init(&vdpasim->iommu_lock);
> >
> > dev = &vdpasim->vdpa.dev;
> > @@ -275,13 +275,13 @@ static void vdpasim_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
> > struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > bool old_ready;
> >
> > - spin_lock(&vdpasim->lock);
> > + mutex_lock(&vdpasim->mutex);
> > old_ready = vq->ready;
> > vq->ready = ready;
> > if (vq->ready && !old_ready) {
> > vdpasim_queue_ready(vdpasim, idx);
> > }
> > - spin_unlock(&vdpasim->lock);
> > + mutex_unlock(&vdpasim->mutex);
> > }
> >
> > static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
> > @@ -299,9 +299,9 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx,
> > struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > struct vringh *vrh = &vq->vring;
> >
> > - spin_lock(&vdpasim->lock);
> > + mutex_lock(&vdpasim->mutex);
> > vrh->last_avail_idx = state->split.avail_index;
> > - spin_unlock(&vdpasim->lock);
> > + mutex_unlock(&vdpasim->mutex);
> >
> > return 0;
> > }
> > @@ -398,9 +398,9 @@ static u8 vdpasim_get_status(struct vdpa_device *vdpa)
> > struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > u8 status;
> >
> > - spin_lock(&vdpasim->lock);
> > + mutex_lock(&vdpasim->mutex);
> > status = vdpasim->status;
> > - spin_unlock(&vdpasim->lock);
> > + mutex_unlock(&vdpasim->mutex);
> >
> > return status;
> > }
> > @@ -409,19 +409,19 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
> > {
> > struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> >
> > - spin_lock(&vdpasim->lock);
> > + mutex_lock(&vdpasim->mutex);
> > vdpasim->status = status;
> > - spin_unlock(&vdpasim->lock);
> > + mutex_unlock(&vdpasim->mutex);
> > }
> >
> > static int vdpasim_reset(struct vdpa_device *vdpa)
> > {
> > struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> >
> > - spin_lock(&vdpasim->lock);
> > + mutex_lock(&vdpasim->mutex);
> > vdpasim->status = 0;
> > vdpasim_do_reset(vdpasim);
> > - spin_unlock(&vdpasim->lock);
> > + mutex_unlock(&vdpasim->mutex);
> >
> > return 0;
> > }
> > @@ -430,9 +430,9 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
> > {
> > struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> >
> > - spin_lock(&vdpasim->lock);
> > + mutex_lock(&vdpasim->mutex);
> > vdpasim->running = false;
> > - spin_unlock(&vdpasim->lock);
> > + mutex_unlock(&vdpasim->mutex);
> >
> > return 0;
> > }
> > @@ -442,7 +442,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
> > struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > int i;
> >
> > - spin_lock(&vdpasim->lock);
> > + mutex_lock(&vdpasim->mutex);
> > vdpasim->running = true;
> >
> > if (vdpasim->pending_kick) {
> > @@ -453,7 +453,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
> > vdpasim->pending_kick = false;
> > }
> >
> > - spin_unlock(&vdpasim->lock);
> > + mutex_unlock(&vdpasim->mutex);
> >
> > return 0;
> > }
> > @@ -525,14 +525,14 @@ static int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> >
> > iommu = &vdpasim->iommu[asid];
> >
> > - spin_lock(&vdpasim->lock);
> > + mutex_lock(&vdpasim->mutex);
> >
> > for (i = 0; i < vdpasim->dev_attr.nvqs; i++)
> > if (vdpasim_get_vq_group(vdpa, i) == group)
> > vringh_set_iotlb(&vdpasim->vqs[i].vring, iommu,
> > &vdpasim->iommu_lock);
> >
> > - spin_unlock(&vdpasim->lock);
> > + mutex_unlock(&vdpasim->mutex);
> >
> > return 0;
> > }
> > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> > index eb4897c8541e..568119e1553f 100644
> > --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> > @@ -290,7 +290,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
> > bool reschedule = false;
> > int i;
> >
> > - spin_lock(&vdpasim->lock);
> > + mutex_lock(&vdpasim->mutex);
> >
> > if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
> > goto out;
> > @@ -321,7 +321,7 @@ static void vdpasim_blk_work(struct vdpasim *vdpasim)
> > }
> > }
> > out:
> > - spin_unlock(&vdpasim->lock);
> > + mutex_unlock(&vdpasim->mutex);
> >
> > if (reschedule)
> > vdpasim_schedule_work(vdpasim);
> > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> > index e61a9ecbfafe..7ab434592bfe 100644
> > --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> > @@ -201,7 +201,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
> > u64 rx_drops = 0, rx_overruns = 0, rx_errors = 0, tx_errors = 0;
> > int err;
> >
> > - spin_lock(&vdpasim->lock);
> > + mutex_lock(&vdpasim->mutex);
> >
> > if (!vdpasim->running)
> > goto out;
> > @@ -264,7 +264,7 @@ static void vdpasim_net_work(struct vdpasim *vdpasim)
> > }
> >
> > out:
> > - spin_unlock(&vdpasim->lock);
> > + mutex_unlock(&vdpasim->mutex);
> >
> > u64_stats_update_begin(&net->tx_stats.syncp);
> > net->tx_stats.pkts += tx_pkts;
> > --
> > 2.39.2
> >
On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <[email protected]> wrote:
>
> Let's use our own kthread to run device jobs.
> This gives us more flexibility; in particular, we can attach the kthread
> to the user address space when vDPA uses the user's VA.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
Acked-by: Jason Wang <[email protected]>
Thanks
> ---
> drivers/vdpa/vdpa_sim/vdpa_sim.h | 3 ++-
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 ++++++++++++-----
> 2 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index acee20faaf6a..ce83f9130a5d 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -57,7 +57,8 @@ struct vdpasim_dev_attr {
> struct vdpasim {
> struct vdpa_device vdpa;
> struct vdpasim_virtqueue *vqs;
> - struct work_struct work;
> + struct kthread_worker *worker;
> + struct kthread_work work;
> struct vdpasim_dev_attr dev_attr;
> /* spinlock to synchronize virtqueue state */
> spinlock_t lock;
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index a6ee830efc38..6feb29726c2a 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -11,8 +11,8 @@
> #include <linux/module.h>
> #include <linux/device.h>
> #include <linux/kernel.h>
> +#include <linux/kthread.h>
> #include <linux/slab.h>
> -#include <linux/sched.h>
> #include <linux/dma-map-ops.h>
> #include <linux/vringh.h>
> #include <linux/vdpa.h>
> @@ -116,7 +116,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
> static const struct vdpa_config_ops vdpasim_config_ops;
> static const struct vdpa_config_ops vdpasim_batch_config_ops;
>
> -static void vdpasim_work_fn(struct work_struct *work)
> +static void vdpasim_work_fn(struct kthread_work *work)
> {
> struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>
> @@ -159,7 +159,13 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>
> vdpasim = vdpa_to_sim(vdpa);
> vdpasim->dev_attr = *dev_attr;
> - INIT_WORK(&vdpasim->work, vdpasim_work_fn);
> +
> + kthread_init_work(&vdpasim->work, vdpasim_work_fn);
> + vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
> + dev_attr->name);
> + if (IS_ERR(vdpasim->worker))
> + goto err_iommu;
> +
> spin_lock_init(&vdpasim->lock);
> spin_lock_init(&vdpasim->iommu_lock);
>
> @@ -212,7 +218,7 @@ EXPORT_SYMBOL_GPL(vdpasim_create);
>
> void vdpasim_schedule_work(struct vdpasim *vdpasim)
> {
> - schedule_work(&vdpasim->work);
> + kthread_queue_work(vdpasim->worker, &vdpasim->work);
> }
> EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
>
> @@ -612,7 +618,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
> struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> int i;
>
> - cancel_work_sync(&vdpasim->work);
> + kthread_cancel_work_sync(&vdpasim->work);
> + kthread_destroy_worker(vdpasim->worker);
>
> for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
> vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
> --
> 2.39.2
>
On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <[email protected]> wrote:
>
> The new "use_va" module parameter (default: false) is used in
> vdpa_alloc_device() to inform the vDPA framework that the device
> supports VA.
>
> vringh is initialized to use VA only when "use_va" is true and the
> user's mm has been bound. So, only when the bus supports user VA
> (e.g. vhost-vdpa).
>
> vdpasim_mm_work_fn work is used to attach the kthread to the user
> address space when the .bind_mm callback is invoked, and to detach
> it when the .unbind_mm callback is invoked.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>
> Notes:
> v2:
> - `use_va` set to true by default [Eugenio]
> - supported the new unbind_mm callback [Jason]
> - removed the unbind_mm call in vdpasim_do_reset() [Jason]
> - avoided to release the lock while call kthread_flush_work() since we
> are now using a mutex to protect the device state
>
> drivers/vdpa/vdpa_sim/vdpa_sim.h | 1 +
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 98 +++++++++++++++++++++++++++++++-
> 2 files changed, 97 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index 4774292fba8c..3a42887d05d9 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -59,6 +59,7 @@ struct vdpasim {
> struct vdpasim_virtqueue *vqs;
> struct kthread_worker *worker;
> struct kthread_work work;
> + struct mm_struct *mm_bound;
> struct vdpasim_dev_attr dev_attr;
> /* mutex to synchronize virtqueue state */
> struct mutex mutex;
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index a28103a67ae7..eda26bc33df5 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -35,10 +35,77 @@ module_param(max_iotlb_entries, int, 0444);
> MODULE_PARM_DESC(max_iotlb_entries,
> "Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");
>
> +static bool use_va = true;
> +module_param(use_va, bool, 0444);
> +MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
> +
> #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
> #define VDPASIM_QUEUE_MAX 256
> #define VDPASIM_VENDOR_ID 0
>
> +struct vdpasim_mm_work {
> + struct kthread_work work;
> + struct mm_struct *mm;
> + bool bind;
> + int ret;
> +};
> +
> +static void vdpasim_mm_work_fn(struct kthread_work *work)
> +{
> + struct vdpasim_mm_work *mm_work =
> + container_of(work, struct vdpasim_mm_work, work);
> +
> + mm_work->ret = 0;
> +
> + if (mm_work->bind) {
> + kthread_use_mm(mm_work->mm);
> + //TODO: should we attach the cgroup of the mm owner?
> + } else {
> + kthread_unuse_mm(mm_work->mm);
> + }
> +}
> +
> +static void vdpasim_worker_queue_mm(struct vdpasim *vdpasim,
> + struct vdpasim_mm_work *mm_work)
> +{
Nit: we need to tweak the name as it does flush besides queuing the work.
> + struct kthread_work *work = &mm_work->work;
> +
> + kthread_init_work(work, vdpasim_mm_work_fn);
> + kthread_queue_work(vdpasim->worker, work);
> +
> + kthread_flush_work(work);
> +}
> +
> +static int vdpasim_worker_bind_mm(struct vdpasim *vdpasim,
> + struct mm_struct *new_mm)
> +{
> + struct vdpasim_mm_work mm_work;
> +
> + mm_work.mm = new_mm;
> + mm_work.bind = true;
> +
> + vdpasim_worker_queue_mm(vdpasim, &mm_work);
> +
> + if (!mm_work.ret)
> + vdpasim->mm_bound = new_mm;
> +
> + return mm_work.ret;
> +}
> +
> +static void vdpasim_worker_unbind_mm(struct vdpasim *vdpasim)
> +{
> + struct vdpasim_mm_work mm_work;
> +
> + if (!vdpasim->mm_bound)
> + return;
> +
> + mm_work.mm = vdpasim->mm_bound;
> + mm_work.bind = false;
Can we simply use mm_work.mm = NULL for unbinding?
> +
> + vdpasim_worker_queue_mm(vdpasim, &mm_work);
> +
> + vdpasim->mm_bound = NULL;
And change the mm_bound in the worker?
Thanks
> +}
> static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
> {
> return container_of(vdpa, struct vdpasim, vdpa);
> @@ -59,8 +126,10 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
> {
> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> uint16_t last_avail_idx = vq->vring.last_avail_idx;
> + bool va_enabled = use_va && vdpasim->mm_bound;
>
> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
> + va_enabled,
> (struct vring_desc *)(uintptr_t)vq->desc_addr,
> (struct vring_avail *)
> (uintptr_t)vq->driver_addr,
> @@ -151,7 +220,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> vdpa = __vdpa_alloc_device(NULL, ops,
> dev_attr->ngroups, dev_attr->nas,
> dev_attr->alloc_size,
> - dev_attr->name, false);
> + dev_attr->name, use_va);
> if (IS_ERR(vdpa)) {
> ret = PTR_ERR(vdpa);
> goto err_alloc;
> @@ -571,6 +640,27 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
> return ret;
> }
>
> +static int vdpasim_bind_mm(struct vdpa_device *vdpa, struct mm_struct *mm)
> +{
> + struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> + int ret;
> +
> + mutex_lock(&vdpasim->mutex);
> + ret = vdpasim_worker_bind_mm(vdpasim, mm);
> + mutex_unlock(&vdpasim->mutex);
> +
> + return ret;
> +}
> +
> +static void vdpasim_unbind_mm(struct vdpa_device *vdpa)
> +{
> + struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +
> + mutex_lock(&vdpasim->mutex);
> + vdpasim_worker_unbind_mm(vdpasim);
> + mutex_unlock(&vdpasim->mutex);
> +}
> +
> static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,
> u64 iova, u64 size,
> u64 pa, u32 perm, void *opaque)
> @@ -667,6 +757,8 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
> .set_group_asid = vdpasim_set_group_asid,
> .dma_map = vdpasim_dma_map,
> .dma_unmap = vdpasim_dma_unmap,
> + .bind_mm = vdpasim_bind_mm,
> + .unbind_mm = vdpasim_unbind_mm,
> .free = vdpasim_free,
> };
>
> @@ -701,6 +793,8 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
> .get_iova_range = vdpasim_get_iova_range,
> .set_group_asid = vdpasim_set_group_asid,
> .set_map = vdpasim_set_map,
> + .bind_mm = vdpasim_bind_mm,
> + .unbind_mm = vdpasim_unbind_mm,
> .free = vdpasim_free,
> };
>
> --
> 2.39.2
>
On Tuesday, March 14, 2023 04:56:08 CET Jason Wang wrote:
> On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <[email protected]> wrote:
> > kmap_atomic() is deprecated in favor of kmap_local_page().
>
> It's better to mention the commit or code that introduces this.
>
> > With kmap_local_page() the mappings are per thread, CPU local, can take
> > page-faults, and can be called from any context (including interrupts).
> > Furthermore, the tasks can be preempted and, when they are scheduled to
> > run again, the kernel virtual addresses are restored and still valid.
> >
> > kmap_atomic() is implemented like a kmap_local_page() which also disables
> > page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> > otherwise it only disables migration).
> >
> > The code within the mappings/un-mappings in getu16_iotlb() and
> > putu16_iotlb() don't depend on the above-mentioned side effects of
> > kmap_atomic(),
>
> Note we used to use spinlock to protect simulators (at least until
> patch 7, so we probably need to re-order the patches at least) so I
> think this is only valid when:
>
> The vringh IOTLB helpers are not used in atomic context (e.g spinlock,
> interrupts).
I'm probably missing some context but it looks that you are saying that
kmap_local_page() is not suited for any use in atomic context (you are
mentioning spinlocks).
The commit message (that I know pretty well since it's the exact copy, word by
word, of my boiler plate commits) explains that kmap_local_page() is perfectly
usable in atomic context (including interrupts).
I don't know this code, however I am not able to see why these vringh IOTLB
helpers cannot work if used under spinlocks. Can you please elaborate a little
more?
> If yes, should we document this? (Or should we introduce a boolean to
> say whether an IOTLB variant can be used in an atomic context)?
Again, you'll have no problems from the use of kmap_local_page() and so you
don't need any boolean to tell whether or not the code is running in atomic
context.
Please take a look at the Highmem documentation which has been recently
reworked and extended by me: https://docs.kernel.org/mm/highmem.html
Anyway, I have been ATK 12 or 13 hours in a row. So I'm probably missing the
whole picture.
Thanks,
Fabio
> Thanks
>
> > so that mere replacements of the old API with the new one
> > is all that is required (i.e., there is no need to explicitly add calls
> > to pagefault_disable() and/or preempt_disable()).
> >
> > Signed-off-by: Stefano Garzarella <[email protected]>
> > ---
> >
> > Notes:
> > v2:
> > - added this patch since checkpatch.pl complained about deprecation
> >
> > of kmap_atomic() touched by next patch
> >
> > drivers/vhost/vringh.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> > index a1e27da54481..0ba3ef809e48 100644
> > --- a/drivers/vhost/vringh.c
> > +++ b/drivers/vhost/vringh.c
> > @@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh
> > *vrh,>
> > if (ret < 0)
> >
> > return ret;
> >
> > - kaddr = kmap_atomic(iov.bv_page);
> > + kaddr = kmap_local_page(iov.bv_page);
> >
> > from = kaddr + iov.bv_offset;
> > *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> >
> > - kunmap_atomic(kaddr);
> > + kunmap_local(kaddr);
> >
> > return 0;
> >
> > }
> >
> > @@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh
> > *vrh,>
> > if (ret < 0)
> >
> > return ret;
> >
> > - kaddr = kmap_atomic(iov.bv_page);
> > + kaddr = kmap_local_page(iov.bv_page);
> >
> > to = kaddr + iov.bv_offset;
> > WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> >
> > - kunmap_atomic(kaddr);
> > + kunmap_local(kaddr);
> >
> > return 0;
> >
> > }
> >
> > --
> > 2.39.2
On Thu, Mar 16, 2023 at 5:12 AM Fabio M. De Francesco
<[email protected]> wrote:
>
> On Tuesday, March 14, 2023 04:56:08 CET Jason Wang wrote:
> > On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <[email protected]>
> wrote:
> > > kmap_atomic() is deprecated in favor of kmap_local_page().
> >
> > It's better to mention the commit or code that introduces this.
> >
> > > With kmap_local_page() the mappings are per thread, CPU local, can take
> > > page-faults, and can be called from any context (including interrupts).
> > > Furthermore, the tasks can be preempted and, when they are scheduled to
> > > run again, the kernel virtual addresses are restored and still valid.
> > >
> > > kmap_atomic() is implemented like a kmap_local_page() which also disables
> > > page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> > > otherwise it only disables migration).
> > >
> > > The code within the mappings/un-mappings in getu16_iotlb() and
> > > putu16_iotlb() don't depend on the above-mentioned side effects of
> > > kmap_atomic(),
> >
> > Note we used to use spinlock to protect simulators (at least until
> > patch 7, so we probably need to re-order the patches at least) so I
> > think this is only valid when:
> >
> > The vringh IOTLB helpers are not used in atomic context (e.g spinlock,
> > interrupts).
>
> I'm probably missing some context but it looks that you are saying that
> kmap_local_page() is not suited for any use in atomic context (you are
> mentioning spinlocks).
>
> The commit message (that I know pretty well since it's the exact copy, word by
> word, of my boiler plate commits) explains that kmap_local_page() is perfectly
> usable in atomic context (including interrupts).
Thanks for the confirmation, I misread the change log and thought it
said it can't be used in interrupts.
>
> I don't know this code, however I am not able to see why these vringh IOTLB
> helpers cannot work if used under spinlocks. Can you please elaborate a little
> more?
My fault, see above.
>
> > If yes, should we document this? (Or should we introduce a boolean to
> > say whether an IOTLB variant can be used in an atomic context)?
>
> Again, you'll have no problems from the use of kmap_local_page() and so you
> don't need any boolean to tell whether or not the code is running in atomic
> context.
>
> Please take a look at the Highmem documentation which has been recently
> reworked and extended by me: https://docs.kernel.org/mm/highmem.html
This is really helpful.
>
> Anyway, I have been ATK 12 or 13 hours in a row. So I'm probably missing the
> whole picture.
It's me who missed the motivation of kmap_local().
Thanks
>
> Thanks,
>
> Fabio
>
> > Thanks
> >
> > > so that mere replacements of the old API with the new one
> > > is all that is required (i.e., there is no need to explicitly add calls
> > > to pagefault_disable() and/or preempt_disable()).
> > >
> > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > ---
> > >
> > > Notes:
> > > v2:
> > > - added this patch since checkpatch.pl complained about deprecation
> > >
> > > of kmap_atomic() touched by next patch
> > >
> > > drivers/vhost/vringh.c | 8 ++++----
> > > 1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> > > index a1e27da54481..0ba3ef809e48 100644
> > > --- a/drivers/vhost/vringh.c
> > > +++ b/drivers/vhost/vringh.c
> > > @@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh
> > > *vrh,>
> > > if (ret < 0)
> > >
> > > return ret;
> > >
> > > - kaddr = kmap_atomic(iov.bv_page);
> > > + kaddr = kmap_local_page(iov.bv_page);
> > >
> > > from = kaddr + iov.bv_offset;
> > > *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> > >
> > > - kunmap_atomic(kaddr);
> > > + kunmap_local(kaddr);
> > >
> > > return 0;
> > >
> > > }
> > >
> > > @@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh
> > > *vrh,>
> > > if (ret < 0)
> > >
> > > return ret;
> > >
> > > - kaddr = kmap_atomic(iov.bv_page);
> > > + kaddr = kmap_local_page(iov.bv_page);
> > >
> > > to = kaddr + iov.bv_offset;
> > > WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> > >
> > > - kunmap_atomic(kaddr);
> > > + kunmap_local(kaddr);
> > >
> > > return 0;
> > >
> > > }
> > >
> > > --
> > > 2.39.2
>
>
>
>
On Wed, Mar 15, 2023 at 10:12 PM Fabio M. De Francesco
<[email protected]> wrote:
>
> On Tuesday, March 14, 2023 04:56:08 CET Jason Wang wrote:
> > On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <[email protected]>
> wrote:
> > > kmap_atomic() is deprecated in favor of kmap_local_page().
> >
> > It's better to mention the commit or code that introduces this.
> >
> > > With kmap_local_page() the mappings are per thread, CPU local, can take
> > > page-faults, and can be called from any context (including interrupts).
> > > Furthermore, the tasks can be preempted and, when they are scheduled to
> > > run again, the kernel virtual addresses are restored and still valid.
> > >
> > > kmap_atomic() is implemented like a kmap_local_page() which also disables
> > > page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> > > otherwise it only disables migration).
> > >
> > > The code within the mappings/un-mappings in getu16_iotlb() and
> > > putu16_iotlb() don't depend on the above-mentioned side effects of
> > > kmap_atomic(),
> >
> > Note we used to use spinlock to protect simulators (at least until
> > patch 7, so we probably need to re-order the patches at least) so I
> > think this is only valid when:
> >
> > The vringh IOTLB helpers are not used in atomic context (e.g spinlock,
> > interrupts).
>
> I'm probably missing some context but it looks that you are saying that
> kmap_local_page() is not suited for any use in atomic context (you are
> mentioning spinlocks).
>
> The commit message (that I know pretty well since it's the exact copy, word by
> word, of my boiler plate commits)
I hope it's not a problem for you, should I mention it somehow?
I searched for the last commits that made a similar change and found
yours that explained it perfectly ;-)
Do I need to rephrase?
> explains that kmap_local_page() is perfectly
> usable in atomic context (including interrupts).
>
> I don't know this code, however I am not able to see why these vringh IOTLB
> helpers cannot work if used under spinlocks. Can you please elaborate a little
> more?
>
> > If yes, should we document this? (Or should we introduce a boolean to
> > say whether an IOTLB variant can be used in an atomic context)?
>
> Again, you'll have no problems from the use of kmap_local_page() and so you
> don't need any boolean to tell whether or not the code is running in atomic
> context.
>
> Please take a look at the Highmem documentation which has been recently
> reworked and extended by me: https://docs.kernel.org/mm/highmem.html
>
> Anyway, I have been ATK 12 or 13 hours in a row. So I'm probably missing the
> whole picture.
Thanks for your useful info!
Stefano
On Tue, Mar 14, 2023 at 11:39:42AM +0800, Jason Wang wrote:
>On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <[email protected]> wrote:
>>
>> These new optional callbacks is used to bind/unbind the device to
>> a specific address space so the vDPA framework can use VA when
>> these callbacks are implemented.
>>
>> Suggested-by: Jason Wang <[email protected]>
>> Signed-off-by: Stefano Garzarella <[email protected]>
>> ---
>
>One thing that came into my mind is that after this commit:
>
>commit 5ce995f313ce56c0c62425c3ddc37c5c50fc33db
>Author: Jason Wang <[email protected]>
>Date: Fri May 29 16:02:59 2020 +0800
>
> vhost: use mmgrab() instead of mmget() for non worker device
>
> For the device that doesn't use vhost worker and use_mm(), mmget() is
> too heavy weight and it may brings troubles for implementing mmap()
> support for vDPA device.
>
>We don't hold the address space after this commit, so the userspace
>mapping could be invalid if the owner exits?
Thanks for mentioning it, I'll take a look at it!
In that case, maybe I should do an mmget() (or get_task_mm()) in vhost-vdpa
before calling the callback, or in the parent driver inside the callback, but
that seems to duplicate code.
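To make the vhost-vdpa option concrete, a minimal, untested sketch (the use of
mmget_not_zero()/mmput() and the -ESRCH return are just my assumptions here,
to illustrate holding a reference only while the device is bound):

static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
{
	struct vdpa_device *vdpa = v->vdpa;
	const struct vdpa_config_ops *ops = vdpa->config;
	long r;

	if (!vdpa->use_va || !ops->bind_mm)
		return 0;

	/* take a reference so the address space cannot go away while bound;
	 * mmget_not_zero() fails if the owner has already exited
	 */
	if (!mmget_not_zero(v->vdev.mm))
		return -ESRCH;

	r = ops->bind_mm(vdpa, v->vdev.mm);
	if (r)
		mmput(v->vdev.mm);

	return r;
}

The matching mmput() would then go into vhost_vdpa_unbind_mm(), after
ops->unbind_mm() returns.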
Thanks,
Stefano
>
>Thanks
>
>>
>> Notes:
>> v2:
>> - removed `struct task_struct *owner` param (unused for now, maybe
>> useful to support cgroups) [Jason]
>> - add unbind_mm callback [Jason]
>>
>> include/linux/vdpa.h | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
>> index 43f59ef10cc9..369c21394284 100644
>> --- a/include/linux/vdpa.h
>> +++ b/include/linux/vdpa.h
>> @@ -290,6 +290,14 @@ struct vdpa_map_file {
>> * @vdev: vdpa device
>> * @idx: virtqueue index
>> * Returns pointer to structure device or error (NULL)
>> + * @bind_mm: Bind the device to a specific address space
>> + * so the vDPA framework can use VA when this
>> + * callback is implemented. (optional)
>> + * @vdev: vdpa device
>> + * @mm: address space to bind
>> + * @unbind_mm: Unbind the device from the address space
>> + * bound using the bind_mm callback. (optional)
>> + * @vdev: vdpa device
>> * @free: Free resources that belongs to vDPA (optional)
>> * @vdev: vdpa device
>> */
>> @@ -351,6 +359,8 @@ struct vdpa_config_ops {
>> int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
>> unsigned int asid);
>> struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
>> + int (*bind_mm)(struct vdpa_device *vdev, struct mm_struct *mm);
>> + void (*unbind_mm)(struct vdpa_device *vdev);
>>
>> /* Free device resources */
>> void (*free)(struct vdpa_device *vdev);
>> --
>> 2.39.2
>>
>
On Tue, Mar 14, 2023 at 11:48:33AM +0800, Jason Wang wrote:
>On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <[email protected]> wrote:
>>
>> When the user call VHOST_SET_OWNER ioctl and the vDPA device
>> has `use_va` set to true, let's call the bind_mm callback.
>> In this way we can bind the device to the user address space
>> and directly use the user VA.
>>
>> The unbind_mm callback is called during the release after
>> stopping the device.
>>
>> Signed-off-by: Stefano Garzarella <[email protected]>
>> ---
>>
>> Notes:
>> v2:
>> - call the new unbind_mm callback during the release [Jason]
>> - avoid to call bind_mm callback after the reset, since the device
>> is not detaching it now during the reset
>>
>> drivers/vhost/vdpa.c | 30 ++++++++++++++++++++++++++++++
>> 1 file changed, 30 insertions(+)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index dc12dbd5b43b..1ab89fccd825 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
>> return vdpa_reset(vdpa);
>> }
>>
>> +static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
>> +{
>> + struct vdpa_device *vdpa = v->vdpa;
>> + const struct vdpa_config_ops *ops = vdpa->config;
>> +
>> + if (!vdpa->use_va || !ops->bind_mm)
>> + return 0;
>> +
>> + return ops->bind_mm(vdpa, v->vdev.mm);
>> +}
>> +
>> +static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
>> +{
>> + struct vdpa_device *vdpa = v->vdpa;
>> + const struct vdpa_config_ops *ops = vdpa->config;
>> +
>> + if (!vdpa->use_va || !ops->unbind_mm)
>> + return;
>> +
>> + ops->unbind_mm(vdpa);
>> +}
>> +
>> static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
>> {
>> struct vdpa_device *vdpa = v->vdpa;
>> @@ -711,6 +733,13 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>> break;
>> default:
>> r = vhost_dev_ioctl(&v->vdev, cmd, argp);
>> + if (!r && cmd == VHOST_SET_OWNER) {
>> + r = vhost_vdpa_bind_mm(v);
>> + if (r) {
>> + vhost_dev_reset_owner(&v->vdev, NULL);
>> + break;
>> + }
>> + }
>
>Nit: is it better to have a new condition/switch branch instead of
>putting them under default? (as what vring_ioctl did).
Yep, I agree!
I'll change it.
Thanks,
Stefano
On Tue, Mar 14, 2023 at 12:53:57PM +0800, Jason Wang wrote:
>On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <[email protected]> wrote:
>>
>> vDPA supports the possibility to use user VA in the iotlb messages.
>> So, let's add support for user VA in vringh to use it in the vDPA
>> simulators.
>>
>> Signed-off-by: Stefano Garzarella <[email protected]>
>> ---
>>
>> Notes:
>> v2:
>> - replace kmap_atomic() with kmap_local_page() [see previous patch]
>> - fix cast warnings when build with W=1 C=1
>>
>> include/linux/vringh.h | 5 +-
>> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>> drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
>> drivers/vhost/vringh.c | 247 ++++++++++++++++++++++++------
>> 4 files changed, 205 insertions(+), 53 deletions(-)
>>
>> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
>> index 1991a02c6431..d39b9f2dcba0 100644
>> --- a/include/linux/vringh.h
>> +++ b/include/linux/vringh.h
>> @@ -32,6 +32,9 @@ struct vringh {
>> /* Can we get away with weak barriers? */
>> bool weak_barriers;
>>
>> + /* Use user's VA */
>> + bool use_va;
>> +
>> /* Last available index we saw (ie. where we're up to). */
>> u16 last_avail_idx;
>>
>> @@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
>> spinlock_t *iotlb_lock);
>>
>> int vringh_init_iotlb(struct vringh *vrh, u64 features,
>> - unsigned int num, bool weak_barriers,
>> + unsigned int num, bool weak_barriers, bool use_va,
>> struct vring_desc *desc,
>> struct vring_avail *avail,
>> struct vring_used *used);
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index 3a0e721aef05..babc8dd171a6 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>>
>> if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
>> err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
>> - MLX5_CVQ_MAX_ENT, false,
>> + MLX5_CVQ_MAX_ENT, false, false,
>> (struct vring_desc *)(uintptr_t)cvq->desc_addr,
>> (struct vring_avail *)(uintptr_t)cvq->driver_addr,
>> (struct vring_used *)(uintptr_t)cvq->device_addr);
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> index 6a0a65814626..481eb156658b 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> @@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>> uint16_t last_avail_idx = vq->vring.last_avail_idx;
>>
>> - vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
>> + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
>> (struct vring_desc *)(uintptr_t)vq->desc_addr,
>> (struct vring_avail *)
>> (uintptr_t)vq->driver_addr,
>> @@ -81,7 +81,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
>> vq->cb = NULL;
>> vq->private = NULL;
>> vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
>> - VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
>> + VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
>>
>> vq->vring.notify = NULL;
>> }
>> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>> index 0ba3ef809e48..61c79cea44ca 100644
>> --- a/drivers/vhost/vringh.c
>> +++ b/drivers/vhost/vringh.c
>> @@ -1094,15 +1094,99 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>>
>> #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
>>
>> -static int iotlb_translate(const struct vringh *vrh,
>> - u64 addr, u64 len, u64 *translated,
>> - struct bio_vec iov[],
>> - int iov_size, u32 perm)
>> +static int iotlb_translate_va(const struct vringh *vrh,
>> + u64 addr, u64 len, u64 *translated,
>> + struct iovec iov[],
>> + int iov_size, u32 perm)
>> {
>> struct vhost_iotlb_map *map;
>> struct vhost_iotlb *iotlb = vrh->iotlb;
>> + u64 s = 0, last = addr + len - 1;
>> int ret = 0;
>> +
>> + spin_lock(vrh->iotlb_lock);
>> +
>> + while (len > s) {
>> + u64 size;
>> +
>> + if (unlikely(ret >= iov_size)) {
>> + ret = -ENOBUFS;
>> + break;
>> + }
>> +
>> + map = vhost_iotlb_itree_first(iotlb, addr, last);
>> + if (!map || map->start > addr) {
>> + ret = -EINVAL;
>> + break;
>> + } else if (!(map->perm & perm)) {
>> + ret = -EPERM;
>> + break;
>> + }
>> +
>> + size = map->size - addr + map->start;
>> + iov[ret].iov_len = min(len - s, size);
>> + iov[ret].iov_base = (void __user *)(unsigned long)
>> + (map->addr + addr - map->start);
>> + s += size;
>> + addr += size;
>> + ++ret;
>> + }
>> +
>> + spin_unlock(vrh->iotlb_lock);
>> +
>> + if (translated)
>> + *translated = min(len, s);
>> +
>> + return ret;
>> +}
>> +
>> +static inline int copy_from_va(const struct vringh *vrh, void *dst, void *src,
>> + u64 len, u64 *translated)
>> +{
>> + struct iovec iov[16];
>> + struct iov_iter iter;
>> + int ret;
>> +
>> + ret = iotlb_translate_va(vrh, (u64)(uintptr_t)src, len, translated, iov,
>> + ARRAY_SIZE(iov), VHOST_MAP_RO);
>> + if (ret == -ENOBUFS)
>> + ret = ARRAY_SIZE(iov);
>> + else if (ret < 0)
>> + return ret;
>> +
>> + iov_iter_init(&iter, ITER_SOURCE, iov, ret, *translated);
>> +
>> + return copy_from_iter(dst, *translated, &iter);
>> +}
>> +
>> +static inline int copy_to_va(const struct vringh *vrh, void *dst, void *src,
>> + u64 len, u64 *translated)
>> +{
>> + struct iovec iov[16];
>> + struct iov_iter iter;
>> + int ret;
>> +
>> + ret = iotlb_translate_va(vrh, (u64)(uintptr_t)dst, len, translated, iov,
>> + ARRAY_SIZE(iov), VHOST_MAP_WO);
>> + if (ret == -ENOBUFS)
>> + ret = ARRAY_SIZE(iov);
>> + else if (ret < 0)
>> + return ret;
>> +
>> + iov_iter_init(&iter, ITER_DEST, iov, ret, *translated);
>> +
>> + return copy_to_iter(src, *translated, &iter);
>> +}
>> +
>> +static int iotlb_translate_pa(const struct vringh *vrh,
>> + u64 addr, u64 len, u64 *translated,
>> + struct bio_vec iov[],
>> + int iov_size, u32 perm)
>> +{
>> + struct vhost_iotlb_map *map;
>> + struct vhost_iotlb *iotlb = vrh->iotlb;
>> u64 s = 0, last = addr + len - 1;
>> + int ret = 0;
>>
>> spin_lock(vrh->iotlb_lock);
>>
>> @@ -1141,28 +1225,61 @@ static int iotlb_translate(const struct vringh *vrh,
>> return ret;
>> }
>>
>> +static inline int copy_from_pa(const struct vringh *vrh, void *dst, void *src,
>> + u64 len, u64 *translated)
>> +{
>> + struct bio_vec iov[16];
>> + struct iov_iter iter;
>> + int ret;
>> +
>> + ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)src, len, translated, iov,
>> + ARRAY_SIZE(iov), VHOST_MAP_RO);
>> + if (ret == -ENOBUFS)
>> + ret = ARRAY_SIZE(iov);
>> + else if (ret < 0)
>> + return ret;
>> +
>> + iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, *translated);
>> +
>> + return copy_from_iter(dst, *translated, &iter);
>> +}
>> +
>> +static inline int copy_to_pa(const struct vringh *vrh, void *dst, void *src,
>> + u64 len, u64 *translated)
>> +{
>> + struct bio_vec iov[16];
>> + struct iov_iter iter;
>> + int ret;
>> +
>> + ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)dst, len, translated, iov,
>> + ARRAY_SIZE(iov), VHOST_MAP_WO);
>> + if (ret == -ENOBUFS)
>> + ret = ARRAY_SIZE(iov);
>> + else if (ret < 0)
>> + return ret;
>> +
>> + iov_iter_bvec(&iter, ITER_DEST, iov, ret, *translated);
>> +
>> + return copy_to_iter(src, *translated, &iter);
>> +}
>> +
>> static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
>> void *src, size_t len)
>> {
>> u64 total_translated = 0;
>>
>> while (total_translated < len) {
>> - struct bio_vec iov[16];
>> - struct iov_iter iter;
>> u64 translated;
>> int ret;
>>
>> - ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
>> - len - total_translated, &translated,
>> - iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
>> - if (ret == -ENOBUFS)
>> - ret = ARRAY_SIZE(iov);
>> - else if (ret < 0)
>> - return ret;
>> -
>> - iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
>> + if (vrh->use_va) {
>> + ret = copy_from_va(vrh, dst, src,
>> + len - total_translated, &translated);
>> + } else {
>> + ret = copy_from_pa(vrh, dst, src,
>> + len - total_translated, &translated);
>> + }
>>
>> - ret = copy_from_iter(dst, translated, &iter);
>> if (ret < 0)
>> return ret;
>>
>> @@ -1180,22 +1297,17 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
>> u64 total_translated = 0;
>>
>> while (total_translated < len) {
>> - struct bio_vec iov[16];
>> - struct iov_iter iter;
>> u64 translated;
>> int ret;
>>
>> - ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
>> - len - total_translated, &translated,
>> - iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
>> - if (ret == -ENOBUFS)
>> - ret = ARRAY_SIZE(iov);
>> - else if (ret < 0)
>> - return ret;
>> -
>> - iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
>> + if (vrh->use_va) {
>> + ret = copy_to_va(vrh, dst, src,
>> + len - total_translated, &translated);
>> + } else {
>> + ret = copy_to_pa(vrh, dst, src,
>> + len - total_translated, &translated);
>> + }
>>
>> - ret = copy_to_iter(src, translated, &iter);
>> if (ret < 0)
>> return ret;
>>
>> @@ -1210,20 +1322,37 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
>> static inline int getu16_iotlb(const struct vringh *vrh,
>> u16 *val, const __virtio16 *p)
>> {
>> - struct bio_vec iov;
>> - void *kaddr, *from;
>> int ret;
>>
>> /* Atomic read is needed for getu16 */
>> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
>> - &iov, 1, VHOST_MAP_RO);
>> - if (ret < 0)
>> - return ret;
>> + if (vrh->use_va) {
>> + struct iovec iov;
>> + __virtio16 tmp;
>> +
>> + ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
>> + NULL, &iov, 1, VHOST_MAP_RO);
>> + if (ret < 0)
>> + return ret;
>
>Nit: since we have copy_to_va/copy_to_pa variants, let's introduce
>getu16_iotlb_va/pa variants?
Yep!
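For instance, factoring the VA branch of getu16_iotlb() from this patch out
into its own helper could look roughly like this (untested sketch, reusing
iotlb_translate_va() as introduced above):

static inline int getu16_iotlb_va(const struct vringh *vrh,
				  u16 *val, const __virtio16 *p)
{
	struct iovec iov;
	__virtio16 tmp;
	int ret;

	/* Atomic read is needed for getu16 */
	ret = iotlb_translate_va(vrh, (u64)(uintptr_t)p, sizeof(*p),
				 NULL, &iov, 1, VHOST_MAP_RO);
	if (ret < 0)
		return ret;

	ret = __get_user(tmp, (__virtio16 __user *)iov.iov_base);
	if (ret)
		return ret;

	*val = vringh16_to_cpu(vrh, tmp);
	return 0;
}

getu16_iotlb() would then only dispatch to the _va/_pa variant based on
vrh->use_va.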
>
>>
>> - kaddr = kmap_local_page(iov.bv_page);
>> - from = kaddr + iov.bv_offset;
>> - *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
>> - kunmap_local(kaddr);
>> + ret = __get_user(tmp, (__virtio16 __user *)iov.iov_base);
>> + if (ret)
>> + return ret;
>> +
>> + *val = vringh16_to_cpu(vrh, tmp);
>> + } else {
>> + struct bio_vec iov;
>> + void *kaddr, *from;
>> +
>> + ret = iotlb_translate_pa(vrh, (u64)(uintptr_t)p, sizeof(*p),
>> + NULL, &iov, 1, VHOST_MAP_RO);
>> + if (ret < 0)
>> + return ret;
>> +
>> + kaddr = kmap_local_page(iov.bv_page);
>
>If we decide to have a use_va switch, is kmap_local_page() still required here?
>
I think so. This is related to the email where Fabio clarified it for us,
right?
>Other looks good.
Thanks,
Stefano
On Tue, Mar 14, 2023 at 01:31:25PM +0800, Jason Wang wrote:
>On Tue, Mar 14, 2023 at 1:29 PM Jason Wang <[email protected]> wrote:
>>
>> On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <[email protected]> wrote:
>> >
>> > The spinlock we use to protect the state of the simulator is sometimes
>> > held for a long time (for example, when devices handle requests).
>> >
>> > This also prevents us from calling functions that might sleep (such as
>> > kthread_flush_work() in the next patch), and thus having to release
>> > and retake the lock.
>> >
>> > For these reasons, let's replace the spinlock with a mutex that gives
>> > us more flexibility.
>> >
>> > Suggested-by: Jason Wang <[email protected]>
>> > Signed-off-by: Stefano Garzarella <[email protected]>
>>
>> Acked-by: Jason Wang <[email protected]>
>>
>> Thanks
>
>Btw, though it looks fine but we'd better double confirm virtio_vdpa works well.
I tested it, but I will do it more carefully to make sure everything
is okay.
>
>(I think so since there's transport that might sleep).
I see.
Thanks,
Stefano
On Tue, Mar 14, 2023 at 01:36:13PM +0800, Jason Wang wrote:
>On Thu, Mar 2, 2023 at 7:35 PM Stefano Garzarella <[email protected]> wrote:
>>
>> The new "use_va" module parameter (default: false) is used in
>> vdpa_alloc_device() to inform the vDPA framework that the device
>> supports VA.
>>
>> vringh is initialized to use VA only when "use_va" is true and the
>> user's mm has been bound. So, only when the bus supports user VA
>> (e.g. vhost-vdpa).
>>
>> vdpasim_mm_work_fn work is used to attach the kthread to the user
>> address space when the .bind_mm callback is invoked, and to detach
>> it when the .unbind_mm callback is invoked.
>>
>> Signed-off-by: Stefano Garzarella <[email protected]>
>> ---
>>
>> Notes:
>> v2:
>> - `use_va` set to true by default [Eugenio]
>> - supported the new unbind_mm callback [Jason]
>> - removed the unbind_mm call in vdpasim_do_reset() [Jason]
>> - avoided to release the lock while call kthread_flush_work() since we
>> are now using a mutex to protect the device state
>>
>> drivers/vdpa/vdpa_sim/vdpa_sim.h | 1 +
>> drivers/vdpa/vdpa_sim/vdpa_sim.c | 98 +++++++++++++++++++++++++++++++-
>> 2 files changed, 97 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
>> index 4774292fba8c..3a42887d05d9 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
>> @@ -59,6 +59,7 @@ struct vdpasim {
>> struct vdpasim_virtqueue *vqs;
>> struct kthread_worker *worker;
>> struct kthread_work work;
>> + struct mm_struct *mm_bound;
>> struct vdpasim_dev_attr dev_attr;
>> /* mutex to synchronize virtqueue state */
>> struct mutex mutex;
>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> index a28103a67ae7..eda26bc33df5 100644
>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>> @@ -35,10 +35,77 @@ module_param(max_iotlb_entries, int, 0444);
>> MODULE_PARM_DESC(max_iotlb_entries,
>> "Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");
>>
>> +static bool use_va = true;
>> +module_param(use_va, bool, 0444);
>> +MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
>> +
>> #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
>> #define VDPASIM_QUEUE_MAX 256
>> #define VDPASIM_VENDOR_ID 0
>>
>> +struct vdpasim_mm_work {
>> + struct kthread_work work;
>> + struct mm_struct *mm;
>> + bool bind;
>> + int ret;
>> +};
>> +
>> +static void vdpasim_mm_work_fn(struct kthread_work *work)
>> +{
>> + struct vdpasim_mm_work *mm_work =
>> + container_of(work, struct vdpasim_mm_work, work);
>> +
>> + mm_work->ret = 0;
>> +
>> + if (mm_work->bind) {
>> + kthread_use_mm(mm_work->mm);
>> + //TODO: should we attach the cgroup of the mm owner?
>> + } else {
>> + kthread_unuse_mm(mm_work->mm);
>> + }
>> +}
>> +
>> +static void vdpasim_worker_queue_mm(struct vdpasim *vdpasim,
>> + struct vdpasim_mm_work *mm_work)
>> +{
>
>Nit: we need to tweak the name as it does flush besides queuing the work.
Yep, or split it into two functions.
>
>> + struct kthread_work *work = &mm_work->work;
>> +
>> + kthread_init_work(work, vdpasim_mm_work_fn);
>> + kthread_queue_work(vdpasim->worker, work);
>> +
>> + kthread_flush_work(work);
>> +}
>> +
>> +static int vdpasim_worker_bind_mm(struct vdpasim *vdpasim,
>> + struct mm_struct *new_mm)
>> +{
>> + struct vdpasim_mm_work mm_work;
>> +
>> + mm_work.mm = new_mm;
>> + mm_work.bind = true;
>> +
>> + vdpasim_worker_queue_mm(vdpasim, &mm_work);
>> +
>> + if (!mm_work.ret)
>> + vdpasim->mm_bound = new_mm;
>> +
>> + return mm_work.ret;
>> +}
>> +
>> +static void vdpasim_worker_unbind_mm(struct vdpasim *vdpasim)
>> +{
>> + struct vdpasim_mm_work mm_work;
>> +
>> + if (!vdpasim->mm_bound)
>> + return;
>> +
>> + mm_work.mm = vdpasim->mm_bound;
>> + mm_work.bind = false;
>
>Can we simply use mm_work.mm = NULL for unbinding?
>
>> +
>> + vdpasim_worker_queue_mm(vdpasim, &mm_work);
>> +
>> + vdpasim->mm_bound = NULL;
>
>And change the mm_bound in the worker?
Yep, I need to put `vdpasim` in struct vdpasim_mm_work.
I'll do it in the next version.
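Something like this rough, untested sketch (mm == NULL meaning "unbind", with
mm_bound updated by the worker itself; the field layout and comments are only
illustrative):

struct vdpasim_mm_work {
	struct kthread_work work;
	struct vdpasim *vdpasim;
	struct mm_struct *mm;	/* NULL means unbind */
	int ret;
};

static void vdpasim_mm_work_fn(struct kthread_work *work)
{
	struct vdpasim_mm_work *mm_work =
		container_of(work, struct vdpasim_mm_work, work);
	struct vdpasim *vdpasim = mm_work->vdpasim;

	mm_work->ret = 0;

	/* the worker kthread (un)binds its own mm */
	if (vdpasim->mm_bound)
		kthread_unuse_mm(vdpasim->mm_bound);

	if (mm_work->mm)
		kthread_use_mm(mm_work->mm);

	vdpasim->mm_bound = mm_work->mm;
}

The bind/unbind wrappers would then only fill the work item and queue+flush
it on the worker.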
Thanks,
Stefano
On Thursday, March 2, 2023 12:34:16 CET Stefano Garzarella wrote:
> kmap_atomic() is deprecated in favor of kmap_local_page().
>
> With kmap_local_page() the mappings are per thread, CPU local, can take
> page-faults, and can be called from any context (including interrupts).
> Furthermore, the tasks can be preempted and, when they are scheduled to
> run again, the kernel virtual addresses are restored and still valid.
>
> kmap_atomic() is implemented like a kmap_local_page() which also disables
> page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> otherwise it only disables migration).
>
> The code within the mappings/un-mappings in getu16_iotlb() and
> putu16_iotlb() don't depend on the above-mentioned side effects of
> kmap_atomic(), so that mere replacements of the old API with the new one
> is all that is required (i.e., there is no need to explicitly add calls
> to pagefault_disable() and/or preempt_disable()).
It seems that my commit message is quite clear and complete and therefore has
already been reused by others who have somehow given me credit.
I would really appreciate it being mentioned here that you are reusing a
"boiler plate" commit message of my own making and Cc me :-)
Thanks,
Fabio
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
>
> Notes:
> v2:
> - added this patch since checkpatch.pl complained about deprecation
> of kmap_atomic() touched by next patch
>
> drivers/vhost/vringh.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index a1e27da54481..0ba3ef809e48 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh *vrh,
> if (ret < 0)
> return ret;
>
> - kaddr = kmap_atomic(iov.bv_page);
> + kaddr = kmap_local_page(iov.bv_page);
> from = kaddr + iov.bv_offset;
> *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> - kunmap_atomic(kaddr);
> + kunmap_local(kaddr);
>
> return 0;
> }
> @@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh *vrh,
> if (ret < 0)
> return ret;
>
> - kaddr = kmap_atomic(iov.bv_page);
> + kaddr = kmap_local_page(iov.bv_page);
> to = kaddr + iov.bv_offset;
> WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> - kunmap_atomic(kaddr);
> + kunmap_local(kaddr);
>
> return 0;
> }
> --
> 2.39.2
On Thu, Mar 16, 2023 at 10:13:39AM +0100, Fabio M. De Francesco wrote:
>On Thursday, March 2, 2023 12:34:16 CET Stefano Garzarella wrote:
>> kmap_atomic() is deprecated in favor of kmap_local_page().
>>
>> With kmap_local_page() the mappings are per thread, CPU local, can take
>> page-faults, and can be called from any context (including interrupts).
>> Furthermore, the tasks can be preempted and, when they are scheduled to
>> run again, the kernel virtual addresses are restored and still valid.
>>
>> kmap_atomic() is implemented like a kmap_local_page() which also disables
>> page-faults and preemption (the latter only for !PREEMPT_RT kernels,
>> otherwise it only disables migration).
>>
>> The code within the mappings/un-mappings in getu16_iotlb() and
>> putu16_iotlb() don't depend on the above-mentioned side effects of
>> kmap_atomic(), so that mere replacements of the old API with the new one
>> is all that is required (i.e., there is no need to explicitly add calls
>> to pagefault_disable() and/or preempt_disable()).
>
>It seems that my commit message is quite clear and complete and therefore has
>already been reused by others who have somehow given me credit.
>
>I would really appreciate it being mentioned here that you are reusing a
>"boiler plate" commit message of my own making and Cc me :-)
Yes of course, sorry for not doing this previously!
Thanks,
Stefano
On Thursday, March 16, 2023 09:09:29 CET Stefano Garzarella wrote:
> On Wed, Mar 15, 2023 at 10:12 PM Fabio M. De Francesco
>
> <[email protected]> wrote:
> > > On Tuesday, March 14, 2023 04:56:08 CET Jason Wang wrote:
> > > On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <[email protected]>
> >
> > wrote:
> > > > kmap_atomic() is deprecated in favor of kmap_local_page().
> > >
> > > It's better to mention the commit or code that introduces this.
> > >
> > > > With kmap_local_page() the mappings are per thread, CPU local, can take
> > > > page-faults, and can be called from any context (including interrupts).
> > > > Furthermore, the tasks can be preempted and, when they are scheduled to
> > > > run again, the kernel virtual addresses are restored and still valid.
> > > >
> > > > kmap_atomic() is implemented like a kmap_local_page() which also
> > > > disables
> > > > page-faults and preemption (the latter only for !PREEMPT_RT kernels,
> > > > otherwise it only disables migration).
> > > >
> > > > The code within the mappings/un-mappings in getu16_iotlb() and
> > > > putu16_iotlb() don't depend on the above-mentioned side effects of
> > > > kmap_atomic(),
> > >
> > > Note we used to use spinlock to protect simulators (at least until
> > > patch 7, so we probably need to re-order the patches at least) so I
> > > think this is only valid when:
> > >
> > > The vringh IOTLB helpers are not used in atomic context (e.g spinlock,
> > > interrupts).
> >
> > I'm probably missing some context but it looks that you are saying that
> > kmap_local_page() is not suited for any use in atomic context (you are
> > mentioning spinlocks).
> >
> > The commit message (that I know pretty well since it's the exact copy, word
> > by word, of my boiler plate commits)
>
> I hope it's not a problem for you, should I mention it somehow?
Sorry, I had missed your last message when I wrote another message in this
thread a few minutes ago.
Obviously, I'm happy that my commit message is being reused. As I said in
the other message, I would appreciate some kind of credit as the author.
I proposed a means you can use, but feel free to ignore my suggestion and do
it differently if you prefer.
Again thanks,
Fabio
> I searched for the last commits that made a similar change and found
> yours that explained it perfectly ;-)
>
> Do I need to rephrase?
>
> > explains that kmap_local_page() is perfectly
> > usable in atomic context (including interrupts).
> >
> > I don't know this code, however I am not able to see why these vringh IOTLB
> > helpers cannot work if used under spinlocks. Can you please elaborate a
> > little more?
> >
> > > If yes, should we document this? (Or should we introduce a boolean to
> > > say whether an IOTLB variant can be used in an atomic context)?
> >
> > Again, you'll have no problems from the use of kmap_local_page() and so you
> > don't need any boolean to tell whether or not the code is running in atomic
> > context.
> >
> > Please take a look at the Highmem documentation which has been recently
> > reworked and extended by me: https://docs.kernel.org/mm/highmem.html
> >
> > Anyway, I have been ATK 12 or 13 hours in a row. So I'm probably missing the
> > whole picture.
>
> Thanks for your useful info!
> Stefano
On Thu, Mar 16, 2023 at 9:31 AM Stefano Garzarella <[email protected]> wrote:
>
> On Tue, Mar 14, 2023 at 11:48:33AM +0800, Jason Wang wrote:
> >On Thu, Mar 2, 2023 at 7:34 PM Stefano Garzarella <[email protected]> wrote:
> >>
> >> When the user call VHOST_SET_OWNER ioctl and the vDPA device
> >> has `use_va` set to true, let's call the bind_mm callback.
> >> In this way we can bind the device to the user address space
> >> and directly use the user VA.
> >>
> >> The unbind_mm callback is called during the release after
> >> stopping the device.
> >>
> >> Signed-off-by: Stefano Garzarella <[email protected]>
> >> ---
> >>
> >> Notes:
> >> v2:
> >> - call the new unbind_mm callback during the release [Jason]
> >> - avoid to call bind_mm callback after the reset, since the device
> >> is not detaching it now during the reset
> >>
> >> drivers/vhost/vdpa.c | 30 ++++++++++++++++++++++++++++++
> >> 1 file changed, 30 insertions(+)
> >>
> >> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> >> index dc12dbd5b43b..1ab89fccd825 100644
> >> --- a/drivers/vhost/vdpa.c
> >> +++ b/drivers/vhost/vdpa.c
> >> @@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
> >> return vdpa_reset(vdpa);
> >> }
> >>
> >> +static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
> >> +{
> >> + struct vdpa_device *vdpa = v->vdpa;
> >> + const struct vdpa_config_ops *ops = vdpa->config;
> >> +
> >> + if (!vdpa->use_va || !ops->bind_mm)
> >> + return 0;
> >> +
> >> + return ops->bind_mm(vdpa, v->vdev.mm);
> >> +}
> >> +
> >> +static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
> >> +{
> >> + struct vdpa_device *vdpa = v->vdpa;
> >> + const struct vdpa_config_ops *ops = vdpa->config;
> >> +
> >> + if (!vdpa->use_va || !ops->unbind_mm)
> >> + return;
> >> +
> >> + ops->unbind_mm(vdpa);
> >> +}
> >> +
> >> static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
> >> {
> >> struct vdpa_device *vdpa = v->vdpa;
> >> @@ -711,6 +733,13 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> >> break;
> >> default:
> >> r = vhost_dev_ioctl(&v->vdev, cmd, argp);
> >> + if (!r && cmd == VHOST_SET_OWNER) {
> >> + r = vhost_vdpa_bind_mm(v);
> >> + if (r) {
> >> + vhost_dev_reset_owner(&v->vdev, NULL);
> >> + break;
> >> + }
> >> + }
> >
> >Nit: is it better to have a new condition/switch branch instead of
> >putting them under default? (as what vring_ioctl did).
>
> Yep, I agree!
>
> I'll change it.
Or maybe I can simply add a `case VHOST_SET_OWNER` to this switch and call
vhost_dev_set_owner() and vhost_vdpa_bind_mm() from there; I mean something
like this:
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 331d4a718bf6..20250c3418b2 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -731,15 +731,16 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
case VHOST_VDPA_RESUME:
r = vhost_vdpa_resume(v);
break;
+ case VHOST_SET_OWNER:
+ r = vhost_dev_set_owner(d);
+ if (r)
+ break;
+ r = vhost_vdpa_bind_mm(v);
+ if (r)
+ vhost_dev_reset_owner(d, NULL);
+ break;
default:
r = vhost_dev_ioctl(&v->vdev, cmd, argp);
- if (!r && cmd == VHOST_SET_OWNER) {
- r = vhost_vdpa_bind_mm(v);
- if (r) {
- vhost_dev_reset_owner(&v->vdev, NULL);
- break;
- }
- }
if (r == -ENOIOCTLCMD)
r = vhost_vdpa_vring_ioctl(v, cmd, argp);
break;
WDYT?
Thanks,
Stefano
On Fri, Mar 3, 2023 at 3:39 PM Eugenio Perez Martin <[email protected]> wrote:
>
> On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <[email protected]> wrote:
> >
> > vDPA supports the possibility to use user VA in the iotlb messages.
> > So, let's add support for user VA in vringh to use it in the vDPA
> > simulators.
> >
> > Signed-off-by: Stefano Garzarella <[email protected]>
> > ---
> >
> > Notes:
> > v2:
> > - replace kmap_atomic() with kmap_local_page() [see previous patch]
> > - fix cast warnings when build with W=1 C=1
> >
> > include/linux/vringh.h | 5 +-
> > drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
> > drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
> > drivers/vhost/vringh.c | 247 ++++++++++++++++++++++++------
> > 4 files changed, 205 insertions(+), 53 deletions(-)
> >
[...]
>
> It seems to me iotlb_translate_va and iotlb_translate_pa are very
> similar, their only difference is that the argument is that iov is
> iovec instead of bio_vec. And how to fill it, obviously.
>
> It would be great to merge both functions, only differing with a
> conditional on vrh->use_va, or generics, or similar. Or, if following
> the style of the rest of vringh code, to provide a callback to fill
> iovec (although I like conditional more).
>
> However I cannot think of an easy way to perform that without long
> macros or type erasure.
Thank you for pushing me :-)
I finally managed to avoid code duplication (partial patch attached,
but not yet fully tested).
@Jason: with this refactoring I removed copy_to_va/copy_to_pa, so I
also avoided getu16_iotlb_va/pa.
I will send the full patch in v3, but I would like to get your opinion
first ;-)
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 0ba3ef809e48..71dd67700e36 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1096,8 +1096,7 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
static int iotlb_translate(const struct vringh *vrh,
u64 addr, u64 len, u64 *translated,
- struct bio_vec iov[],
- int iov_size, u32 perm)
+ void *iov, int iov_size, bool iovec, u32 perm)
{
struct vhost_iotlb_map *map;
struct vhost_iotlb *iotlb = vrh->iotlb;
@@ -1107,7 +1106,7 @@ static int iotlb_translate(const struct vringh *vrh,
spin_lock(vrh->iotlb_lock);
while (len > s) {
- u64 size, pa, pfn;
+ u64 size;
if (unlikely(ret >= iov_size)) {
ret = -ENOBUFS;
@@ -1124,10 +1123,22 @@ static int iotlb_translate(const struct vringh *vrh,
}
size = map->size - addr + map->start;
- pa = map->addr + addr - map->start;
- pfn = pa >> PAGE_SHIFT;
- bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
- pa & (PAGE_SIZE - 1));
+ if (iovec) {
+ struct iovec *iovec = iov;
+
+ iovec[ret].iov_len = min(len - s, size);
+ iovec[ret].iov_base = (void __user *)(unsigned long)
+ (map->addr + addr - map->start);
+ } else {
+ u64 pa = map->addr + addr - map->start;
+ u64 pfn = pa >> PAGE_SHIFT;
+ struct bio_vec *bvec = iov;
+
+ bvec_set_page(&bvec[ret], pfn_to_page(pfn),
+ min(len - s, size),
+ pa & (PAGE_SIZE - 1));
+ }
+
s += size;
addr += size;
++ret;
@@ -1141,26 +1152,38 @@ static int iotlb_translate(const struct vringh *vrh,
return ret;
}
+#define IOTLB_IOV_SIZE 16
+
static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
void *src, size_t len)
{
u64 total_translated = 0;
while (total_translated < len) {
- struct bio_vec iov[16];
+ union {
+ struct iovec iovec[IOTLB_IOV_SIZE];
+ struct bio_vec bvec[IOTLB_IOV_SIZE];
+ } iov;
struct iov_iter iter;
u64 translated;
int ret;
ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
len - total_translated, &translated,
- iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
+ &iov, IOTLB_IOV_SIZE, vrh->use_va,
+ VHOST_MAP_RO);
if (ret == -ENOBUFS)
- ret = ARRAY_SIZE(iov);
+ ret = IOTLB_IOV_SIZE;
else if (ret < 0)
return ret;
- iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
+ if (vrh->use_va) {
+ iov_iter_init(&iter, ITER_SOURCE, iov.iovec, ret,
+ translated);
+ } else {
+ iov_iter_bvec(&iter, ITER_SOURCE, iov.bvec, ret,
+ translated);
+ }
ret = copy_from_iter(dst, translated, &iter);
if (ret < 0)
@@ -1180,20 +1203,30 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
u64 total_translated = 0;
while (total_translated < len) {
- struct bio_vec iov[16];
+ union {
+ struct iovec iovec[IOTLB_IOV_SIZE];
+ struct bio_vec bvec[IOTLB_IOV_SIZE];
+ } iov;
struct iov_iter iter;
u64 translated;
int ret;
ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
len - total_translated, &translated,
- iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
+ &iov, IOTLB_IOV_SIZE, vrh->use_va,
+ VHOST_MAP_WO);
if (ret == -ENOBUFS)
- ret = ARRAY_SIZE(iov);
+ ret = IOTLB_IOV_SIZE;
else if (ret < 0)
return ret;
- iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
+ if (vrh->use_va) {
+ iov_iter_init(&iter, ITER_DEST, iov.iovec, ret,
+ translated);
+ } else {
+ iov_iter_bvec(&iter, ITER_DEST, iov.bvec, ret,
+ translated);
+ }
ret = copy_to_iter(src, translated, &iter);
if (ret < 0)
@@ -1210,20 +1243,32 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
static inline int getu16_iotlb(const struct vringh *vrh,
u16 *val, const __virtio16 *p)
{
- struct bio_vec iov;
- void *kaddr, *from;
+ union {
+ struct iovec iovec;
+ struct bio_vec bvec;
+ } iov;
+ __virtio16 tmp;
int ret;
/* Atomic read is needed for getu16 */
- ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
- &iov, 1, VHOST_MAP_RO);
+ ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
+ NULL, &iov, 1, vrh->use_va, VHOST_MAP_RO);
if (ret < 0)
return ret;
- kaddr = kmap_local_page(iov.bv_page);
- from = kaddr + iov.bv_offset;
- *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
- kunmap_local(kaddr);
+ if (vrh->use_va) {
+ ret = __get_user(tmp, (__virtio16 __user *)iov.iovec.iov_base);
+ if (ret)
+ return ret;
+ } else {
+ void *kaddr = kmap_local_page(iov.bvec.bv_page);
+ void *from = kaddr + iov.bvec.bv_offset;
+
+ tmp = READ_ONCE(*(__virtio16 *)from);
+ kunmap_local(kaddr);
+ }
+
+ *val = vringh16_to_cpu(vrh, tmp);
return 0;
}
@@ -1231,20 +1276,32 @@ static inline int getu16_iotlb(const struct vringh *vrh,
static inline int putu16_iotlb(const struct vringh *vrh,
__virtio16 *p, u16 val)
{
- struct bio_vec iov;
- void *kaddr, *to;
+ union {
+ struct iovec iovec;
+ struct bio_vec bvec;
+ } iov;
+ __virtio16 tmp;
int ret;
/* Atomic write is needed for putu16 */
- ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
- &iov, 1, VHOST_MAP_WO);
+ ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
+ NULL, &iov, 1, vrh->use_va, VHOST_MAP_WO);
if (ret < 0)
return ret;
- kaddr = kmap_local_page(iov.bv_page);
- to = kaddr + iov.bv_offset;
- WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
- kunmap_local(kaddr);
+ tmp = cpu_to_vringh16(vrh, val);
+
+ if (vrh->use_va) {
+ ret = __put_user(tmp, (__virtio16 __user *)iov.iovec.iov_base);
+ if (ret)
+ return ret;
+ } else {
+ void *kaddr = kmap_local_page(iov.bvec.bv_page);
+ void *to = kaddr + iov.bvec.bv_offset;
+
+ WRITE_ONCE(*(__virtio16 *)to, tmp);
+ kunmap_local(kaddr);
+ }
return 0;
}
On Fri, Mar 17, 2023 at 12:07 AM Stefano Garzarella <[email protected]> wrote:
>
> On Fri, Mar 3, 2023 at 3:39 PM Eugenio Perez Martin <[email protected]> wrote:
> >
> > On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <[email protected]> wrote:
> > >
> > > vDPA supports the possibility to use user VA in the iotlb messages.
> > > So, let's add support for user VA in vringh to use it in the vDPA
> > > simulators.
> > >
> > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > ---
> > >
> > > Notes:
> > > v2:
> > > - replace kmap_atomic() with kmap_local_page() [see previous patch]
> > > - fix cast warnings when build with W=1 C=1
> > >
> > > include/linux/vringh.h | 5 +-
> > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
> > > drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
> > > drivers/vhost/vringh.c | 247 ++++++++++++++++++++++++------
> > > 4 files changed, 205 insertions(+), 53 deletions(-)
> > >
>
> [...]
>
> >
> > It seems to me iotlb_translate_va and iotlb_translate_pa are very
> > similar, their only difference is that the iov argument is an iovec
> > instead of a bio_vec. And how to fill it, obviously.
> >
> > It would be great to merge both functions, only differing with a
> > conditional on vrh->use_va, or generics, or similar. Or, if following
> > the style of the rest of vringh code, to provide a callback to fill
> > iovec (although I like conditional more).
> >
> > However I cannot think of an easy way to perform that without long
> > macros or type erasure.
>
> Thank you for pushing me :-)
> I finally managed to avoid code duplication (partial patch attached,
> but not yet fully tested).
>
> @Jason: with this refactoring I removed copy_to_va/copy_to_pa, so I
> also avoided getu16_iotlb_va/pa.
>
> I will send the full patch in v3, but I would like to get your opinion
> first ;-)
Fine with me.
Thanks
>
>
>
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 0ba3ef809e48..71dd67700e36 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1096,8 +1096,7 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>
> static int iotlb_translate(const struct vringh *vrh,
> u64 addr, u64 len, u64 *translated,
> - struct bio_vec iov[],
> - int iov_size, u32 perm)
> + void *iov, int iov_size, bool iovec, u32 perm)
> {
> struct vhost_iotlb_map *map;
> struct vhost_iotlb *iotlb = vrh->iotlb;
> @@ -1107,7 +1106,7 @@ static int iotlb_translate(const struct vringh *vrh,
> spin_lock(vrh->iotlb_lock);
>
> while (len > s) {
> - u64 size, pa, pfn;
> + u64 size;
>
> if (unlikely(ret >= iov_size)) {
> ret = -ENOBUFS;
> @@ -1124,10 +1123,22 @@ static int iotlb_translate(const struct vringh *vrh,
> }
>
> size = map->size - addr + map->start;
> - pa = map->addr + addr - map->start;
> - pfn = pa >> PAGE_SHIFT;
> - bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
> - pa & (PAGE_SIZE - 1));
> + if (iovec) {
> + struct iovec *iovec = iov;
> +
> + iovec[ret].iov_len = min(len - s, size);
> + iovec[ret].iov_base = (void __user *)(unsigned long)
> + (map->addr + addr - map->start);
> + } else {
> + u64 pa = map->addr + addr - map->start;
> + u64 pfn = pa >> PAGE_SHIFT;
> + struct bio_vec *bvec = iov;
> +
> + bvec_set_page(&bvec[ret], pfn_to_page(pfn),
> + min(len - s, size),
> + pa & (PAGE_SIZE - 1));
> + }
> +
> s += size;
> addr += size;
> ++ret;
> @@ -1141,26 +1152,38 @@ static int iotlb_translate(const struct vringh *vrh,
> return ret;
> }
>
> +#define IOTLB_IOV_SIZE 16
> +
> static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
> void *src, size_t len)
> {
> u64 total_translated = 0;
>
> while (total_translated < len) {
> - struct bio_vec iov[16];
> + union {
> + struct iovec iovec[IOTLB_IOV_SIZE];
> + struct bio_vec bvec[IOTLB_IOV_SIZE];
> + } iov;
> struct iov_iter iter;
> u64 translated;
> int ret;
>
> ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
> len - total_translated, &translated,
> - iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
> + &iov, IOTLB_IOV_SIZE, vrh->use_va,
> + VHOST_MAP_RO);
> if (ret == -ENOBUFS)
> - ret = ARRAY_SIZE(iov);
> + ret = IOTLB_IOV_SIZE;
> else if (ret < 0)
> return ret;
>
> - iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
> + if (vrh->use_va) {
> + iov_iter_init(&iter, ITER_SOURCE, iov.iovec, ret,
> + translated);
> + } else {
> + iov_iter_bvec(&iter, ITER_SOURCE, iov.bvec, ret,
> + translated);
> + }
>
> ret = copy_from_iter(dst, translated, &iter);
> if (ret < 0)
> @@ -1180,20 +1203,30 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
> u64 total_translated = 0;
>
> while (total_translated < len) {
> - struct bio_vec iov[16];
> + union {
> + struct iovec iovec[IOTLB_IOV_SIZE];
> + struct bio_vec bvec[IOTLB_IOV_SIZE];
> + } iov;
> struct iov_iter iter;
> u64 translated;
> int ret;
>
> ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
> len - total_translated, &translated,
> - iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
> + &iov, IOTLB_IOV_SIZE, vrh->use_va,
> + VHOST_MAP_WO);
> if (ret == -ENOBUFS)
> - ret = ARRAY_SIZE(iov);
> + ret = IOTLB_IOV_SIZE;
> else if (ret < 0)
> return ret;
>
> - iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
> + if (vrh->use_va) {
> + iov_iter_init(&iter, ITER_DEST, iov.iovec, ret,
> + translated);
> + } else {
> + iov_iter_bvec(&iter, ITER_DEST, iov.bvec, ret,
> + translated);
> + }
>
> ret = copy_to_iter(src, translated, &iter);
> if (ret < 0)
> @@ -1210,20 +1243,32 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
> static inline int getu16_iotlb(const struct vringh *vrh,
> u16 *val, const __virtio16 *p)
> {
> - struct bio_vec iov;
> - void *kaddr, *from;
> + union {
> + struct iovec iovec;
> + struct bio_vec bvec;
> + } iov;
> + __virtio16 tmp;
> int ret;
>
> /* Atomic read is needed for getu16 */
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> - &iov, 1, VHOST_MAP_RO);
> + ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
> + NULL, &iov, 1, vrh->use_va, VHOST_MAP_RO);
> if (ret < 0)
> return ret;
>
> - kaddr = kmap_local_page(iov.bv_page);
> - from = kaddr + iov.bv_offset;
> - *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> - kunmap_local(kaddr);
> + if (vrh->use_va) {
> + ret = __get_user(tmp, (__virtio16 __user *)iov.iovec.iov_base);
> + if (ret)
> + return ret;
> + } else {
> + void *kaddr = kmap_local_page(iov.bvec.bv_page);
> + void *from = kaddr + iov.bvec.bv_offset;
> +
> + tmp = READ_ONCE(*(__virtio16 *)from);
> + kunmap_local(kaddr);
> + }
> +
> + *val = vringh16_to_cpu(vrh, tmp);
>
> return 0;
> }
> @@ -1231,20 +1276,32 @@ static inline int getu16_iotlb(const struct vringh *vrh,
> static inline int putu16_iotlb(const struct vringh *vrh,
> __virtio16 *p, u16 val)
> {
> - struct bio_vec iov;
> - void *kaddr, *to;
> + union {
> + struct iovec iovec;
> + struct bio_vec bvec;
> + } iov;
> + __virtio16 tmp;
> int ret;
>
> /* Atomic write is needed for putu16 */
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> - &iov, 1, VHOST_MAP_WO);
> + ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
> + NULL, &iov, 1, vrh->use_va, VHOST_MAP_WO);
> if (ret < 0)
> return ret;
>
> - kaddr = kmap_local_page(iov.bv_page);
> - to = kaddr + iov.bv_offset;
> - WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> - kunmap_local(kaddr);
> + tmp = cpu_to_vringh16(vrh, val);
> +
> + if (vrh->use_va) {
> + ret = __put_user(tmp, (__virtio16 __user *)iov.iovec.iov_base);
> + if (ret)
> + return ret;
> + } else {
> + void *kaddr = kmap_local_page(iov.bvec.bv_page);
> + void *to = kaddr + iov.bvec.bv_offset;
> +
> + WRITE_ONCE(*(__virtio16 *)to, tmp);
> + kunmap_local(kaddr);
> + }
>
> return 0;
> }
>
On Thu, Mar 16, 2023 at 5:07 PM Stefano Garzarella <[email protected]> wrote:
>
> On Fri, Mar 3, 2023 at 3:39 PM Eugenio Perez Martin <[email protected]> wrote:
> >
> > On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <[email protected]> wrote:
> > >
> > > vDPA supports the possibility to use user VA in the iotlb messages.
> > > So, let's add support for user VA in vringh to use it in the vDPA
> > > simulators.
> > >
> > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > ---
> > >
> > > Notes:
> > > v2:
> > > - replace kmap_atomic() with kmap_local_page() [see previous patch]
> > > - fix cast warnings when build with W=1 C=1
> > >
> > > include/linux/vringh.h | 5 +-
> > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
> > > drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
> > > drivers/vhost/vringh.c | 247 ++++++++++++++++++++++++------
> > > 4 files changed, 205 insertions(+), 53 deletions(-)
> > >
>
> [...]
>
> >
> > It seems to me iotlb_translate_va and iotlb_translate_pa are very
> > similar, their only difference is that the iov argument is an iovec
> > instead of a bio_vec. And how to fill it, obviously.
> >
> > It would be great to merge both functions, only differing with a
> > conditional on vrh->use_va, or generics, or similar. Or, if following
> > the style of the rest of vringh code, to provide a callback to fill
> > iovec (although I like conditional more).
> >
> > However I cannot think of an easy way to perform that without long
> > macros or type erasure.
>
> Thank you for pushing me :-)
> I finally managed to avoid code duplication (partial patch attached,
> but not yet fully tested).
>
> @Jason: with this refactoring I removed copy_to_va/copy_to_pa, so I
> also avoided getu16_iotlb_va/pa.
>
> I will send the full patch in v3, but I would like to get your opinion
> first ;-)
>
>
>
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 0ba3ef809e48..71dd67700e36 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1096,8 +1096,7 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>
> static int iotlb_translate(const struct vringh *vrh,
> u64 addr, u64 len, u64 *translated,
> - struct bio_vec iov[],
> - int iov_size, u32 perm)
> + void *iov, int iov_size, bool iovec, u32 perm)
I think this is an improvement, but we're doing type erasure here. I
don't think it is a big deal since the function is not exported and is
pretty contained in this file, so I'd ack this version too. I'm just
throwing ideas here:
a) typedef the union {iovec, bio_vec} and use that type in the
parameter (a rough sketch of this option follows below).
As a drawback, that union feels out of place in this file. Is this the
only place where it is needed? I don't see other similar uses in the
kernel.
b) To convert from iovec to bio_vec at return.
The drawback is the extra processing if the compiler is not smart
enough to inline it. I prefer the previous one but I didn't want to
omit it, just in case.
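For (a), a rough and untested sketch of what I mean (the union name and
where to put it are just illustrative, and here I check vrh->use_va
directly instead of passing a bool):

union vringh_iotlb_vec {
	struct iovec iovec;
	struct bio_vec bvec;
};

static int iotlb_translate(const struct vringh *vrh,
			   u64 addr, u64 len, u64 *translated,
			   union vringh_iotlb_vec *iov, int iov_size,
			   u32 perm)
{
	...
	if (vrh->use_va) {
		iov[ret].iovec.iov_len = min(len - s, size);
		iov[ret].iovec.iov_base = (void __user *)(unsigned long)
					  (map->addr + addr - map->start);
	} else {
		u64 pa = map->addr + addr - map->start;

		bvec_set_page(&iov[ret].bvec, pfn_to_page(pa >> PAGE_SHIFT),
			      min(len - s, size), pa & (PAGE_SIZE - 1));
	}
	...
}

Callers would then declare e.g. union vringh_iotlb_vec iov[IOTLB_IOV_SIZE]
and still keep the vrh->use_va conditional when building the iov_iter.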
Thanks!
> {
> struct vhost_iotlb_map *map;
> struct vhost_iotlb *iotlb = vrh->iotlb;
> @@ -1107,7 +1106,7 @@ static int iotlb_translate(const struct vringh *vrh,
> spin_lock(vrh->iotlb_lock);
>
> while (len > s) {
> - u64 size, pa, pfn;
> + u64 size;
>
> if (unlikely(ret >= iov_size)) {
> ret = -ENOBUFS;
> @@ -1124,10 +1123,22 @@ static int iotlb_translate(const struct vringh *vrh,
> }
>
> size = map->size - addr + map->start;
> - pa = map->addr + addr - map->start;
> - pfn = pa >> PAGE_SHIFT;
> - bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
> - pa & (PAGE_SIZE - 1));
> + if (iovec) {
> + struct iovec *iovec = iov;
> +
> + iovec[ret].iov_len = min(len - s, size);
> + iovec[ret].iov_base = (void __user *)(unsigned long)
> + (map->addr + addr - map->start);
> + } else {
> + u64 pa = map->addr + addr - map->start;
> + u64 pfn = pa >> PAGE_SHIFT;
> + struct bio_vec *bvec = iov;
> +
> + bvec_set_page(&bvec[ret], pfn_to_page(pfn),
> + min(len - s, size),
> + pa & (PAGE_SIZE - 1));
> + }
> +
> s += size;
> addr += size;
> ++ret;
> @@ -1141,26 +1152,38 @@ static int iotlb_translate(const struct vringh *vrh,
> return ret;
> }
>
> +#define IOTLB_IOV_SIZE 16
> +
> static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
> void *src, size_t len)
> {
> u64 total_translated = 0;
>
> while (total_translated < len) {
> - struct bio_vec iov[16];
> + union {
> + struct iovec iovec[IOTLB_IOV_SIZE];
> + struct bio_vec bvec[IOTLB_IOV_SIZE];
> + } iov;
> struct iov_iter iter;
> u64 translated;
> int ret;
>
> ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
> len - total_translated, &translated,
> - iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
> + &iov, IOTLB_IOV_SIZE, vrh->use_va,
> + VHOST_MAP_RO);
> if (ret == -ENOBUFS)
> - ret = ARRAY_SIZE(iov);
> + ret = IOTLB_IOV_SIZE;
> else if (ret < 0)
> return ret;
>
> - iov_iter_bvec(&iter, ITER_SOURCE, iov, ret, translated);
> + if (vrh->use_va) {
> + iov_iter_init(&iter, ITER_SOURCE, iov.iovec, ret,
> + translated);
> + } else {
> + iov_iter_bvec(&iter, ITER_SOURCE, iov.bvec, ret,
> + translated);
> + }
>
> ret = copy_from_iter(dst, translated, &iter);
> if (ret < 0)
> @@ -1180,20 +1203,30 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
> u64 total_translated = 0;
>
> while (total_translated < len) {
> - struct bio_vec iov[16];
> + union {
> + struct iovec iovec[IOTLB_IOV_SIZE];
> + struct bio_vec bvec[IOTLB_IOV_SIZE];
> + } iov;
> struct iov_iter iter;
> u64 translated;
> int ret;
>
> ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
> len - total_translated, &translated,
> - iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
> + &iov, IOTLB_IOV_SIZE, vrh->use_va,
> + VHOST_MAP_WO);
> if (ret == -ENOBUFS)
> - ret = ARRAY_SIZE(iov);
> + ret = IOTLB_IOV_SIZE;
> else if (ret < 0)
> return ret;
>
> - iov_iter_bvec(&iter, ITER_DEST, iov, ret, translated);
> + if (vrh->use_va) {
> + iov_iter_init(&iter, ITER_DEST, iov.iovec, ret,
> + translated);
> + } else {
> + iov_iter_bvec(&iter, ITER_DEST, iov.bvec, ret,
> + translated);
> + }
>
> ret = copy_to_iter(src, translated, &iter);
> if (ret < 0)
> @@ -1210,20 +1243,32 @@ static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
> static inline int getu16_iotlb(const struct vringh *vrh,
> u16 *val, const __virtio16 *p)
> {
> - struct bio_vec iov;
> - void *kaddr, *from;
> + union {
> + struct iovec iovec;
> + struct bio_vec bvec;
> + } iov;
> + __virtio16 tmp;
> int ret;
>
> /* Atomic read is needed for getu16 */
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> - &iov, 1, VHOST_MAP_RO);
> + ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
> + NULL, &iov, 1, vrh->use_va, VHOST_MAP_RO);
> if (ret < 0)
> return ret;
>
> - kaddr = kmap_local_page(iov.bv_page);
> - from = kaddr + iov.bv_offset;
> - *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> - kunmap_local(kaddr);
> + if (vrh->use_va) {
> + ret = __get_user(tmp, (__virtio16 __user *)iov.iovec.iov_base);
> + if (ret)
> + return ret;
> + } else {
> + void *kaddr = kmap_local_page(iov.bvec.bv_page);
> + void *from = kaddr + iov.bvec.bv_offset;
> +
> + tmp = READ_ONCE(*(__virtio16 *)from);
> + kunmap_local(kaddr);
> + }
> +
> + *val = vringh16_to_cpu(vrh, tmp);
>
> return 0;
> }
> @@ -1231,20 +1276,32 @@ static inline int getu16_iotlb(const struct vringh *vrh,
> static inline int putu16_iotlb(const struct vringh *vrh,
> __virtio16 *p, u16 val)
> {
> - struct bio_vec iov;
> - void *kaddr, *to;
> + union {
> + struct iovec iovec;
> + struct bio_vec bvec;
> + } iov;
> + __virtio16 tmp;
> int ret;
>
> /* Atomic write is needed for putu16 */
> - ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
> - &iov, 1, VHOST_MAP_WO);
> + ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
> + NULL, &iov, 1, vrh->use_va, VHOST_MAP_WO);
> if (ret < 0)
> return ret;
>
> - kaddr = kmap_local_page(iov.bv_page);
> - to = kaddr + iov.bv_offset;
> - WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> - kunmap_local(kaddr);
> + tmp = cpu_to_vringh16(vrh, val);
> +
> + if (vrh->use_va) {
> + ret = __put_user(tmp, (__virtio16 __user *)iov.iovec.iov_base);
> + if (ret)
> + return ret;
> + } else {
> + void *kaddr = kmap_local_page(iov.bvec.bv_page);
> + void *to = kaddr + iov.bvec.bv_offset;
> +
> + WRITE_ONCE(*(__virtio16 *)to, tmp);
> + kunmap_local(kaddr);
> + }
>
> return 0;
> }
>
On Fri, Mar 17, 2023 at 10:49:27AM +0100, Eugenio Perez Martin wrote:
>On Thu, Mar 16, 2023 at 5:07 PM Stefano Garzarella <[email protected]> wrote:
>>
>> On Fri, Mar 3, 2023 at 3:39 PM Eugenio Perez Martin <[email protected]> wrote:
>> >
>> > On Thu, Mar 2, 2023 at 12:35 PM Stefano Garzarella <[email protected]> wrote:
>> > >
>> > > vDPA supports the possibility to use user VA in the iotlb messages.
>> > > So, let's add support for user VA in vringh to use it in the vDPA
>> > > simulators.
>> > >
>> > > Signed-off-by: Stefano Garzarella <[email protected]>
>> > > ---
>> > >
>> > > Notes:
>> > > v2:
>> > > - replace kmap_atomic() with kmap_local_page() [see previous patch]
>> > > - fix cast warnings when build with W=1 C=1
>> > >
>> > > include/linux/vringh.h | 5 +-
>> > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>> > > drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 +-
>> > > drivers/vhost/vringh.c | 247 ++++++++++++++++++++++++------
>> > > 4 files changed, 205 insertions(+), 53 deletions(-)
>> > >
>>
>> [...]
>>
>> >
>> > It seems to me iotlb_translate_va and iotlb_translate_pa are very
>> > similar, their only difference is that the iov argument is an iovec
>> > instead of a bio_vec. And how to fill it, obviously.
>> >
>> > It would be great to merge both functions, only differing with a
>> > conditional on vrh->use_va, or generics, or similar. Or, if following
>> > the style of the rest of vringh code, to provide a callback to fill
>> > iovec (although I like conditional more).
>> >
>> > However I cannot think of an easy way to perform that without long
>> > macros or type erasure.
>>
>> Thank you for pushing me :-)
>> I finally managed to avoid code duplication (partial patch attached,
>> but not yet fully tested).
>>
>> @Jason: with this refactoring I removed copy_to_va/copy_to_pa, so I
>> also avoided getu16_iotlb_va/pa.
>>
>> I will send the full patch in v3, but I would like to get your opinion
>> first ;-)
>>
>>
>>
>> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>> index 0ba3ef809e48..71dd67700e36 100644
>> --- a/drivers/vhost/vringh.c
>> +++ b/drivers/vhost/vringh.c
>> @@ -1096,8 +1096,7 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
>>
>> static int iotlb_translate(const struct vringh *vrh,
>> u64 addr, u64 len, u64 *translated,
>> - struct bio_vec iov[],
>> - int iov_size, u32 perm)
>> + void *iov, int iov_size, bool iovec, u32 perm)
>
>I think this is an improvement, but we're doing type erasure here. I
>don't think it is a big deal since the function is not exported, it's
>pretty contained in this file, so I'd ack this version too. I'm just
>throwing ideas here:
>
>a) typedef the union {iovec, bio_vec} and use that type in the parameter.
>
>As a drawback, that union feels out of place in this file. Is this the
>only place where it is needed? I don't see other similar uses in the
>kernel.
iov_iter has something similar, but those are const pointers, so IIUC
it is not supposed to be used to fill in the bvec contents, just to
iterate them.
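Something like this (paraphrasing include/linux/uio.h from memory, with
several fields elided, so the exact layout may differ a bit):

struct iov_iter {
	u8 iter_type;
	/* ... flags elided ... */
	size_t iov_offset;
	size_t count;
	union {
		/* const pointers: meant for iterating vectors, not filling them */
		const struct iovec *iov;
		const struct kvec *kvec;
		const struct bio_vec *bvec;
		/* ... other backends elided ... */
	};
	/* ... */
};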
Anyway, I thought about something similar and it should be doable, but
since this is an internal API I went with type erasure.
>
>b) To convert from iovec to bio_vec at return.
>The drawback is the extra processing if the compiler is not smart
>enough to inline it. I prefer the previous one but I didn't want to
>omit it, just in case.
Yep, I also prefer the previous one, so let's go in that direction for
v3 ;-)
Thanks,
Stefano