2023-03-23 05:33:08

by Yongji Xie

Subject: [PATCH v4 00/11] VDUSE: Improve performance

Hi all,

This series introduces some ways to improve VDUSE performance.

Patches 1 ~ 6 bring the current interrupt affinity spreading
mechanism to the vduse device and make it possible for the virtio-blk
driver to build its blk-mq queues based on it. This helps mitigate
virtqueue lock contention in the virtio-blk driver. In our tests,
these patches yield a ~50% improvement (600k IOPS -> 900k IOPS) when
using per-CPU virtqueues.
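
For context, the consumer side of this already exists: virtio-blk maps
its blk-mq queues with blk_mq_virtio_map_queues(), which asks the
transport for each virtqueue's callback affinity. A rough sketch of
that mapping logic, paraphrased from block/blk-mq-virtio.c (the
function name below is illustrative and includes are omitted):

    static void map_queues_by_affinity(struct blk_mq_queue_map *qmap,
                                       struct virtio_device *vdev,
                                       int first_vec)
    {
            unsigned int queue, cpu;

            for (queue = 0; queue < qmap->nr_queues; queue++) {
                    /* Which CPUs should this vq's callback land on? */
                    const struct cpumask *mask =
                            vdev->config->get_vq_affinity(vdev, first_vec + queue);

                    if (!mask) {
                            /* No affinity reported: use the generic mapping */
                            blk_mq_map_queues(qmap);
                            return;
                    }

                    for_each_cpu(cpu, mask)
                            qmap->mq_map[cpu] = qmap->queue_offset + queue;
            }
    }

Before this series, virtio-vdpa implemented no get_vq_affinity, so
virtio-blk always fell back to the generic mapping.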

Patch 7 adds a sysfs interface for each vduse virtqueue to change
the affinity of its IRQ callback. This is helpful for performance
tuning when the affinity mask contains more than one CPU.
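
For example, writing a hex CPU mask like "echo 3 >
/sys/class/vduse/<device>/vq0/irq_cb_affinity" (path illustrative)
restricts vq0's callback to CPUs 0-1; the value is parsed with
cpumask_parse() and must intersect the online CPU mask.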

Patches 8 ~ 9 associate an eventfd with the vdpa callback so that
we can signal it directly during IRQ injection instead of scheduling
an additional workqueue thread to do so.

Patches 10 and 11 add a sysfs interface to support specifying the
bounce buffer size in the virtio-vdpa case. High-throughput workloads
can benefit from a larger buffer, and the same knob can be used to
reduce the memory overhead for low-throughput workloads.
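
For example, "echo $((256 * 1024 * 1024)) >
/sys/class/vduse/<device>/bounce_size" (path illustrative) requests a
256 MB bounce buffer; the value must be between 1 MB and 1 GB and can
only be changed before the device is attached to a vdpa bus, i.e.
before the iova domain is created.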

Please review, thanks!

V3 to V4:
- Remove set_irq_affinity callback [Jason]
- Move the interrupt affinity spreading logic from the vduse module
to the virtio-vdpa module, and unify on the set_vq_affinity
callback to pass the affinity to the vduse device [Jason]
- Rename the eventfd variable of the vdpa callback and add more
comments for it [Jason]
- Fix compile warnings

V2 to V3:
- Rebased onto the latest kernel tree
- Export group_cpus_evenly() instead of irq_create_affinity_masks() [MST]
- Remove the sysfs for workqueue control [Jason]
- Associate an eventfd to the vdpa callback [Jason]
- Signal the eventfd directly in vhost-vdpa case [Jason]
- Use round-robin to spread IRQs between CPUs in the affinity mask [Jason]
- Handle the cpu hotplug case on IRQ injection [Jason]
- Remove effective IRQ affinity and balance mechanism for IRQ allocation

V1 to V2:
- Export irq_create_affinity_masks()
- Add set/get_vq_affinity and set_irq_affinity callbacks in vDPA
framework
- Add automatic irq callback affinity support in VDUSE driver [Jason]
- Add more background information in the commit logs [Jason]
- Only support changing effective affinity when the value is a subset
of the IRQ callback affinity mask

Xie Yongji (11):
lib/group_cpus: Export group_cpus_evenly()
vdpa: Add set/get_vq_affinity callbacks in vdpa_config_ops
virtio-vdpa: Support interrupt affinity spreading mechanism
vduse: Refactor allocation for vduse virtqueues
vduse: Support set_vq_affinity callback
vduse: Support get_vq_affinity callback
vduse: Add sysfs interface for irq callback affinity
vdpa: Add eventfd for the vdpa callback
vduse: Signal vq trigger eventfd directly if possible
vduse: Delay iova domain creation
vduse: Support specifying bounce buffer size via sysfs

drivers/vdpa/vdpa_user/vduse_dev.c | 414 ++++++++++++++++++++++++-----
drivers/vhost/vdpa.c | 2 +
drivers/virtio/virtio_vdpa.c | 97 +++++++
include/linux/vdpa.h | 19 ++
lib/group_cpus.c | 1 +
5 files changed, 470 insertions(+), 63 deletions(-)

--
2.20.1


2023-03-23 05:33:12

by Yongji Xie

Subject: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism

To support the interrupt affinity spreading mechanism,
this makes use of group_cpus_evenly() to create
an irq callback affinity mask for each virtqueue
of the vdpa device. Then we unify the set_vq_affinity
callback to pass the affinity to the vdpa device driver.

Signed-off-by: Xie Yongji <[email protected]>
---
drivers/virtio/virtio_vdpa.c | 68 ++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)

diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index f72696b4c1c2..f3826f42b704 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -13,6 +13,7 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/uuid.h>
+#include <linux/group_cpus.h>
#include <linux/virtio.h>
#include <linux/vdpa.h>
#include <linux/virtio_config.h>
@@ -272,6 +273,66 @@ static void virtio_vdpa_del_vqs(struct virtio_device *vdev)
virtio_vdpa_del_vq(vq);
}

+static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
+{
+ affd->nr_sets = 1;
+ affd->set_size[0] = affvecs;
+}
+
+static struct cpumask *
+create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
+{
+ unsigned int affvecs = 0, curvec, usedvecs, i;
+ struct cpumask *masks = NULL;
+
+ if (nvecs > affd->pre_vectors + affd->post_vectors)
+ affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
+
+ if (!affd->calc_sets)
+ affd->calc_sets = default_calc_sets;
+
+ affd->calc_sets(affd, affvecs);
+
+ if (!affvecs)
+ return NULL;
+
+ masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
+ if (!masks)
+ return NULL;
+
+ /* Fill out vectors at the beginning that don't need affinity */
+ for (curvec = 0; curvec < affd->pre_vectors; curvec++)
+ cpumask_setall(&masks[curvec]);
+
+ for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
+ unsigned int this_vecs = affd->set_size[i];
+ int j;
+ struct cpumask *result = group_cpus_evenly(this_vecs);
+
+ if (!result) {
+ kfree(masks);
+ return NULL;
+ }
+
+ for (j = 0; j < this_vecs; j++)
+ cpumask_copy(&masks[curvec + j], &result[j]);
+ kfree(result);
+
+ curvec += this_vecs;
+ usedvecs += this_vecs;
+ }
+
+ /* Fill out vectors at the end that don't need affinity */
+ if (usedvecs >= affvecs)
+ curvec = affd->pre_vectors + affvecs;
+ else
+ curvec = affd->pre_vectors + usedvecs;
+ for (; curvec < nvecs; curvec++)
+ cpumask_setall(&masks[curvec]);
+
+ return masks;
+}
+
static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
struct virtqueue *vqs[],
vq_callback_t *callbacks[],
@@ -282,9 +343,15 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
struct vdpa_device *vdpa = vd_get_vdpa(vdev);
const struct vdpa_config_ops *ops = vdpa->config;
+ struct irq_affinity default_affd = { 0 };
+ struct cpumask *masks;
struct vdpa_callback cb;
int i, err, queue_idx = 0;

+ masks = create_affinity_masks(nvqs, desc ? desc : &default_affd);
+ if (!masks)
+ return -ENOMEM;
+
for (i = 0; i < nvqs; ++i) {
if (!names[i]) {
vqs[i] = NULL;
@@ -298,6 +365,7 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
err = PTR_ERR(vqs[i]);
goto err_setup_vq;
}
+ ops->set_vq_affinity(vdpa, i, &masks[i]);
}

cb.callback = virtio_vdpa_config_cb;
--
2.20.1

2023-03-23 05:33:23

by Yongji Xie

Subject: [PATCH v4 01/11] lib/group_cpus: Export group_cpus_evenly()

Export group_cpus_evenly() so that some modules
can make use of it to group CPUs evenly according
to NUMA and CPU locality.
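
For a module user, the contract is simple: the function returns a
kmalloc'ed array of numgrps cpumasks that the caller owns and must
kfree(). A minimal calling sketch (illustrative only; nvqs and the
surrounding code are hypothetical):

    #include <linux/group_cpus.h>

    struct cpumask *masks = group_cpus_evenly(nvqs);
    int i;

    if (masks) {
            for (i = 0; i < nvqs; i++)
                    /* masks[i] holds the CPUs grouped into set i */
                    pr_info("group %d: %*pbl\n", i, cpumask_pr_args(&masks[i]));
            kfree(masks);
    }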

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
---
lib/group_cpus.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index 9c837a35fef7..aa3f6815bb12 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -426,3 +426,4 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
return masks;
}
#endif /* CONFIG_SMP */
+EXPORT_SYMBOL_GPL(group_cpus_evenly);
--
2.20.1

2023-03-23 05:33:28

by Yongji Xie

Subject: [PATCH v4 02/11] vdpa: Add set/get_vq_affinity callbacks in vdpa_config_ops

This introduces set/get_vq_affinity callbacks in
vdpa_config_ops to support virtqueue affinity
management for vdpa device drivers.

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
---
drivers/virtio/virtio_vdpa.c | 28 ++++++++++++++++++++++++++++
include/linux/vdpa.h | 13 +++++++++++++
2 files changed, 41 insertions(+)

diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index d7f5af62ddaa..f72696b4c1c2 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -337,6 +337,32 @@ static const char *virtio_vdpa_bus_name(struct virtio_device *vdev)
return dev_name(&vdpa->dev);
}

+static int virtio_vdpa_set_vq_affinity(struct virtqueue *vq,
+ const struct cpumask *cpu_mask)
+{
+ struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vq->vdev);
+ struct vdpa_device *vdpa = vd_dev->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ unsigned int index = vq->index;
+
+ if (ops->set_vq_affinity)
+ return ops->set_vq_affinity(vdpa, index, cpu_mask);
+
+ return 0;
+}
+
+static const struct cpumask *
+virtio_vdpa_get_vq_affinity(struct virtio_device *vdev, int index)
+{
+ struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ if (ops->get_vq_affinity)
+ return ops->get_vq_affinity(vdpa, index);
+
+ return NULL;
+}
+
static const struct virtio_config_ops virtio_vdpa_config_ops = {
.get = virtio_vdpa_get,
.set = virtio_vdpa_set,
@@ -349,6 +375,8 @@ static const struct virtio_config_ops virtio_vdpa_config_ops = {
.get_features = virtio_vdpa_get_features,
.finalize_features = virtio_vdpa_finalize_features,
.bus_name = virtio_vdpa_bus_name,
+ .set_vq_affinity = virtio_vdpa_set_vq_affinity,
+ .get_vq_affinity = virtio_vdpa_get_vq_affinity,
};

static void virtio_vdpa_release_dev(struct device *_d)
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 43f59ef10cc9..e52c9a37999c 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -250,6 +250,15 @@ struct vdpa_map_file {
* @vdev: vdpa device
* Returns the iova range supported by
* the device.
+ * @set_vq_affinity: Set the affinity of virtqueue (optional)
+ * @vdev: vdpa device
+ * @idx: virtqueue index
+ * @cpu_mask: the affinity mask
+ * Returns integer: success (0) or error (< 0)
+ * @get_vq_affinity: Get the affinity of virtqueue (optional)
+ * @vdev: vdpa device
+ * @idx: virtqueue index
+ * Returns the affinity mask
* @set_group_asid: Set address space identifier for a
* virtqueue group (optional)
* @vdev: vdpa device
@@ -340,6 +349,10 @@ struct vdpa_config_ops {
const void *buf, unsigned int len);
u32 (*get_generation)(struct vdpa_device *vdev);
struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev);
+ int (*set_vq_affinity)(struct vdpa_device *vdev, u16 idx,
+ const struct cpumask *cpu_mask);
+ const struct cpumask *(*get_vq_affinity)(struct vdpa_device *vdev,
+ u16 idx);

/* DMA ops */
int (*set_map)(struct vdpa_device *vdev, unsigned int asid,
--
2.20.1

2023-03-23 05:33:42

by Yongji Xie

Subject: [PATCH v4 06/11] vduse: Support get_vq_affinity callback

This implements the get_vq_affinity callback so that
the virtio-blk driver can build its blk-mq queues
based on the irq callback affinity.

Signed-off-by: Xie Yongji <[email protected]>
---
drivers/vdpa/vdpa_user/vduse_dev.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 45aa8703c4b5..cefabd0dab9c 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -722,6 +722,14 @@ static int vduse_vdpa_set_vq_affinity(struct vdpa_device *vdpa, u16 idx,
return 0;
}

+static const struct cpumask *
+vduse_vdpa_get_vq_affinity(struct vdpa_device *vdpa, u16 idx)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+ return &dev->vqs[idx]->irq_affinity;
+}
+
static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
unsigned int asid,
struct vhost_iotlb *iotlb)
@@ -773,6 +781,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
.set_config = vduse_vdpa_set_config,
.get_generation = vduse_vdpa_get_generation,
.set_vq_affinity = vduse_vdpa_set_vq_affinity,
+ .get_vq_affinity = vduse_vdpa_get_vq_affinity,
.reset = vduse_vdpa_reset,
.set_map = vduse_vdpa_set_map,
.free = vduse_vdpa_free,
--
2.20.1

2023-03-23 05:33:46

by Yongji Xie

Subject: [PATCH v4 04/11] vduse: Refactor allocation for vduse virtqueues

Allocate memory for vduse virtqueues one by one instead of
doing one allocation for all of them.

This is a preparation for adding a sysfs interface for virtqueues.

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
---
drivers/vdpa/vdpa_user/vduse_dev.c | 98 ++++++++++++++++++++----------
1 file changed, 66 insertions(+), 32 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 0c3b48616a9f..98359d87a06f 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -76,7 +76,7 @@ struct vduse_umem {
struct vduse_dev {
struct vduse_vdpa *vdev;
struct device *dev;
- struct vduse_virtqueue *vqs;
+ struct vduse_virtqueue **vqs;
struct vduse_iova_domain *domain;
char *name;
struct mutex lock;
@@ -434,7 +434,7 @@ static void vduse_dev_reset(struct vduse_dev *dev)
flush_work(&dev->inject);

for (i = 0; i < dev->vq_num; i++) {
- struct vduse_virtqueue *vq = &dev->vqs[i];
+ struct vduse_virtqueue *vq = dev->vqs[i];

vq->ready = false;
vq->desc_addr = 0;
@@ -466,7 +466,7 @@ static int vduse_vdpa_set_vq_address(struct vdpa_device *vdpa, u16 idx,
u64 device_area)
{
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
- struct vduse_virtqueue *vq = &dev->vqs[idx];
+ struct vduse_virtqueue *vq = dev->vqs[idx];

vq->desc_addr = desc_area;
vq->driver_addr = driver_area;
@@ -500,7 +500,7 @@ static void vduse_vq_kick_work(struct work_struct *work)
static void vduse_vdpa_kick_vq(struct vdpa_device *vdpa, u16 idx)
{
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
- struct vduse_virtqueue *vq = &dev->vqs[idx];
+ struct vduse_virtqueue *vq = dev->vqs[idx];

if (!eventfd_signal_allowed()) {
schedule_work(&vq->kick);
@@ -513,7 +513,7 @@ static void vduse_vdpa_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
struct vdpa_callback *cb)
{
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
- struct vduse_virtqueue *vq = &dev->vqs[idx];
+ struct vduse_virtqueue *vq = dev->vqs[idx];

spin_lock(&vq->irq_lock);
vq->cb.callback = cb->callback;
@@ -524,7 +524,7 @@ static void vduse_vdpa_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
static void vduse_vdpa_set_vq_num(struct vdpa_device *vdpa, u16 idx, u32 num)
{
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
- struct vduse_virtqueue *vq = &dev->vqs[idx];
+ struct vduse_virtqueue *vq = dev->vqs[idx];

vq->num = num;
}
@@ -533,7 +533,7 @@ static void vduse_vdpa_set_vq_ready(struct vdpa_device *vdpa,
u16 idx, bool ready)
{
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
- struct vduse_virtqueue *vq = &dev->vqs[idx];
+ struct vduse_virtqueue *vq = dev->vqs[idx];

vq->ready = ready;
}
@@ -541,7 +541,7 @@ static void vduse_vdpa_set_vq_ready(struct vdpa_device *vdpa,
static bool vduse_vdpa_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
{
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
- struct vduse_virtqueue *vq = &dev->vqs[idx];
+ struct vduse_virtqueue *vq = dev->vqs[idx];

return vq->ready;
}
@@ -550,7 +550,7 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
const struct vdpa_vq_state *state)
{
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
- struct vduse_virtqueue *vq = &dev->vqs[idx];
+ struct vduse_virtqueue *vq = dev->vqs[idx];

if (dev->driver_features & BIT_ULL(VIRTIO_F_RING_PACKED)) {
vq->state.packed.last_avail_counter =
@@ -569,7 +569,7 @@ static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
struct vdpa_vq_state *state)
{
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
- struct vduse_virtqueue *vq = &dev->vqs[idx];
+ struct vduse_virtqueue *vq = dev->vqs[idx];

if (dev->driver_features & BIT_ULL(VIRTIO_F_RING_PACKED))
return vduse_dev_get_vq_state_packed(dev, vq, &state->packed);
@@ -624,8 +624,8 @@ static u16 vduse_vdpa_get_vq_num_max(struct vdpa_device *vdpa)
int i;

for (i = 0; i < dev->vq_num; i++)
- if (num_max < dev->vqs[i].num_max)
- num_max = dev->vqs[i].num_max;
+ if (num_max < dev->vqs[i]->num_max)
+ num_max = dev->vqs[i]->num_max;

return num_max;
}
@@ -863,7 +863,7 @@ static int vduse_kickfd_setup(struct vduse_dev *dev,
return -EINVAL;

index = array_index_nospec(eventfd->index, dev->vq_num);
- vq = &dev->vqs[index];
+ vq = dev->vqs[index];
if (eventfd->fd >= 0) {
ctx = eventfd_ctx_fdget(eventfd->fd);
if (IS_ERR(ctx))
@@ -889,7 +889,7 @@ static bool vduse_dev_is_ready(struct vduse_dev *dev)
int i;

for (i = 0; i < dev->vq_num; i++)
- if (!dev->vqs[i].num_max)
+ if (!dev->vqs[i]->num_max)
return false;

return true;
@@ -1130,7 +1130,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
break;

index = array_index_nospec(config.index, dev->vq_num);
- dev->vqs[index].num_max = config.max_size;
+ dev->vqs[index]->num_max = config.max_size;
ret = 0;
break;
}
@@ -1148,7 +1148,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
break;

index = array_index_nospec(vq_info.index, dev->vq_num);
- vq = &dev->vqs[index];
+ vq = dev->vqs[index];
vq_info.desc_addr = vq->desc_addr;
vq_info.driver_addr = vq->driver_addr;
vq_info.device_addr = vq->device_addr;
@@ -1198,7 +1198,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
break;

index = array_index_nospec(index, dev->vq_num);
- ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index].inject);
+ ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index]->inject);
break;
}
case VDUSE_IOTLB_REG_UMEM: {
@@ -1339,6 +1339,49 @@ static const struct file_operations vduse_dev_fops = {
.llseek = noop_llseek,
};

+static void vduse_dev_deinit_vqs(struct vduse_dev *dev)
+{
+ int i;
+
+ if (!dev->vqs)
+ return;
+
+ for (i = 0; i < dev->vq_num; i++)
+ kfree(dev->vqs[i]);
+ kfree(dev->vqs);
+}
+
+static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
+{
+ int i;
+
+ dev->vq_align = vq_align;
+ dev->vq_num = vq_num;
+ dev->vqs = kcalloc(dev->vq_num, sizeof(*dev->vqs), GFP_KERNEL);
+ if (!dev->vqs)
+ return -ENOMEM;
+
+ for (i = 0; i < vq_num; i++) {
+ dev->vqs[i] = kzalloc(sizeof(*dev->vqs[i]), GFP_KERNEL);
+ if (!dev->vqs[i])
+ goto err;
+
+ dev->vqs[i]->index = i;
+ INIT_WORK(&dev->vqs[i]->inject, vduse_vq_irq_inject);
+ INIT_WORK(&dev->vqs[i]->kick, vduse_vq_kick_work);
+ spin_lock_init(&dev->vqs[i]->kick_lock);
+ spin_lock_init(&dev->vqs[i]->irq_lock);
+ }
+
+ return 0;
+err:
+ while (i--)
+ kfree(dev->vqs[i]);
+ kfree(dev->vqs);
+ dev->vqs = NULL;
+ return -ENOMEM;
+}
+
static struct vduse_dev *vduse_dev_create(void)
{
struct vduse_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL);
@@ -1396,7 +1439,7 @@ static int vduse_destroy_dev(char *name)
device_destroy(vduse_class, MKDEV(MAJOR(vduse_major), dev->minor));
idr_remove(&vduse_idr, dev->minor);
kvfree(dev->config);
- kfree(dev->vqs);
+ vduse_dev_deinit_vqs(dev);
vduse_domain_destroy(dev->domain);
kfree(dev->name);
vduse_dev_destroy(dev);
@@ -1486,7 +1529,7 @@ ATTRIBUTE_GROUPS(vduse_dev);
static int vduse_create_dev(struct vduse_dev_config *config,
void *config_buf, u64 api_version)
{
- int i, ret;
+ int ret;
struct vduse_dev *dev;

ret = -EEXIST;
@@ -1513,19 +1556,10 @@ static int vduse_create_dev(struct vduse_dev_config *config,

dev->config = config_buf;
dev->config_size = config->config_size;
- dev->vq_align = config->vq_align;
- dev->vq_num = config->vq_num;
- dev->vqs = kcalloc(dev->vq_num, sizeof(*dev->vqs), GFP_KERNEL);
- if (!dev->vqs)
- goto err_vqs;

- for (i = 0; i < dev->vq_num; i++) {
- dev->vqs[i].index = i;
- INIT_WORK(&dev->vqs[i].inject, vduse_vq_irq_inject);
- INIT_WORK(&dev->vqs[i].kick, vduse_vq_kick_work);
- spin_lock_init(&dev->vqs[i].kick_lock);
- spin_lock_init(&dev->vqs[i].irq_lock);
- }
+ ret = vduse_dev_init_vqs(dev, config->vq_align, config->vq_num);
+ if (ret)
+ goto err_vqs;

ret = idr_alloc(&vduse_idr, dev, 1, VDUSE_DEV_MAX, GFP_KERNEL);
if (ret < 0)
@@ -1546,7 +1580,7 @@ static int vduse_create_dev(struct vduse_dev_config *config,
err_dev:
idr_remove(&vduse_idr, dev->minor);
err_idr:
- kfree(dev->vqs);
+ vduse_dev_deinit_vqs(dev);
err_vqs:
vduse_domain_destroy(dev->domain);
err_domain:
--
2.20.1

2023-03-23 05:33:57

by Yongji Xie

Subject: [PATCH v4 09/11] vduse: Signal vq trigger eventfd directly if possible

Now the vdpa callback can carry a trigger eventfd
in some cases. For performance reasons, VDUSE can
signal it directly during irq injection.
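
If no trigger eventfd is registered, or the virtqueue is not ready,
VDUSE falls back to queueing the existing irq injection work as before.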

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
---
drivers/vdpa/vdpa_user/vduse_dev.c | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 77da3685568a..8c06f6ab960b 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -459,6 +459,7 @@ static void vduse_dev_reset(struct vduse_dev *dev)
spin_lock(&vq->irq_lock);
vq->cb.callback = NULL;
vq->cb.private = NULL;
+ vq->cb.trigger = NULL;
spin_unlock(&vq->irq_lock);
flush_work(&vq->inject);
flush_work(&vq->kick);
@@ -524,6 +525,7 @@ static void vduse_vdpa_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
spin_lock(&vq->irq_lock);
vq->cb.callback = cb->callback;
vq->cb.private = cb->private;
+ vq->cb.trigger = cb->trigger;
spin_unlock(&vq->irq_lock);
}

@@ -941,6 +943,23 @@ static void vduse_vq_irq_inject(struct work_struct *work)
spin_unlock_irq(&vq->irq_lock);
}

+static bool vduse_vq_signal_irqfd(struct vduse_virtqueue *vq)
+{
+ bool signal = false;
+
+ if (!vq->cb.trigger)
+ return false;
+
+ spin_lock_irq(&vq->irq_lock);
+ if (vq->ready && vq->cb.trigger) {
+ eventfd_signal(vq->cb.trigger, 1);
+ signal = true;
+ }
+ spin_unlock_irq(&vq->irq_lock);
+
+ return signal;
+}
+
static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
struct work_struct *irq_work,
int irq_effective_cpu)
@@ -1243,11 +1262,14 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
if (index >= dev->vq_num)
break;

+ ret = 0;
index = array_index_nospec(index, dev->vq_num);
-
- vduse_vq_update_effective_cpu(dev->vqs[index]);
- ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index]->inject,
- dev->vqs[index]->irq_effective_cpu);
+ if (!vduse_vq_signal_irqfd(dev->vqs[index])) {
+ vduse_vq_update_effective_cpu(dev->vqs[index]);
+ ret = vduse_dev_queue_irq_work(dev,
+ &dev->vqs[index]->inject,
+ dev->vqs[index]->irq_effective_cpu);
+ }
break;
}
case VDUSE_IOTLB_REG_UMEM: {
--
2.20.1

2023-03-23 05:34:03

by Yongji Xie

Subject: [PATCH v4 07/11] vduse: Add sysfs interface for irq callback affinity

Add a sysfs interface for each vduse virtqueue to
get/set the affinity of its irq callback. This can be
useful for performance tuning when the irq callback
affinity mask contains more than one CPU.

Signed-off-by: Xie Yongji <[email protected]>
---
drivers/vdpa/vdpa_user/vduse_dev.c | 124 ++++++++++++++++++++++++++---
1 file changed, 113 insertions(+), 11 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index cefabd0dab9c..77da3685568a 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -61,6 +61,7 @@ struct vduse_virtqueue {
struct work_struct kick;
int irq_effective_cpu;
struct cpumask irq_affinity;
+ struct kobject kobj;
};

struct vduse_dev;
@@ -1387,6 +1388,96 @@ static const struct file_operations vduse_dev_fops = {
.llseek = noop_llseek,
};

+static ssize_t irq_cb_affinity_show(struct vduse_virtqueue *vq, char *buf)
+{
+ return sprintf(buf, "%*pb\n", cpumask_pr_args(&vq->irq_affinity));
+}
+
+static ssize_t irq_cb_affinity_store(struct vduse_virtqueue *vq,
+ const char *buf, size_t count)
+{
+ cpumask_var_t new_value;
+ int ret;
+
+ if (!zalloc_cpumask_var(&new_value, GFP_KERNEL))
+ return -ENOMEM;
+
+ ret = cpumask_parse(buf, new_value);
+ if (ret)
+ goto free_mask;
+
+ ret = -EINVAL;
+ if (!cpumask_intersects(new_value, cpu_online_mask))
+ goto free_mask;
+
+ cpumask_copy(&vq->irq_affinity, new_value);
+ ret = count;
+free_mask:
+ free_cpumask_var(new_value);
+ return ret;
+}
+
+struct vq_sysfs_entry {
+ struct attribute attr;
+ ssize_t (*show)(struct vduse_virtqueue *vq, char *buf);
+ ssize_t (*store)(struct vduse_virtqueue *vq, const char *buf,
+ size_t count);
+};
+
+static struct vq_sysfs_entry irq_cb_affinity_attr = __ATTR_RW(irq_cb_affinity);
+
+static struct attribute *vq_attrs[] = {
+ &irq_cb_affinity_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(vq);
+
+static ssize_t vq_attr_show(struct kobject *kobj, struct attribute *attr,
+ char *buf)
+{
+ struct vduse_virtqueue *vq = container_of(kobj,
+ struct vduse_virtqueue, kobj);
+ struct vq_sysfs_entry *entry = container_of(attr,
+ struct vq_sysfs_entry, attr);
+
+ if (!entry->show)
+ return -EIO;
+
+ return entry->show(vq, buf);
+}
+
+static ssize_t vq_attr_store(struct kobject *kobj, struct attribute *attr,
+ const char *buf, size_t count)
+{
+ struct vduse_virtqueue *vq = container_of(kobj,
+ struct vduse_virtqueue, kobj);
+ struct vq_sysfs_entry *entry = container_of(attr,
+ struct vq_sysfs_entry, attr);
+
+ if (!entry->store)
+ return -EIO;
+
+ return entry->store(vq, buf, count);
+}
+
+static const struct sysfs_ops vq_sysfs_ops = {
+ .show = vq_attr_show,
+ .store = vq_attr_store,
+};
+
+static void vq_release(struct kobject *kobj)
+{
+ struct vduse_virtqueue *vq = container_of(kobj,
+ struct vduse_virtqueue, kobj);
+ kfree(vq);
+}
+
+static const struct kobj_type vq_type = {
+ .release = vq_release,
+ .sysfs_ops = &vq_sysfs_ops,
+ .default_groups = vq_groups,
+};
+
static void vduse_dev_deinit_vqs(struct vduse_dev *dev)
{
int i;
@@ -1395,13 +1486,13 @@ static void vduse_dev_deinit_vqs(struct vduse_dev *dev)
return;

for (i = 0; i < dev->vq_num; i++)
- kfree(dev->vqs[i]);
+ kobject_put(&dev->vqs[i]->kobj);
kfree(dev->vqs);
}

static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
{
- int i;
+ int ret, i;

dev->vq_align = vq_align;
dev->vq_num = vq_num;
@@ -1411,8 +1502,10 @@ static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)

for (i = 0; i < vq_num; i++) {
dev->vqs[i] = kzalloc(sizeof(*dev->vqs[i]), GFP_KERNEL);
- if (!dev->vqs[i])
+ if (!dev->vqs[i]) {
+ ret = -ENOMEM;
goto err;
+ }

dev->vqs[i]->index = i;
dev->vqs[i]->irq_effective_cpu = IRQ_UNBOUND;
@@ -1421,15 +1514,23 @@ static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
spin_lock_init(&dev->vqs[i]->kick_lock);
spin_lock_init(&dev->vqs[i]->irq_lock);
cpumask_setall(&dev->vqs[i]->irq_affinity);
+
+ kobject_init(&dev->vqs[i]->kobj, &vq_type);
+ ret = kobject_add(&dev->vqs[i]->kobj,
+ &dev->dev->kobj, "vq%d", i);
+ if (ret) {
+ kfree(dev->vqs[i]);
+ goto err;
+ }
}

return 0;
err:
while (i--)
- kfree(dev->vqs[i]);
+ kobject_put(&dev->vqs[i]->kobj);
kfree(dev->vqs);
dev->vqs = NULL;
- return -ENOMEM;
+ return ret;
}

static struct vduse_dev *vduse_dev_create(void)
@@ -1607,10 +1708,6 @@ static int vduse_create_dev(struct vduse_dev_config *config,
dev->config = config_buf;
dev->config_size = config->config_size;

- ret = vduse_dev_init_vqs(dev, config->vq_align, config->vq_num);
- if (ret)
- goto err_vqs;
-
ret = idr_alloc(&vduse_idr, dev, 1, VDUSE_DEV_MAX, GFP_KERNEL);
if (ret < 0)
goto err_idr;
@@ -1624,14 +1721,19 @@ static int vduse_create_dev(struct vduse_dev_config *config,
ret = PTR_ERR(dev->dev);
goto err_dev;
}
+
+ ret = vduse_dev_init_vqs(dev, config->vq_align, config->vq_num);
+ if (ret)
+ goto err_vqs;
+
__module_get(THIS_MODULE);

return 0;
+err_vqs:
+ device_destroy(vduse_class, MKDEV(MAJOR(vduse_major), dev->minor));
err_dev:
idr_remove(&vduse_idr, dev->minor);
err_idr:
- vduse_dev_deinit_vqs(dev);
-err_vqs:
vduse_domain_destroy(dev->domain);
err_domain:
kfree(dev->name);
--
2.20.1

2023-03-23 05:37:21

by Yongji Xie

Subject: [PATCH v4 11/11] vduse: Support specifying bounce buffer size via sysfs

As discussed in [1], this adds a sysfs interface to support
specifying the bounce buffer size in the virtio-vdpa case. It
serves as a performance tuning parameter for high-throughput
workloads.

[1] https://lore.kernel.org/netdev/[email protected]/

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
---
drivers/vdpa/vdpa_user/vduse_dev.c | 45 +++++++++++++++++++++++++++++-
1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index ccca84d51a28..34eebdf030af 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -37,8 +37,11 @@
#define DRV_LICENSE "GPL v2"

#define VDUSE_DEV_MAX (1U << MINORBITS)
+#define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
+#define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
#define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
-#define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
+/* 128 MB reserved for virtqueue creation */
+#define VDUSE_IOVA_SIZE (VDUSE_MAX_BOUNCE_SIZE + 128 * 1024 * 1024)
#define VDUSE_MSG_DEFAULT_TIMEOUT 30

#define IRQ_UNBOUND -1
@@ -1715,8 +1718,48 @@ static ssize_t msg_timeout_store(struct device *device,

static DEVICE_ATTR_RW(msg_timeout);

+static ssize_t bounce_size_show(struct device *device,
+ struct device_attribute *attr, char *buf)
+{
+ struct vduse_dev *dev = dev_get_drvdata(device);
+
+ return sysfs_emit(buf, "%u\n", dev->bounce_size);
+}
+
+static ssize_t bounce_size_store(struct device *device,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct vduse_dev *dev = dev_get_drvdata(device);
+ unsigned int bounce_size;
+ int ret;
+
+ ret = -EPERM;
+ mutex_lock(&dev->domain_lock);
+ if (dev->domain)
+ goto unlock;
+
+ ret = kstrtouint(buf, 10, &bounce_size);
+ if (ret < 0)
+ goto unlock;
+
+ ret = -EINVAL;
+ if (bounce_size > VDUSE_MAX_BOUNCE_SIZE ||
+ bounce_size < VDUSE_MIN_BOUNCE_SIZE)
+ goto unlock;
+
+ dev->bounce_size = bounce_size & PAGE_MASK;
+ ret = count;
+unlock:
+ mutex_unlock(&dev->domain_lock);
+ return ret;
+}
+
+static DEVICE_ATTR_RW(bounce_size);
+
static struct attribute *vduse_dev_attrs[] = {
&dev_attr_msg_timeout.attr,
+ &dev_attr_bounce_size.attr,
NULL
};

--
2.20.1

2023-03-23 05:39:56

by Yongji Xie

Subject: [PATCH v4 08/11] vdpa: Add eventfd for the vdpa callback

Add an eventfd to the vdpa callback so that the user
can signal it directly instead of triggering the
callback. It will be used in the vhost-vdpa case.
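
A minimal sketch of the intended fast path in a vdpa parent
(illustrative only; notify_guest() is a made-up helper, not part of
this series, and includes are omitted):

    static void notify_guest(struct vdpa_callback *cb)
    {
            /* Fast path: kick the eventfd registered by the driver */
            if (cb->trigger) {
                    eventfd_signal(cb->trigger, 1);
                    return;
            }
            /* Otherwise fall back to invoking the callback */
            if (cb->callback)
                    cb->callback(cb->private);
    }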

Signed-off-by: Xie Yongji <[email protected]>
---
drivers/vhost/vdpa.c | 2 ++
drivers/virtio/virtio_vdpa.c | 1 +
include/linux/vdpa.h | 6 ++++++
3 files changed, 9 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 7be9d9d8f01c..9cd878e25cff 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -599,9 +599,11 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
if (vq->call_ctx.ctx) {
cb.callback = vhost_vdpa_virtqueue_cb;
cb.private = vq;
+ cb.trigger = vq->call_ctx.ctx;
} else {
cb.callback = NULL;
cb.private = NULL;
+ cb.trigger = NULL;
}
ops->set_vq_cb(vdpa, idx, &cb);
vhost_vdpa_setup_vq_irq(v, idx);
diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index f3826f42b704..2a095f37f26b 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -196,6 +196,7 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index,
/* Setup virtqueue callback */
cb.callback = callback ? virtio_vdpa_virtqueue_cb : NULL;
cb.private = info;
+ cb.trigger = NULL;
ops->set_vq_cb(vdpa, index, &cb);
ops->set_vq_num(vdpa, index, virtqueue_get_vring_size(vq));

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index e52c9a37999c..0372b2e3d38a 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -13,10 +13,16 @@
* struct vdpa_callback - vDPA callback definition.
* @callback: interrupt callback function
* @private: the data passed to the callback function
+ * @trigger: the eventfd for the callback (Optional).
+ * When it is set, the vDPA driver must guarantee that
* signaling it is functionally equivalent to triggering
* the callback. Then the vDPA parent can signal it directly
+ * instead of triggering the callback.
*/
struct vdpa_callback {
irqreturn_t (*callback)(void *data);
void *private;
+ struct eventfd_ctx *trigger;
};

/**
--
2.20.1

2023-03-23 05:52:18

by Yongji Xie

Subject: [PATCH v4 10/11] vduse: Delay iova domain creation

Delay creating the iova domain until the vduse device is
registered to the vdpa bus.

This is a preparation for adding a sysfs interface to
support specifying the bounce buffer size for the iova
domain.

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
---
drivers/vdpa/vdpa_user/vduse_dev.c | 75 +++++++++++++++++++++---------
1 file changed, 53 insertions(+), 22 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 8c06f6ab960b..ccca84d51a28 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -111,6 +111,8 @@ struct vduse_dev {
u32 vq_align;
struct vduse_umem *umem;
struct mutex mem_lock;
+ unsigned int bounce_size;
+ struct mutex domain_lock;
};

struct vduse_dev_msg {
@@ -425,7 +427,7 @@ static void vduse_dev_reset(struct vduse_dev *dev)
struct vduse_iova_domain *domain = dev->domain;

/* The coherent mappings are handled in vduse_dev_free_coherent() */
- if (domain->bounce_map)
+ if (domain && domain->bounce_map)
vduse_domain_reset_bounce_map(domain);

down_write(&dev->rwsem);
@@ -993,6 +995,9 @@ static int vduse_dev_dereg_umem(struct vduse_dev *dev,
goto unlock;

ret = -EINVAL;
+ if (!dev->domain)
+ goto unlock;
+
if (dev->umem->iova != iova || size != dev->domain->bounce_size)
goto unlock;

@@ -1019,7 +1024,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
unsigned long npages, lock_limit;
int ret;

- if (!dev->domain->bounce_map ||
+ if (!dev->domain || !dev->domain->bounce_map ||
size != dev->domain->bounce_size ||
iova != 0 || uaddr & ~PAGE_MASK)
return -EINVAL;
@@ -1109,7 +1114,6 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
struct vduse_iotlb_entry entry;
struct vhost_iotlb_map *map;
struct vdpa_map_file *map_file;
- struct vduse_iova_domain *domain = dev->domain;
struct file *f = NULL;

ret = -EFAULT;
@@ -1120,8 +1124,13 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
if (entry.start > entry.last)
break;

- spin_lock(&domain->iotlb_lock);
- map = vhost_iotlb_itree_first(domain->iotlb,
+ mutex_lock(&dev->domain_lock);
+ if (!dev->domain) {
+ mutex_unlock(&dev->domain_lock);
+ break;
+ }
+ spin_lock(&dev->domain->iotlb_lock);
+ map = vhost_iotlb_itree_first(dev->domain->iotlb,
entry.start, entry.last);
if (map) {
map_file = (struct vdpa_map_file *)map->opaque;
@@ -1131,7 +1140,8 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
entry.last = map->last;
entry.perm = map->perm;
}
- spin_unlock(&domain->iotlb_lock);
+ spin_unlock(&dev->domain->iotlb_lock);
+ mutex_unlock(&dev->domain_lock);
ret = -EINVAL;
if (!f)
break;
@@ -1284,8 +1294,10 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
sizeof(umem.reserved)))
break;

+ mutex_lock(&dev->domain_lock);
ret = vduse_dev_reg_umem(dev, umem.iova,
umem.uaddr, umem.size);
+ mutex_unlock(&dev->domain_lock);
break;
}
case VDUSE_IOTLB_DEREG_UMEM: {
@@ -1299,15 +1311,15 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
if (!is_mem_zero((const char *)umem.reserved,
sizeof(umem.reserved)))
break;
-
+ mutex_lock(&dev->domain_lock);
ret = vduse_dev_dereg_umem(dev, umem.iova,
umem.size);
+ mutex_unlock(&dev->domain_lock);
break;
}
case VDUSE_IOTLB_GET_INFO: {
struct vduse_iova_info info;
struct vhost_iotlb_map *map;
- struct vduse_iova_domain *domain = dev->domain;

ret = -EFAULT;
if (copy_from_user(&info, argp, sizeof(info)))
@@ -1321,18 +1333,24 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
sizeof(info.reserved)))
break;

- spin_lock(&domain->iotlb_lock);
- map = vhost_iotlb_itree_first(domain->iotlb,
+ mutex_lock(&dev->domain_lock);
+ if (!dev->domain) {
+ mutex_unlock(&dev->domain_lock);
+ break;
+ }
+ spin_lock(&dev->domain->iotlb_lock);
+ map = vhost_iotlb_itree_first(dev->domain->iotlb,
info.start, info.last);
if (map) {
info.start = map->start;
info.last = map->last;
info.capability = 0;
- if (domain->bounce_map && map->start == 0 &&
- map->last == domain->bounce_size - 1)
+ if (dev->domain->bounce_map && map->start == 0 &&
+ map->last == dev->domain->bounce_size - 1)
info.capability |= VDUSE_IOVA_CAP_UMEM;
}
- spin_unlock(&domain->iotlb_lock);
+ spin_unlock(&dev->domain->iotlb_lock);
+ mutex_unlock(&dev->domain_lock);
if (!map)
break;

@@ -1355,7 +1373,10 @@ static int vduse_dev_release(struct inode *inode, struct file *file)
{
struct vduse_dev *dev = file->private_data;

- vduse_dev_dereg_umem(dev, 0, dev->domain->bounce_size);
+ mutex_lock(&dev->domain_lock);
+ if (dev->domain)
+ vduse_dev_dereg_umem(dev, 0, dev->domain->bounce_size);
+ mutex_unlock(&dev->domain_lock);
spin_lock(&dev->msg_lock);
/* Make sure the inflight messages can be processed after reconnection */
list_splice_init(&dev->recv_list, &dev->send_list);
@@ -1564,6 +1585,7 @@ static struct vduse_dev *vduse_dev_create(void)

mutex_init(&dev->lock);
mutex_init(&dev->mem_lock);
+ mutex_init(&dev->domain_lock);
spin_lock_init(&dev->msg_lock);
INIT_LIST_HEAD(&dev->send_list);
INIT_LIST_HEAD(&dev->recv_list);
@@ -1613,7 +1635,8 @@ static int vduse_destroy_dev(char *name)
idr_remove(&vduse_idr, dev->minor);
kvfree(dev->config);
vduse_dev_deinit_vqs(dev);
- vduse_domain_destroy(dev->domain);
+ if (dev->domain)
+ vduse_domain_destroy(dev->domain);
kfree(dev->name);
vduse_dev_destroy(dev);
module_put(THIS_MODULE);
@@ -1722,11 +1745,7 @@ static int vduse_create_dev(struct vduse_dev_config *config,
if (!dev->name)
goto err_str;

- dev->domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
- VDUSE_BOUNCE_SIZE);
- if (!dev->domain)
- goto err_domain;
-
+ dev->bounce_size = VDUSE_BOUNCE_SIZE;
dev->config = config_buf;
dev->config_size = config->config_size;

@@ -1756,8 +1775,6 @@ static int vduse_create_dev(struct vduse_dev_config *config,
err_dev:
idr_remove(&vduse_idr, dev->minor);
err_idr:
- vduse_domain_destroy(dev->domain);
-err_domain:
kfree(dev->name);
err_str:
vduse_dev_destroy(dev);
@@ -1924,9 +1941,23 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
if (ret)
return ret;

+ mutex_lock(&dev->domain_lock);
+ if (!dev->domain)
+ dev->domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1,
+ dev->bounce_size);
+ mutex_unlock(&dev->domain_lock);
+ if (!dev->domain) {
+ put_device(&dev->vdev->vdpa.dev);
+ return -ENOMEM;
+ }
+
ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num);
if (ret) {
put_device(&dev->vdev->vdpa.dev);
+ mutex_lock(&dev->domain_lock);
+ vduse_domain_destroy(dev->domain);
+ dev->domain = NULL;
+ mutex_unlock(&dev->domain_lock);
return ret;
}

--
2.20.1

2023-03-23 06:02:07

by Yongji Xie

Subject: [PATCH v4 05/11] vduse: Support set_vq_affinity callback

Since the virtio-vdpa bus driver already supports the
interrupt affinity spreading mechanism, let's implement
the set_vq_affinity callback to bring it to the vduse
device. Once we get a virtqueue's affinity, we can spread
IRQs between the CPUs in the affinity mask, in a round-robin
manner, to run the irq callback.
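
For example, with an affinity mask of {0, 2, 3}, successive irq
injections run the callback on CPU 0, then 2, then 3, and then wrap
around, skipping any CPU that has gone offline in the meantime.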

Signed-off-by: Xie Yongji <[email protected]>
---
drivers/vdpa/vdpa_user/vduse_dev.c | 61 ++++++++++++++++++++++++++----
1 file changed, 54 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 98359d87a06f..45aa8703c4b5 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -41,6 +41,8 @@
#define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
#define VDUSE_MSG_DEFAULT_TIMEOUT 30

+#define IRQ_UNBOUND -1
+
struct vduse_virtqueue {
u16 index;
u16 num_max;
@@ -57,6 +59,8 @@ struct vduse_virtqueue {
struct vdpa_callback cb;
struct work_struct inject;
struct work_struct kick;
+ int irq_effective_cpu;
+ struct cpumask irq_affinity;
};

struct vduse_dev;
@@ -128,6 +132,7 @@ static struct class *vduse_class;
static struct cdev vduse_ctrl_cdev;
static struct cdev vduse_cdev;
static struct workqueue_struct *vduse_irq_wq;
+static struct workqueue_struct *vduse_irq_bound_wq;

static u32 allowed_device_id[] = {
VIRTIO_ID_BLOCK,
@@ -708,6 +713,15 @@ static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa)
return dev->generation;
}

+static int vduse_vdpa_set_vq_affinity(struct vdpa_device *vdpa, u16 idx,
+ const struct cpumask *cpu_mask)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+ cpumask_copy(&dev->vqs[idx]->irq_affinity, cpu_mask);
+ return 0;
+}
+
static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
unsigned int asid,
struct vhost_iotlb *iotlb)
@@ -758,6 +772,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
.get_config = vduse_vdpa_get_config,
.set_config = vduse_vdpa_set_config,
.get_generation = vduse_vdpa_get_generation,
+ .set_vq_affinity = vduse_vdpa_set_vq_affinity,
.reset = vduse_vdpa_reset,
.set_map = vduse_vdpa_set_map,
.free = vduse_vdpa_free,
@@ -917,7 +932,8 @@ static void vduse_vq_irq_inject(struct work_struct *work)
}

static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
- struct work_struct *irq_work)
+ struct work_struct *irq_work,
+ int irq_effective_cpu)
{
int ret = -EINVAL;

@@ -926,7 +942,11 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
goto unlock;

ret = 0;
- queue_work(vduse_irq_wq, irq_work);
+ if (irq_effective_cpu == IRQ_UNBOUND)
+ queue_work(vduse_irq_wq, irq_work);
+ else
+ queue_work_on(irq_effective_cpu,
+ vduse_irq_bound_wq, irq_work);
unlock:
up_read(&dev->rwsem);

@@ -1029,6 +1049,22 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
return ret;
}

+static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
+{
+ int curr_cpu = vq->irq_effective_cpu;
+
+ while (true) {
+ curr_cpu = cpumask_next(curr_cpu, &vq->irq_affinity);
+ if (cpu_online(curr_cpu))
+ break;
+
+ if (curr_cpu >= nr_cpu_ids)
+ curr_cpu = IRQ_UNBOUND;
+ }
+
+ vq->irq_effective_cpu = curr_cpu;
+}
+
static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
@@ -1111,7 +1147,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
break;
}
case VDUSE_DEV_INJECT_CONFIG_IRQ:
- ret = vduse_dev_queue_irq_work(dev, &dev->inject);
+ ret = vduse_dev_queue_irq_work(dev, &dev->inject, IRQ_UNBOUND);
break;
case VDUSE_VQ_SETUP: {
struct vduse_vq_config config;
@@ -1198,7 +1234,10 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
break;

index = array_index_nospec(index, dev->vq_num);
- ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index]->inject);
+
+ vduse_vq_update_effective_cpu(dev->vqs[index]);
+ ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index]->inject,
+ dev->vqs[index]->irq_effective_cpu);
break;
}
case VDUSE_IOTLB_REG_UMEM: {
@@ -1367,10 +1406,12 @@ static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
goto err;

dev->vqs[i]->index = i;
+ dev->vqs[i]->irq_effective_cpu = IRQ_UNBOUND;
INIT_WORK(&dev->vqs[i]->inject, vduse_vq_irq_inject);
INIT_WORK(&dev->vqs[i]->kick, vduse_vq_kick_work);
spin_lock_init(&dev->vqs[i]->kick_lock);
spin_lock_init(&dev->vqs[i]->irq_lock);
+ cpumask_setall(&dev->vqs[i]->irq_affinity);
}

return 0;
@@ -1858,12 +1899,15 @@ static int vduse_init(void)
if (ret)
goto err_cdev;

+ ret = -ENOMEM;
vduse_irq_wq = alloc_workqueue("vduse-irq",
WQ_HIGHPRI | WQ_SYSFS | WQ_UNBOUND, 0);
- if (!vduse_irq_wq) {
- ret = -ENOMEM;
+ if (!vduse_irq_wq)
goto err_wq;
- }
+
+ vduse_irq_bound_wq = alloc_workqueue("vduse-irq-bound", WQ_HIGHPRI, 0);
+ if (!vduse_irq_bound_wq)
+ goto err_bound_wq;

ret = vduse_domain_init();
if (ret)
@@ -1877,6 +1921,8 @@ static int vduse_init(void)
err_mgmtdev:
vduse_domain_exit();
err_domain:
+ destroy_workqueue(vduse_irq_bound_wq);
+err_bound_wq:
destroy_workqueue(vduse_irq_wq);
err_wq:
cdev_del(&vduse_cdev);
@@ -1896,6 +1942,7 @@ static void vduse_exit(void)
{
vduse_mgmtdev_exit();
vduse_domain_exit();
+ destroy_workqueue(vduse_irq_bound_wq);
destroy_workqueue(vduse_irq_wq);
cdev_del(&vduse_cdev);
device_destroy(vduse_class, vduse_major);
--
2.20.1

2023-03-24 06:33:04

by Jason Wang

Subject: Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism

On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <[email protected]> wrote:
>
> To support the interrupt affinity spreading mechanism,
> this makes use of group_cpus_evenly() to create
> an irq callback affinity mask for each virtqueue
> of the vdpa device. Then we unify the set_vq_affinity
> callback to pass the affinity to the vdpa device driver.
>
> Signed-off-by: Xie Yongji <[email protected]>

Thinking hard about all the logic, I think I've found something interesting.

Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
pass irq_affinity to transport-specific find_vqs(). This seems like a
layering violation since the driver has no knowledge of

1) whether or not the callback is based on an IRQ
2) whether or not the device is a PCI device (the details are hidden by
the transport driver)
3) how many vectors could be used by a device

This means the driver can't actually pass real affinity masks, so the
commit in fact passes a zeroed irq affinity structure as a hint, and the
PCI layer builds a default affinity that groups CPUs evenly
based on the number of MSI-X vectors (the core logic being
group_cpus_evenly()). I think we should fix this by replacing the
irq_affinity structure with

1) a boolean like auto_cb_spreading

or

2) a queue-to-CPU mapping

So each transport can do its own logic based on that. Then virtio-vDPA
can pass that policy to VDUSE, where we only need group_cpus_evenly()
and avoid duplicating irq_create_affinity_masks()?

Thanks


2023-03-24 09:19:51

by Michael S. Tsirkin

Subject: Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism

On Fri, Mar 24, 2023 at 02:27:52PM +0800, Jason Wang wrote:
> On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <[email protected]> wrote:
> >
> > To support the interrupt affinity spreading mechanism,
> > this makes use of group_cpus_evenly() to create
> > an irq callback affinity mask for each virtqueue
> > of the vdpa device. Then we unify the set_vq_affinity
> > callback to pass the affinity to the vdpa device driver.
> >
> > Signed-off-by: Xie Yongji <[email protected]>
>
> Thinking hard about all the logic, I think I've found something interesting.
>
> Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
> pass irq_affinity to transport-specific find_vqs(). This seems like a
> layering violation since the driver has no knowledge of
>
> 1) whether or not the callback is based on an IRQ
> 2) whether or not the device is a PCI device (the details are hidden by
> the transport driver)
> 3) how many vectors could be used by a device
>
> This means the driver can't actually pass real affinity masks, so the
> commit in fact passes a zeroed irq affinity structure as a hint, and the
> PCI layer builds a default affinity that groups CPUs evenly
> based on the number of MSI-X vectors (the core logic being
> group_cpus_evenly()). I think we should fix this by replacing the
> irq_affinity structure with
>
> 1) a boolean like auto_cb_spreading
>
> or
>
> 2) a queue-to-CPU mapping
>
> So each transport can do its own logic based on that. Then virtio-vDPA
> can pass that policy to VDUSE, where we only need group_cpus_evenly()
> and avoid duplicating irq_create_affinity_masks()?
>
> Thanks

I don't really understand what you propose. Care to post a patch?
Also, does it have to block this patchset, or can it be done on top?


2023-03-28 03:29:24

by Jason Wang

Subject: Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism

On Tue, Mar 28, 2023 at 11:03 AM Yongji Xie <[email protected]> wrote:
>
> On Fri, Mar 24, 2023 at 2:28 PM Jason Wang <[email protected]> wrote:
> >
> > On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <[email protected]> wrote:
> > >
> > > To support the interrupt affinity spreading mechanism,
> > > this makes use of group_cpus_evenly() to create
> > > an irq callback affinity mask for each virtqueue
> > > of the vdpa device. Then we unify the set_vq_affinity
> > > callback to pass the affinity to the vdpa device driver.
> > >
> > > Signed-off-by: Xie Yongji <[email protected]>
> >
> > Thinking hard about all the logic, I think I've found something interesting.
> >
> > Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
> > pass irq_affinity to transport-specific find_vqs(). This seems like a
> > layering violation since the driver has no knowledge of
> >
> > 1) whether or not the callback is based on an IRQ
> > 2) whether or not the device is a PCI device (the details are hidden by
> > the transport driver)
> > 3) how many vectors could be used by a device
> >
> > This means the driver can't actually pass real affinity masks, so the
> > commit in fact passes a zeroed irq affinity structure as a hint, and the
> > PCI layer builds a default affinity that groups CPUs evenly
> > based on the number of MSI-X vectors (the core logic being
> > group_cpus_evenly()). I think we should fix this by replacing the
> > irq_affinity structure with
> >
> > 1) a boolean like auto_cb_spreading
> >
> > or
> >
> > 2) a queue-to-CPU mapping
> >
>
> But only the driver knows which queues are used in the control path,
> which don't need automatic irq affinity assignment.

Is this knowledge available to the transport driver now?

E.g virtio-blk uses:

struct irq_affinity desc = { 0, };

At least we can tell the transport driver which vq requires automatic
irq affinity.

> So I think the
> irq_affinity structure can only be created by device drivers and
> passed to the virtio-pci/virtio-vdpa driver.

This may not be easy since the driver doesn't even know how many
interrupts will be used by the transport driver, so it can't build the
actual affinity structure.

>
> > So each transport can do its own logic based on that. Then virtio-vDPA
> > can pass that policy to VDUSE, where we only need group_cpus_evenly()
> > and avoid duplicating irq_create_affinity_masks()?
> >
>
> I don't get why we would have duplicated irq_create_affinity_masks().

I meant that create_affinity_masks() in patch 3 seems to be a
duplication of irq_create_affinity_masks().

Thanks

>
> Thanks,
> Yongji
>

2023-03-28 03:32:03

by Yongji Xie

[permalink] [raw]
Subject: Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism

On Fri, Mar 24, 2023 at 2:28 PM Jason Wang <[email protected]> wrote:
>
> On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <[email protected]> wrote:
> >
> > To support interrupt affinity spreading mechanism,
> > this makes use of group_cpus_evenly() to create
> > an irq callback affinity mask for each virtqueue
> > of vdpa device. Then we will unify set_vq_affinity
> > callback to pass the affinity to the vdpa device driver.
> >
> > Signed-off-by: Xie Yongji <[email protected]>
>
> Thinking hard about all this logic, I think I've found something interesting.
>
> Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
> pass irq_affinity to the transport-specific find_vqs(). This seems to be
> a layering violation since the driver has no knowledge of
>
> 1) whether or not the callback is based on an IRQ
> 2) whether or not the device is a PCI device (the details are hidden by
> the transport driver)
> 3) how many vectors could be used by a device
>
> This means the driver can't actually pass a real affinity mask, so the
> commit in fact passes a zeroed irq_affinity structure as a hint, letting
> the PCI layer build a default affinity that groups CPUs evenly based on
> the number of MSI-X vectors (the core logic is group_cpus_evenly()). I
> think we should fix this by replacing the irq_affinity structure with
>
> 1) a boolean like auto_cb_spreading
>
> or
>
> 2) queue to cpu mapping
>

But only the driver knows which queues are used in the control path
and thus don't need the automatic irq affinity assignment. So I think the
irq_affinity structure can only be created by device drivers and
passed to the virtio-pci/virtio-vdpa driver.

> So each transport can do its own logic based on that. Then virtio-vDPA
> can pass that policy to VDUSE where we only need a group_cpus_evenly()
> and avoid duplicating irq_create_affinity_masks()?
>

I don't get why we would have duplicated irq_create_affinity_masks().

Thanks,
Yongji

2023-03-28 03:45:49

by Yongji Xie

[permalink] [raw]
Subject: Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism

On Tue, Mar 28, 2023 at 11:14 AM Jason Wang <[email protected]> wrote:
>
> On Tue, Mar 28, 2023 at 11:03 AM Yongji Xie <[email protected]> wrote:
> >
> > On Fri, Mar 24, 2023 at 2:28 PM Jason Wang <[email protected]> wrote:
> > >
> > > On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <[email protected]> wrote:
> > > > [...]
> > >
> > > [...] I think we should fix this by replacing the irq_affinity
> > > structure with
> > >
> > > 1) a boolean like auto_cb_spreading
> > >
> > > or
> > >
> > > 2) queue to cpu mapping
> > >
> >
> > But only the driver knows which queues are used in the control path
> > and thus don't need the automatic irq affinity assignment.
>
> Is this knowledge owned by the transport driver now?
>

This knowledge is owned by the device driver rather than the transport driver.

E.g. virtio-scsi uses:

struct irq_affinity desc = { .pre_vectors = 2 }; // vq0 is control
queue, vq1 is event queue
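
To make that concrete, a rough walk-through of what
create_affinity_masks() in patch 3 would produce for such a
descriptor, assuming nvecs = 6 on a 4-CPU machine (the exact
grouping depends on group_cpus_evenly()):

/*
 * masks[0] = 0-3  (control vq: cpumask_setall, no spreading)
 * masks[1] = 0-3  (event vq: cpumask_setall, no spreading)
 * masks[2] = 0, masks[3] = 1, masks[4] = 2, masks[5] = 3
 *                 (request vqs: spread by group_cpus_evenly(4))
 */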

> E.g. virtio-blk uses:
>
> struct irq_affinity desc = { 0, };
>
> At least we can tell the transport driver which vq requires automatic
> irq affinity.
>

I think that is what the current implementation does.

> > So I think the
> > irq_affinity structure can only be created by device drivers and
> > passed to the virtio-pci/virtio-vdpa driver.
>
> This may not be easy since the driver doesn't even know how many
> interrupts will be used by the transport driver, so it can't build the
> actual affinity structure.
>

The actual affinity mask is built by the transport driver; the device
driver only passes a hint on which queues don't need the automatic irq
affinity assignment.

Thanks,
Yongji

2023-03-28 03:54:15

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism

On Tue, Mar 28, 2023 at 11:33 AM Yongji Xie <[email protected]> wrote:
>
> On Tue, Mar 28, 2023 at 11:14 AM Jason Wang <[email protected]> wrote:
> >
> > On Tue, Mar 28, 2023 at 11:03 AM Yongji Xie <[email protected]> wrote:
> > >
> > > On Fri, Mar 24, 2023 at 2:28 PM Jason Wang <[email protected]> wrote:
> > > >
> > > > On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <[email protected]> wrote:
> > > > > [...]
> > > >
> > > > [...] I think we should fix this by replacing the irq_affinity
> > > > structure with
> > > >
> > > > 1) a boolean like auto_cb_spreading
> > > >
> > > > or
> > > >
> > > > 2) queue to cpu mapping
> > > >
> > >
> > > But only the driver knows which queues are used in the control path
> > > and thus don't need the automatic irq affinity assignment.
> >
> > Is this knowledge owned by the transport driver now?
> >
>
> This knowledge is owned by the device driver rather than the transport driver.
>
> E.g. virtio-scsi uses:
>
> struct irq_affinity desc = { .pre_vectors = 2 }; // vq0 is control
> queue, vq1 is event queue

OK, but it only works as a hint; it's not a real affinity. As replied,
we can pass an array of booleans in this case; then the transport driver
knows it doesn't need to use automatic affinity for the first two
queues.

>
> > E.g. virtio-blk uses:
> >
> > struct irq_affinity desc = { 0, };
> >
> > At least we can tell the transport driver which vq requires automatic
> > irq affinity.
> >
>
> I think that is what the current implementation does.
>
> > > So I think the
> > > irq_affinity structure can only be created by device drivers and
> > > passed to the virtio-pci/virtio-vdpa driver.
> >
> > This may not be easy since the driver doesn't even know how many
> > interrupts will be used by the transport driver, so it can't build the
> > actual affinity structure.
> >
>
> The actual affinity mask is built by the transport driver,

For PCI, yes, it talks directly to the IRQ subsystem.

> device
> driver only passes a hint on which queues don't need the automatic irq
> affinity assignment.

But not for virtio-vDPA, since the IRQ needs to be dealt with by the
parent driver. In our case it's VDUSE, which doesn't need an IRQ
at all; a queue-to-cpu mapping is sufficient.

Thanks

>
> Thanks,
> Yongji
>

2023-03-28 04:38:24

by Yongji Xie

[permalink] [raw]
Subject: Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism

On Tue, Mar 28, 2023 at 11:44 AM Jason Wang <[email protected]> wrote:
>
> On Tue, Mar 28, 2023 at 11:33 AM Yongji Xie <[email protected]> wrote:
> >
> > On Tue, Mar 28, 2023 at 11:14 AM Jason Wang <[email protected]> wrote:
> > >
> > > On Tue, Mar 28, 2023 at 11:03 AM Yongji Xie <[email protected]> wrote:
> > > >
> > > > On Fri, Mar 24, 2023 at 2:28 PM Jason Wang <[email protected]> wrote:
> > > > >
> > > > > On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <[email protected]> wrote:
> > > > > > [...]
> > > > >
> > > > > [...] I think we should fix this by replacing the irq_affinity
> > > > > structure with
> > > > >
> > > > > 1) a boolean like auto_cb_spreading
> > > > >
> > > > > or
> > > > >
> > > > > 2) queue to cpu mapping
> > > > >
> > > >
> > > > But only the driver knows which queues are used in the control path
> > > > and thus don't need the automatic irq affinity assignment.
> > >
> > > Is this knowledge owned by the transport driver now?
> > >
> >
> > This knowledge is owned by the device driver rather than the transport driver.
> >
> > E.g. virtio-scsi uses:
> >
> > struct irq_affinity desc = { .pre_vectors = 2 }; // vq0 is control
> > queue, vq1 is event queue
>
> OK, but it only works as a hint; it's not a real affinity. As replied,
> we can pass an array of booleans in this case; then the transport driver
> knows it doesn't need to use automatic affinity for the first two
> queues.
>

But we don't know whether we would use other fields of the irq_affinity
structure in the future. So passing the full structure should be better?

> >
> > > E.g. virtio-blk uses:
> > >
> > > struct irq_affinity desc = { 0, };
> > >
> > > At least we can tell the transport driver which vq requires automatic
> > > irq affinity.
> > >
> >
> > I think that is what the current implementation does.
> >
> > > > So I think the
> > > > irq_affinity structure can only be created by device drivers and
> > > > passed to the virtio-pci/virtio-vdpa driver.
> > >
> > > This may not be easy since the driver doesn't even know how many
> > > interrupts will be used by the transport driver, so it can't build the
> > > actual affinity structure.
> > >
> >
> > The actual affinity mask is built by the transport driver,
>
> For PCI, yes, it talks directly to the IRQ subsystem.
>
> > device
> > driver only passes a hint on which queues don't need the automatic irq
> > affinity assignment.
>
> But not for virtio-vDPA, since the IRQ needs to be dealt with by the
> parent driver. In our case it's VDUSE, which doesn't need an IRQ
> at all; a queue-to-cpu mapping is sufficient.
>

The device driver doesn't know whether it is bound to virtio-pci or
virtio-vdpa. So it should pass the full set needed by the automatic irq
affinity assignment instead of a subset. Then virtio-vdpa can choose
to pass a queue-to-cpu mapping to VDUSE, which is what we do now (using
set_vq_affinity()).

Thanks,
Yongji

2023-03-28 06:13:24

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism

On Tue, Mar 28, 2023 at 12:05 PM Yongji Xie <[email protected]> wrote:
>
> On Tue, Mar 28, 2023 at 11:44 AM Jason Wang <[email protected]> wrote:
> >
> > On Tue, Mar 28, 2023 at 11:33 AM Yongji Xie <[email protected]> wrote:
> > >
> > > On Tue, Mar 28, 2023 at 11:14 AM Jason Wang <[email protected]> wrote:
> > > >
> > > > On Tue, Mar 28, 2023 at 11:03 AM Yongji Xie <[email protected]> wrote:
> > > > >
> > > > > On Fri, Mar 24, 2023 at 2:28 PM Jason Wang <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <[email protected]> wrote:
> > > > > > > [...]
> > > > > >
> > > > > > [...] I think we should fix this by replacing the irq_affinity
> > > > > > structure with
> > > > > >
> > > > > > 1) a boolean like auto_cb_spreading
> > > > > >
> > > > > > or
> > > > > >
> > > > > > 2) queue to cpu mapping
> > > > > >
> > > > >
> > > > > But only the driver knows which queues are used in the control path
> > > > > and thus don't need the automatic irq affinity assignment.
> > > >
> > > > Is this knowledge owned by the transport driver now?
> > > >
> > >
> > > This knowledge is owned by the device driver rather than the transport driver.
> > >
> > > E.g. virtio-scsi uses:
> > >
> > > struct irq_affinity desc = { .pre_vectors = 2 }; // vq0 is control
> > > queue, vq1 is event queue
> >
> > OK, but it only works as a hint; it's not a real affinity. As replied,
> > we can pass an array of booleans in this case; then the transport driver
> > knows it doesn't need to use automatic affinity for the first two
> > queues.
> >
>
> But we don't know whether we would use other fields of the irq_affinity
> structure in the future. So passing the full structure should be better?

Good point. So the issue is calc_sets(), and we would probably need that
if there were a virtio driver that needs more than one set of vectors
to be spread. Technically, we could have a virtio-level
abstraction for this, but I agree it's probably not worth bothering with
now.
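
For reference, a sketch of the kind of multi-set descriptor that
calc_sets() exists for (the callback and the read/write split below
are made up for illustration, modeled on what NVMe-style drivers do):

/* Split the spreadable vectors into two sets, each spread evenly */
static void my_calc_sets(struct irq_affinity *affd, unsigned int nvecs)
{
	affd->nr_sets = 2;
	affd->set_size[0] = nvecs / 2;		/* e.g. read queues */
	affd->set_size[1] = nvecs - nvecs / 2;	/* e.g. write queues */
}

struct irq_affinity desc = { .calc_sets = my_calc_sets };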

>
> > >
> > > > E.g. virtio-blk uses:
> > > >
> > > > struct irq_affinity desc = { 0, };
> > > >
> > > > At least we can tell the transport driver which vq requires automatic
> > > > irq affinity.
> > > >
> > >
> > > I think that is what the current implementation does.
> > >
> > > > > So I think the
> > > > > irq_affinity structure can only be created by device drivers and
> > > > > passed to the virtio-pci/virtio-vdpa driver.
> > > >
> > > > This may not be easy since the driver doesn't even know how many
> > > > interrupts will be used by the transport driver, so it can't build the
> > > > actual affinity structure.
> > > >
> > >
> > > The actual affinity mask is built by the transport driver,
> >
> > For PCI, yes, it talks directly to the IRQ subsystem.
> >
> > > device
> > > driver only passes a hint on which queues don't need the automatic irq
> > > affinity assignment.
> >
> > But not for virtio-vDPA, since the IRQ needs to be dealt with by the
> > parent driver. In our case it's VDUSE, which doesn't need an IRQ
> > at all; a queue-to-cpu mapping is sufficient.
> >
>
> The device driver doesn't know whether it is bound to virtio-pci or
> virtio-vdpa. So it should pass the full set needed by the automatic irq
> affinity assignment instead of a subset. Then virtio-vdpa can choose
> to pass a queue-to-cpu mapping to VDUSE, which is what we do now (using
> set_vq_affinity()).

Yes, so basically two ways:

1) automatic IRQ management: pass affd to find_vqs(), and the affinity is
determined by the transport (e.g. vDPA).
2) affinity under the control of the driver: it needs to use
set_vq_affinity() but has to deal with CPU hotplug itself (sketch below).
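
A minimal sketch of 2) from a device driver's point of view, ignoring
the hotplug handling (my_spread_vqs() and first_data_vq are
placeholders; virtqueue_set_affinity() dispatches to the transport's
set_vq_affinity op when one is provided):

static void my_spread_vqs(struct virtqueue *vqs[],
			  unsigned int first_data_vq, unsigned int nvqs)
{
	unsigned int i, cpu = cpumask_first(cpu_online_mask);

	/* One online CPU per data vq, round-robin */
	for (i = first_data_vq; i < nvqs; i++) {
		virtqueue_set_affinity(vqs[i], cpumask_of(cpu));
		cpu = cpumask_next(cpu, cpu_online_mask);
		if (cpu >= nr_cpu_ids)
			cpu = cpumask_first(cpu_online_mask);
	}
}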

Thanks

>
> Thanks,
> Yongji
>

2023-03-28 06:15:02

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism


On 2023/3/24 17:12, Michael S. Tsirkin wrote:
> On Fri, Mar 24, 2023 at 02:27:52PM +0800, Jason Wang wrote:
>> On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <[email protected]> wrote:
>>> To support interrupt affinity spreading mechanism,
>>> this makes use of group_cpus_evenly() to create
>>> an irq callback affinity mask for each virtqueue
>>> of vdpa device. Then we will unify set_vq_affinity
>>> callback to pass the affinity to the vdpa device driver.
>>>
>>> Signed-off-by: Xie Yongji <[email protected]>
>> Thinking hard about all this logic, I think I've found something interesting.
>>
>> Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
>> pass irq_affinity to the transport-specific find_vqs(). This seems to be
>> a layering violation since the driver has no knowledge of
>>
>> 1) whether or not the callback is based on an IRQ
>> 2) whether or not the device is a PCI device (the details are hidden by
>> the transport driver)
>> 3) how many vectors could be used by a device
>>
>> This means the driver can't actually pass a real affinity mask, so the
>> commit in fact passes a zeroed irq_affinity structure as a hint, letting
>> the PCI layer build a default affinity that groups CPUs evenly based on
>> the number of MSI-X vectors (the core logic is group_cpus_evenly()). I
>> think we should fix this by replacing the irq_affinity structure with
>>
>> 1) a boolean like auto_cb_spreading
>>
>> or
>>
>> 2) queue to cpu mapping
>>
>> So each transport can do its own logic based on that. Then virtio-vDPA
>> can pass that policy to VDUSE where we only need a group_cpus_evenly()
>> and avoid duplicating irq_create_affinity_masks()?
>>
>> Thanks
> I don't really understand what you propose. Care to post a patch?


I meant to avoid passing the irq_affinity structure in find_vqs() and
instead pass an array of booleans telling us whether or not the vq
requires automatic spreading of callbacks. But it seems less flexible.


> Also does it have to block this patchset or can it be done on top?


We can leave it for the future.

So

Acked-by: Jason Wang <[email protected]>

Thanks


>
>>> ---
>>> drivers/virtio/virtio_vdpa.c | 68 ++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 68 insertions(+)
>>>
>>> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
>>> index f72696b4c1c2..f3826f42b704 100644
>>> --- a/drivers/virtio/virtio_vdpa.c
>>> +++ b/drivers/virtio/virtio_vdpa.c
>>> @@ -13,6 +13,7 @@
>>> #include <linux/kernel.h>
>>> #include <linux/slab.h>
>>> #include <linux/uuid.h>
>>> +#include <linux/group_cpus.h>
>>> #include <linux/virtio.h>
>>> #include <linux/vdpa.h>
>>> #include <linux/virtio_config.h>
>>> @@ -272,6 +273,66 @@ static void virtio_vdpa_del_vqs(struct virtio_device *vdev)
>>> virtio_vdpa_del_vq(vq);
>>> }
>>>
>>> +static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
>>> +{
>>> + affd->nr_sets = 1;
>>> + affd->set_size[0] = affvecs;
>>> +}
>>> +
>>> +static struct cpumask *
>>> +create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
>>> +{
>>> + unsigned int affvecs = 0, curvec, usedvecs, i;
>>> + struct cpumask *masks = NULL;
>>> +
>>> + if (nvecs > affd->pre_vectors + affd->post_vectors)
>>> + affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
>>> +
>>> + if (!affd->calc_sets)
>>> + affd->calc_sets = default_calc_sets;
>>> +
>>> + affd->calc_sets(affd, affvecs);
>>> +
>>> + if (!affvecs)
>>> + return NULL;
>>> +
>>> + masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
>>> + if (!masks)
>>> + return NULL;
>>> +
>>> + /* Fill out vectors at the beginning that don't need affinity */
>>> + for (curvec = 0; curvec < affd->pre_vectors; curvec++)
>>> + cpumask_setall(&masks[curvec]);
>>> +
>>> + for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
>>> + unsigned int this_vecs = affd->set_size[i];
>>> + int j;
>>> + struct cpumask *result = group_cpus_evenly(this_vecs);
>>> +
>>> + if (!result) {
>>> + kfree(masks);
>>> + return NULL;
>>> + }
>>> +
>>> + for (j = 0; j < this_vecs; j++)
>>> + cpumask_copy(&masks[curvec + j], &result[j]);
>>> + kfree(result);
>>> +
>>> + curvec += this_vecs;
>>> + usedvecs += this_vecs;
>>> + }
>>> +
>>> + /* Fill out vectors at the end that don't need affinity */
>>> + if (usedvecs >= affvecs)
>>> + curvec = affd->pre_vectors + affvecs;
>>> + else
>>> + curvec = affd->pre_vectors + usedvecs;
>>> + for (; curvec < nvecs; curvec++)
>>> + cpumask_setall(&masks[curvec]);
>>> +
>>> + return masks;
>>> +}
>>> +
>>> static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>>> struct virtqueue *vqs[],
>>> vq_callback_t *callbacks[],
>>> @@ -282,9 +343,15 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>>> struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
>>> struct vdpa_device *vdpa = vd_get_vdpa(vdev);
>>> const struct vdpa_config_ops *ops = vdpa->config;
>>> + struct irq_affinity default_affd = { 0 };
>>> + struct cpumask *masks;
>>> struct vdpa_callback cb;
>>> int i, err, queue_idx = 0;
>>>
>>> + masks = create_affinity_masks(nvqs, desc ? desc : &default_affd);
>>> + if (!masks)
>>> + return -ENOMEM;
>>> +
>>> for (i = 0; i < nvqs; ++i) {
>>> if (!names[i]) {
>>> vqs[i] = NULL;
>>> @@ -298,6 +365,7 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>>> err = PTR_ERR(vqs[i]);
>>> goto err_setup_vq;
>>> }
>>> + ops->set_vq_affinity(vdpa, i, &masks[i]);
>>> }
>>>
>>> cb.callback = virtio_vdpa_config_cb;
>>> --
>>> 2.20.1
>>>

2023-03-28 06:16:31

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v4 05/11] vduse: Support set_vq_affinity callback


On 2023/3/23 13:30, Xie Yongji wrote:
> Since the virtio-vdpa bus driver already supports the interrupt
> affinity spreading mechanism, let's implement the
> set_vq_affinity callback to bring it to the vduse device.
> After we get the virtqueue's affinity, we can spread
> IRQs between the CPUs in the affinity mask, in a round-robin
> manner, to run the irq callback.
>
> Signed-off-by: Xie Yongji <[email protected]>


Acked-by: Jason Wang <[email protected]>

Thanks


> ---
> drivers/vdpa/vdpa_user/vduse_dev.c | 61 ++++++++++++++++++++++++++----
> 1 file changed, 54 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 98359d87a06f..45aa8703c4b5 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -41,6 +41,8 @@
> #define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
> #define VDUSE_MSG_DEFAULT_TIMEOUT 30
>
> +#define IRQ_UNBOUND -1
> +
> struct vduse_virtqueue {
> u16 index;
> u16 num_max;
> @@ -57,6 +59,8 @@ struct vduse_virtqueue {
> struct vdpa_callback cb;
> struct work_struct inject;
> struct work_struct kick;
> + int irq_effective_cpu;
> + struct cpumask irq_affinity;
> };
>
> struct vduse_dev;
> @@ -128,6 +132,7 @@ static struct class *vduse_class;
> static struct cdev vduse_ctrl_cdev;
> static struct cdev vduse_cdev;
> static struct workqueue_struct *vduse_irq_wq;
> +static struct workqueue_struct *vduse_irq_bound_wq;
>
> static u32 allowed_device_id[] = {
> VIRTIO_ID_BLOCK,
> @@ -708,6 +713,15 @@ static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa)
> return dev->generation;
> }
>
> +static int vduse_vdpa_set_vq_affinity(struct vdpa_device *vdpa, u16 idx,
> + const struct cpumask *cpu_mask)
> +{
> + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> +
> + cpumask_copy(&dev->vqs[idx]->irq_affinity, cpu_mask);
> + return 0;
> +}
> +
> static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> unsigned int asid,
> struct vhost_iotlb *iotlb)
> @@ -758,6 +772,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> .get_config = vduse_vdpa_get_config,
> .set_config = vduse_vdpa_set_config,
> .get_generation = vduse_vdpa_get_generation,
> + .set_vq_affinity = vduse_vdpa_set_vq_affinity,
> .reset = vduse_vdpa_reset,
> .set_map = vduse_vdpa_set_map,
> .free = vduse_vdpa_free,
> @@ -917,7 +932,8 @@ static void vduse_vq_irq_inject(struct work_struct *work)
> }
>
> static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> - struct work_struct *irq_work)
> + struct work_struct *irq_work,
> + int irq_effective_cpu)
> {
> int ret = -EINVAL;
>
> @@ -926,7 +942,11 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> goto unlock;
>
> ret = 0;
> - queue_work(vduse_irq_wq, irq_work);
> + if (irq_effective_cpu == IRQ_UNBOUND)
> + queue_work(vduse_irq_wq, irq_work);
> + else
> + queue_work_on(irq_effective_cpu,
> + vduse_irq_bound_wq, irq_work);
> unlock:
> up_read(&dev->rwsem);
>
> @@ -1029,6 +1049,22 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
> return ret;
> }
>
> +static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
> +{
> + int curr_cpu = vq->irq_effective_cpu;
> +
> + while (true) {
> + curr_cpu = cpumask_next(curr_cpu, &vq->irq_affinity);
> + if (cpu_online(curr_cpu))
> + break;
> +
> + if (curr_cpu >= nr_cpu_ids)
> + curr_cpu = IRQ_UNBOUND;
> + }
> +
> + vq->irq_effective_cpu = curr_cpu;
> +}
> +
> static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> unsigned long arg)
> {
> @@ -1111,7 +1147,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> break;
> }
> case VDUSE_DEV_INJECT_CONFIG_IRQ:
> - ret = vduse_dev_queue_irq_work(dev, &dev->inject);
> + ret = vduse_dev_queue_irq_work(dev, &dev->inject, IRQ_UNBOUND);
> break;
> case VDUSE_VQ_SETUP: {
> struct vduse_vq_config config;
> @@ -1198,7 +1234,10 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> break;
>
> index = array_index_nospec(index, dev->vq_num);
> - ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index]->inject);
> +
> + vduse_vq_update_effective_cpu(dev->vqs[index]);
> + ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index]->inject,
> + dev->vqs[index]->irq_effective_cpu);
> break;
> }
> case VDUSE_IOTLB_REG_UMEM: {
> @@ -1367,10 +1406,12 @@ static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
> goto err;
>
> dev->vqs[i]->index = i;
> + dev->vqs[i]->irq_effective_cpu = IRQ_UNBOUND;
> INIT_WORK(&dev->vqs[i]->inject, vduse_vq_irq_inject);
> INIT_WORK(&dev->vqs[i]->kick, vduse_vq_kick_work);
> spin_lock_init(&dev->vqs[i]->kick_lock);
> spin_lock_init(&dev->vqs[i]->irq_lock);
> + cpumask_setall(&dev->vqs[i]->irq_affinity);
> }
>
> return 0;
> @@ -1858,12 +1899,15 @@ static int vduse_init(void)
> if (ret)
> goto err_cdev;
>
> + ret = -ENOMEM;
> vduse_irq_wq = alloc_workqueue("vduse-irq",
> WQ_HIGHPRI | WQ_SYSFS | WQ_UNBOUND, 0);
> - if (!vduse_irq_wq) {
> - ret = -ENOMEM;
> + if (!vduse_irq_wq)
> goto err_wq;
> - }
> +
> + vduse_irq_bound_wq = alloc_workqueue("vduse-irq-bound", WQ_HIGHPRI, 0);
> + if (!vduse_irq_bound_wq)
> + goto err_bound_wq;
>
> ret = vduse_domain_init();
> if (ret)
> @@ -1877,6 +1921,8 @@ static int vduse_init(void)
> err_mgmtdev:
> vduse_domain_exit();
> err_domain:
> + destroy_workqueue(vduse_irq_bound_wq);
> +err_bound_wq:
> destroy_workqueue(vduse_irq_wq);
> err_wq:
> cdev_del(&vduse_cdev);
> @@ -1896,6 +1942,7 @@ static void vduse_exit(void)
> {
> vduse_mgmtdev_exit();
> vduse_domain_exit();
> + destroy_workqueue(vduse_irq_bound_wq);
> destroy_workqueue(vduse_irq_wq);
> cdev_del(&vduse_cdev);
> device_destroy(vduse_class, vduse_major);
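
To illustrate the round-robin in vduse_vq_update_effective_cpu()
above, assume a (made-up) irq_affinity mask of CPUs {0,2,5} with all
of them online:

/*
 * irq_effective_cpu advances through the mask on each injection:
 *   IRQ_UNBOUND (-1) -> 0 -> 2 -> 5 -> wraps back to 0 -> ...
 * Offline CPUs in the mask are skipped along the way.
 */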

2023-03-28 06:18:03

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v4 06/11] vduse: Support get_vq_affinity callback


On 2023/3/23 13:30, Xie Yongji wrote:
> This implements the get_vq_affinity callback so that
> the virtio-blk driver can build the blk-mq queues
> based on the irq callback affinity.
>
> Signed-off-by: Xie Yongji <[email protected]>


Acked-by: Jason Wang <[email protected]>

Thanks


> ---
> drivers/vdpa/vdpa_user/vduse_dev.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 45aa8703c4b5..cefabd0dab9c 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -722,6 +722,14 @@ static int vduse_vdpa_set_vq_affinity(struct vdpa_device *vdpa, u16 idx,
> return 0;
> }
>
> +static const struct cpumask *
> +vduse_vdpa_get_vq_affinity(struct vdpa_device *vdpa, u16 idx)
> +{
> + struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> +
> + return &dev->vqs[idx]->irq_affinity;
> +}
> +
> static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
> unsigned int asid,
> struct vhost_iotlb *iotlb)
> @@ -773,6 +781,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
> .set_config = vduse_vdpa_set_config,
> .get_generation = vduse_vdpa_get_generation,
> .set_vq_affinity = vduse_vdpa_set_vq_affinity,
> + .get_vq_affinity = vduse_vdpa_get_vq_affinity,
> .reset = vduse_vdpa_reset,
> .set_map = vduse_vdpa_set_map,
> .free = vduse_vdpa_free,
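
For context, the consumer side as I understand it (simplified from
what virtio-blk already does; the helper walks the vqs and queries
the transport's get_vq_affinity for each of them):

/* virtio-blk builds its blk-mq queue mapping from the per-vq
 * affinity, which for vduse now comes from the mask stored by
 * set_vq_affinity.
 */
blk_mq_virtio_map_queues(&set->map[HCTX_TYPE_DEFAULT], vblk->vdev, 0);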

2023-03-28 06:25:44

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v4 07/11] vduse: Add sysfs interface for irq callback affinity


On 2023/3/23 13:30, Xie Yongji wrote:
> Add a sysfs interface for each vduse virtqueue to
> get/set the affinity for the irq callback. This might
> be useful for performance tuning when the irq callback
> affinity mask contains more than one CPU.
>
> Signed-off-by: Xie Yongji <[email protected]>


Acked-by: Jason Wang <[email protected]>

Thanks


> ---
> drivers/vdpa/vdpa_user/vduse_dev.c | 124 ++++++++++++++++++++++++++---
> 1 file changed, 113 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index cefabd0dab9c..77da3685568a 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -61,6 +61,7 @@ struct vduse_virtqueue {
> struct work_struct kick;
> int irq_effective_cpu;
> struct cpumask irq_affinity;
> + struct kobject kobj;
> };
>
> struct vduse_dev;
> @@ -1387,6 +1388,96 @@ static const struct file_operations vduse_dev_fops = {
> .llseek = noop_llseek,
> };
>
> +static ssize_t irq_cb_affinity_show(struct vduse_virtqueue *vq, char *buf)
> +{
> + return sprintf(buf, "%*pb\n", cpumask_pr_args(&vq->irq_affinity));
> +}
> +
> +static ssize_t irq_cb_affinity_store(struct vduse_virtqueue *vq,
> + const char *buf, size_t count)
> +{
> + cpumask_var_t new_value;
> + int ret;
> +
> + if (!zalloc_cpumask_var(&new_value, GFP_KERNEL))
> + return -ENOMEM;
> +
> + ret = cpumask_parse(buf, new_value);
> + if (ret)
> + goto free_mask;
> +
> + ret = -EINVAL;
> + if (!cpumask_intersects(new_value, cpu_online_mask))
> + goto free_mask;
> +
> + cpumask_copy(&vq->irq_affinity, new_value);
> + ret = count;
> +free_mask:
> + free_cpumask_var(new_value);
> + return ret;
> +}
> +
> +struct vq_sysfs_entry {
> + struct attribute attr;
> + ssize_t (*show)(struct vduse_virtqueue *vq, char *buf);
> + ssize_t (*store)(struct vduse_virtqueue *vq, const char *buf,
> + size_t count);
> +};
> +
> +static struct vq_sysfs_entry irq_cb_affinity_attr = __ATTR_RW(irq_cb_affinity);
> +
> +static struct attribute *vq_attrs[] = {
> + &irq_cb_affinity_attr.attr,
> + NULL,
> +};
> +ATTRIBUTE_GROUPS(vq);
> +
> +static ssize_t vq_attr_show(struct kobject *kobj, struct attribute *attr,
> + char *buf)
> +{
> + struct vduse_virtqueue *vq = container_of(kobj,
> + struct vduse_virtqueue, kobj);
> + struct vq_sysfs_entry *entry = container_of(attr,
> + struct vq_sysfs_entry, attr);
> +
> + if (!entry->show)
> + return -EIO;
> +
> + return entry->show(vq, buf);
> +}
> +
> +static ssize_t vq_attr_store(struct kobject *kobj, struct attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct vduse_virtqueue *vq = container_of(kobj,
> + struct vduse_virtqueue, kobj);
> + struct vq_sysfs_entry *entry = container_of(attr,
> + struct vq_sysfs_entry, attr);
> +
> + if (!entry->store)
> + return -EIO;
> +
> + return entry->store(vq, buf, count);
> +}
> +
> +static const struct sysfs_ops vq_sysfs_ops = {
> + .show = vq_attr_show,
> + .store = vq_attr_store,
> +};
> +
> +static void vq_release(struct kobject *kobj)
> +{
> + struct vduse_virtqueue *vq = container_of(kobj,
> + struct vduse_virtqueue, kobj);
> + kfree(vq);
> +}
> +
> +static const struct kobj_type vq_type = {
> + .release = vq_release,
> + .sysfs_ops = &vq_sysfs_ops,
> + .default_groups = vq_groups,
> +};
> +
> static void vduse_dev_deinit_vqs(struct vduse_dev *dev)
> {
> int i;
> @@ -1395,13 +1486,13 @@ static void vduse_dev_deinit_vqs(struct vduse_dev *dev)
> return;
>
> for (i = 0; i < dev->vq_num; i++)
> - kfree(dev->vqs[i]);
> + kobject_put(&dev->vqs[i]->kobj);
> kfree(dev->vqs);
> }
>
> static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
> {
> - int i;
> + int ret, i;
>
> dev->vq_align = vq_align;
> dev->vq_num = vq_num;
> @@ -1411,8 +1502,10 @@ static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
>
> for (i = 0; i < vq_num; i++) {
> dev->vqs[i] = kzalloc(sizeof(*dev->vqs[i]), GFP_KERNEL);
> - if (!dev->vqs[i])
> + if (!dev->vqs[i]) {
> + ret = -ENOMEM;
> goto err;
> + }
>
> dev->vqs[i]->index = i;
> dev->vqs[i]->irq_effective_cpu = IRQ_UNBOUND;
> @@ -1421,15 +1514,23 @@ static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)
> spin_lock_init(&dev->vqs[i]->kick_lock);
> spin_lock_init(&dev->vqs[i]->irq_lock);
> cpumask_setall(&dev->vqs[i]->irq_affinity);
> +
> + kobject_init(&dev->vqs[i]->kobj, &vq_type);
> + ret = kobject_add(&dev->vqs[i]->kobj,
> + &dev->dev->kobj, "vq%d", i);
> + if (ret) {
> + kfree(dev->vqs[i]);
> + goto err;
> + }
> }
>
> return 0;
> err:
> while (i--)
> - kfree(dev->vqs[i]);
> + kobject_put(&dev->vqs[i]->kobj);
> kfree(dev->vqs);
> dev->vqs = NULL;
> - return -ENOMEM;
> + return ret;
> }
>
> static struct vduse_dev *vduse_dev_create(void)
> @@ -1607,10 +1708,6 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> dev->config = config_buf;
> dev->config_size = config->config_size;
>
> - ret = vduse_dev_init_vqs(dev, config->vq_align, config->vq_num);
> - if (ret)
> - goto err_vqs;
> -
> ret = idr_alloc(&vduse_idr, dev, 1, VDUSE_DEV_MAX, GFP_KERNEL);
> if (ret < 0)
> goto err_idr;
> @@ -1624,14 +1721,19 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> ret = PTR_ERR(dev->dev);
> goto err_dev;
> }
> +
> + ret = vduse_dev_init_vqs(dev, config->vq_align, config->vq_num);
> + if (ret)
> + goto err_vqs;
> +
> __module_get(THIS_MODULE);
>
> return 0;
> +err_vqs:
> + device_destroy(vduse_class, MKDEV(MAJOR(vduse_major), dev->minor));
> err_dev:
> idr_remove(&vduse_idr, dev->minor);
> err_idr:
> - vduse_dev_deinit_vqs(dev);
> -err_vqs:
> vduse_domain_destroy(dev->domain);
> err_domain:
> kfree(dev->name);
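
Usage-wise: the vq%d kobjects are added under the vduse device's
kobject, and cpumask_parse() takes a hex mask, so e.g. writing 5 to
/sys/.../vduse/<name>/vq0/irq_cb_affinity (illustrative path)
restricts vq0's callback to CPUs 0 and 2, and reading the file back
shows the current mask in hex.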

2023-03-28 06:30:02

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v4 08/11] vdpa: Add eventfd for the vdpa callback


On 2023/3/23 13:30, Xie Yongji wrote:
> Add an eventfd for the vdpa callback so that the user
> can signal it directly instead of triggering the
> callback. It will be used for the vhost-vdpa case.
>
> Signed-off-by: Xie Yongji <[email protected]>


Acked-by: Jason Wang <[email protected]>

Thanks


> ---
> drivers/vhost/vdpa.c | 2 ++
> drivers/virtio/virtio_vdpa.c | 1 +
> include/linux/vdpa.h | 6 ++++++
> 3 files changed, 9 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 7be9d9d8f01c..9cd878e25cff 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -599,9 +599,11 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
> if (vq->call_ctx.ctx) {
> cb.callback = vhost_vdpa_virtqueue_cb;
> cb.private = vq;
> + cb.trigger = vq->call_ctx.ctx;
> } else {
> cb.callback = NULL;
> cb.private = NULL;
> + cb.trigger = NULL;
> }
> ops->set_vq_cb(vdpa, idx, &cb);
> vhost_vdpa_setup_vq_irq(v, idx);
> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> index f3826f42b704..2a095f37f26b 100644
> --- a/drivers/virtio/virtio_vdpa.c
> +++ b/drivers/virtio/virtio_vdpa.c
> @@ -196,6 +196,7 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index,
> /* Setup virtqueue callback */
> cb.callback = callback ? virtio_vdpa_virtqueue_cb : NULL;
> cb.private = info;
> + cb.trigger = NULL;
> ops->set_vq_cb(vdpa, index, &cb);
> ops->set_vq_num(vdpa, index, virtqueue_get_vring_size(vq));
>
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index e52c9a37999c..0372b2e3d38a 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -13,10 +13,16 @@
> * struct vdpa_calllback - vDPA callback definition.
> * @callback: interrupt callback function
> * @private: the data passed to the callback function
> + * @trigger: the eventfd for the callback (Optional).
> + * When it is set, the vDPA driver must guarantee that
> + * signaling it is functional equivalent to triggering
> + * the callback. Then vDPA parent can signal it directly
> + * instead of triggering the callback.
> */
> struct vdpa_callback {
> irqreturn_t (*callback)(void *data);
> void *private;
> + struct eventfd_ctx *trigger;
> };
>
> /**
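
A sketch of how a parent driver might honor the new field (the helper
below is hypothetical; patch 9 wires up the equivalent for vduse),
following the guarantee documented on @trigger:

#include <linux/eventfd.h>

/* Fast path: kick the eventfd; otherwise fall back to the callback */
static void my_inject_vq_irq(struct vdpa_callback *cb)
{
	if (cb->trigger)
		eventfd_signal(cb->trigger, 1);
	else if (cb->callback)
		cb->callback(cb->private);
}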

2023-04-04 18:24:12

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 01/11] lib/group_cpus: Export group_cpus_evenly()

On Thu, Mar 23, 2023 at 01:30:33PM +0800, Xie Yongji wrote:
> Export group_cpus_evenly() so that some modules
> can make use of it to group CPUs evenly according
> to NUMA and CPU locality.
>
> Signed-off-by: Xie Yongji <[email protected]>
> Acked-by: Jason Wang <[email protected]>


Hi Thomas, do you ack merging this through my tree?
Thanks!

> ---
> lib/group_cpus.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index 9c837a35fef7..aa3f6815bb12 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -426,3 +426,4 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
> return masks;
> }
> #endif /* CONFIG_SMP */
> +EXPORT_SYMBOL_GPL(group_cpus_evenly);
> --
> 2.20.1
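
For completeness, a minimal module-side sketch of the newly exported
helper (error handling elided; nvqs, ops and vdpa stand in for the
caller's context as in patch 3, and the returned array is allocated
by the helper and freed by the caller):

#include <linux/group_cpus.h>
#include <linux/slab.h>

struct cpumask *masks = group_cpus_evenly(nvqs);

if (masks) {
	unsigned int i;

	/* one NUMA/locality-aware CPU group per virtqueue */
	for (i = 0; i < nvqs; i++)
		ops->set_vq_affinity(vdpa, i, &masks[i]);
	kfree(masks);
}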