2020-02-10 09:06:52

by Zha Bin

[permalink] [raw]
Subject: [PATCH v2 0/5] virtio mmio specification enhancement

In a cloud native environment, we need a system that is both lightweight and
secure: one that combines the speed of containers with the security of VMs,
commonly classified as secure containers. The traditional VMM for cloud VMs
is Qemu, but in practice we do not need to pay for its legacy devices. A
growing number of userspace VMMs, e.g. Qemu, Firecracker, Cloud Hypervisor
and the Alibaba Cloud VMM called Dragonball, have begun to focus on
lightweight solutions.

A lightweight VMM is well suited to cloud native infrastructure, which is
designed to create secure sandboxes that address multi-tenancy requirements.
Meanwhile, with faster startup times and lower memory overhead, it makes it
possible to launch thousands of microVMs on the same machine. Such a VMM
minimizes the emulated devices and uses virtio-mmio as a more lightweight
transport layer. The virtio-mmio devices need far less code than virtio-pci,
which can decrease boot time and increase deployment density when the kernel
is customized, e.g. by setting pci=off. From another point of view, the
minimal device set also reduces the attack surface.

We have compared the number of files and the lines of code between
virtio-mmio and virtio-pci.

                          Virtio-PCI    Virtio-MMIO
 number of files (Linux)         161              1
 lines of code  (Linux)        78237            538
 number of files (Qemu)           24              1
 lines of code  (Qemu)          8952            421

But the current standard virtio-mmio spec has a limitation: it only supports
the legacy interrupt, which causes performance penalties.

To address this limitation, we proposed updating the virtio-mmio spec with
two new feature bits to support MSI interrupts and an enhanced notification
mechanism[1], which can achieve the same performance as virtio-pci devices
with only around 600 lines of code.

Here are the performance gains of the MSI interrupt in virtio-mmio. Each case
was repeated three times.

netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024

           Virtio-PCI    Virtio-MMIO    Virtio-MMIO(MSI)
 trans/s         9536           6939                9500
 trans/s         9734           7029                9749
 trans/s         9894           7095                9318

With the virtio spec proposal[1], other VMMs (e.g. Qemu) can also make use
of the new features to gain the same performance enhancement.

[1] https://lkml.org/lkml/2020/1/21/31

Change Log:
v1->v2
* Change version update to feature bit
* Add mask/unmask support
* Add two MSI sharing/non-sharing modes
* Create generic irq domain for all architectures

Liu Jiang (5):
virtio-mmio: add notify feature for per-queue
virtio-mmio: refactor common functionality
virtio-mmio: create a generic MSI irq domain
virtio-mmio: add MSI interrupt feature support
x86: virtio-mmio: support virtio-mmio with MSI for x86

arch/x86/kernel/apic/msi.c | 11 +-
drivers/base/platform-msi.c | 4 +-
drivers/virtio/Kconfig | 9 +
drivers/virtio/virtio_mmio.c | 351 ++++++++++++++++++++++++++++++++----
drivers/virtio/virtio_mmio_common.h | 39 ++++
drivers/virtio/virtio_mmio_msi.h | 175 ++++++++++++++++++
include/linux/msi.h | 1 +
include/uapi/linux/virtio_config.h | 13 +-
include/uapi/linux/virtio_mmio.h | 31 ++++
9 files changed, 596 insertions(+), 38 deletions(-)
create mode 100644 drivers/virtio/virtio_mmio_common.h
create mode 100644 drivers/virtio/virtio_mmio_msi.h

--
1.8.3.1


2020-02-10 09:07:26

by Zha Bin

[permalink] [raw]
Subject: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue

From: Liu Jiang <[email protected]>

The standard virtio-mmio devices use a notification register to signal the
backend. This causes vmexits and degrades performance when virtio-mmio
devices are passed through to guest virtual machines.
We proposed updating the virtio over MMIO spec to add the per-queue
notify feature VIRTIO_F_MMIO_NOTIFICATION[1], which allows the VMM to
configure a notify location for each queue.

[1] https://lkml.org/lkml/2020/1/21/31

Signed-off-by: Liu Jiang <[email protected]>
Co-developed-by: Zha Bin <[email protected]>
Signed-off-by: Zha Bin <[email protected]>
Co-developed-by: Jing Liu <[email protected]>
Signed-off-by: Jing Liu <[email protected]>
Co-developed-by: Chao Peng <[email protected]>
Signed-off-by: Chao Peng <[email protected]>
---
drivers/virtio/virtio_mmio.c | 37 +++++++++++++++++++++++++++++++++++--
include/uapi/linux/virtio_config.h | 8 +++++++-
2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 97d5725..1733ab97 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -90,6 +90,9 @@ struct virtio_mmio_device {
/* a list of queues so we can dispatch IRQs */
spinlock_t lock;
struct list_head virtqueues;
+
+ unsigned short notify_base;
+ unsigned short notify_multiplier;
};

struct virtio_mmio_vq_info {
@@ -98,6 +101,9 @@ struct virtio_mmio_vq_info {

/* the list node for the virtqueues list */
struct list_head node;
+
+ /* Notify Address*/
+ unsigned int notify_addr;
};


@@ -119,13 +125,23 @@ static u64 vm_get_features(struct virtio_device *vdev)
return features;
}

+static void vm_transport_features(struct virtio_device *vdev, u64 features)
+{
+ if (features & BIT_ULL(VIRTIO_F_MMIO_NOTIFICATION))
+ __virtio_set_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION);
+}
+
static int vm_finalize_features(struct virtio_device *vdev)
{
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+ u64 features = vdev->features;

/* Give virtio_ring a chance to accept features. */
vring_transport_features(vdev);

+ /* Give virtio_mmio a chance to accept features. */
+ vm_transport_features(vdev, features);
+
/* Make sure there is are no mixed devices */
if (vm_dev->version == 2 &&
!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
@@ -272,10 +288,13 @@ static void vm_reset(struct virtio_device *vdev)
static bool vm_notify(struct virtqueue *vq)
{
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
+ struct virtio_mmio_vq_info *info = vq->priv;

- /* We write the queue's selector into the notification register to
+ /* We write the queue's selector into the Notify Address to
* signal the other end */
- writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
+ if (info)
+ writel(vq->index, vm_dev->base + info->notify_addr);
+
return true;
}

@@ -434,6 +453,12 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
vq->priv = info;
info->vq = vq;

+ if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION))
+ info->notify_addr = vm_dev->notify_base +
+ vm_dev->notify_multiplier * vq->index;
+ else
+ info->notify_addr = VIRTIO_MMIO_QUEUE_NOTIFY;
+
spin_lock_irqsave(&vm_dev->lock, flags);
list_add(&info->node, &vm_dev->virtqueues);
spin_unlock_irqrestore(&vm_dev->lock, flags);
@@ -471,6 +496,14 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
return irq;
}

+ if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION)) {
+ unsigned int notify = readl(vm_dev->base +
+ VIRTIO_MMIO_QUEUE_NOTIFY);
+
+ vm_dev->notify_base = notify & 0xffff;
+ vm_dev->notify_multiplier = (notify >> 16) & 0xffff;
+ }
+
err = request_irq(irq, vm_interrupt, IRQF_SHARED,
dev_name(&vdev->dev), vm_dev);
if (err)
diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index ff8e7dc..5d93c01 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -52,7 +52,7 @@
* rest are per-device feature bits.
*/
#define VIRTIO_TRANSPORT_F_START 28
-#define VIRTIO_TRANSPORT_F_END 38
+#define VIRTIO_TRANSPORT_F_END 40

#ifndef VIRTIO_CONFIG_NO_LEGACY
/* Do we get callbacks when the ring is completely used, even if we've
@@ -88,4 +88,10 @@
* Does the device support Single Root I/O Virtualization?
*/
#define VIRTIO_F_SR_IOV 37
+
+/*
+ * This feature indicates the enhanced notification support on MMIO transport
+ * layer.
+ */
+#define VIRTIO_F_MMIO_NOTIFICATION 39
#endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
--
1.8.3.1

2020-02-10 09:07:54

by Zha Bin

[permalink] [raw]
Subject: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support

From: Liu Jiang <[email protected]>

Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of virtio
over mmio devices as a lightweight machine model for the modern cloud. The
standard virtio over MMIO transport layer only supports a single legacy
interrupt, which is much heavier than virtio over PCI using MSI. The legacy
interrupt has a long work path and causes extra VMExits in the following
cases, which considerably slow down performance:

1) read interrupt status register
2) update interrupt status register
3) write IOAPIC EOI register

We proposed adding MSI support for virtio over MMIO via the new feature
bit VIRTIO_F_MMIO_MSI[1], which improves interrupt performance.

When the VIRTIO_F_MMIO_MSI feature bit is supported, virtio-mmio MSI
uses the msi_sharing bit[1] to indicate the event and vector mapping:
Bit 1 is 0: the device uses non-sharing mode with a fixed vector per event.
Bit 1 is 1: the device uses sharing mode with dynamic mapping.

[1] https://lkml.org/lkml/2020/1/21/31

Signed-off-by: Liu Jiang <[email protected]>
Co-developed-by: Zha Bin <[email protected]>
Signed-off-by: Zha Bin <[email protected]>
Co-developed-by: Jing Liu <[email protected]>
Signed-off-by: Jing Liu <[email protected]>
Co-developed-by: Chao Peng <[email protected]>
Signed-off-by: Chao Peng <[email protected]>
---
drivers/virtio/virtio_mmio.c | 299 ++++++++++++++++++++++++++++++++++--
drivers/virtio/virtio_mmio_common.h | 8 +
drivers/virtio/virtio_mmio_msi.h | 82 ++++++++++
include/uapi/linux/virtio_config.h | 7 +-
include/uapi/linux/virtio_mmio.h | 31 ++++
5 files changed, 411 insertions(+), 16 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 41e1c93..b24ce21 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -67,8 +67,7 @@
#include <uapi/linux/virtio_mmio.h>
#include <linux/virtio_ring.h>
#include "virtio_mmio_common.h"
-
-
+#include "virtio_mmio_msi.h"

/* The alignment to use between consumer and producer parts of vring.
* Currently hardcoded to the page size. */
@@ -85,9 +84,13 @@ struct virtio_mmio_vq_info {

/* Notify Address*/
unsigned int notify_addr;
-};

+ /* MSI vector (or none) */
+ unsigned int msi_vector;
+};

+static void vm_free_msi_irqs(struct virtio_device *vdev);
+static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs);

/* Configuration interface */

@@ -110,6 +113,9 @@ static void vm_transport_features(struct virtio_device *vdev, u64 features)
{
if (features & BIT_ULL(VIRTIO_F_MMIO_NOTIFICATION))
__virtio_set_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION);
+
+ if (features & BIT_ULL(VIRTIO_F_MMIO_MSI))
+ __virtio_set_bit(vdev, VIRTIO_F_MMIO_MSI);
}

static int vm_finalize_features(struct virtio_device *vdev)
@@ -307,7 +313,33 @@ static irqreturn_t vm_interrupt(int irq, void *opaque)
return ret;
}

+static irqreturn_t vm_vring_interrupt(int irq, void *opaque)
+{
+ struct virtio_mmio_device *vm_dev = opaque;
+ struct virtio_mmio_vq_info *info;
+ irqreturn_t ret = IRQ_NONE;
+ unsigned long flags;
+
+ spin_lock_irqsave(&vm_dev->lock, flags);
+ list_for_each_entry(info, &vm_dev->virtqueues, node) {
+ if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+ ret = IRQ_HANDLED;
+ }
+ spin_unlock_irqrestore(&vm_dev->lock, flags);
+
+ return ret;
+}
+
+
+/* Handle a configuration change */
+static irqreturn_t vm_config_changed(int irq, void *opaque)
+{
+ struct virtio_mmio_device *vm_dev = opaque;
+
+ virtio_config_changed(&vm_dev->vdev);

+ return IRQ_HANDLED;
+}

static void vm_del_vq(struct virtqueue *vq)
{
@@ -316,6 +348,15 @@ static void vm_del_vq(struct virtqueue *vq)
unsigned long flags;
unsigned int index = vq->index;

+ if (vm_dev->msi_enabled && !vm_dev->msi_share) {
+ if (info->msi_vector != VIRTIO_MMIO_MSI_NO_VECTOR) {
+ int irq = mmio_msi_irq_vector(&vq->vdev->dev,
+ info->msi_vector);
+
+ free_irq(irq, vq);
+ }
+ }
+
spin_lock_irqsave(&vm_dev->lock, flags);
list_del(&info->node);
spin_unlock_irqrestore(&vm_dev->lock, flags);
@@ -334,20 +375,56 @@ static void vm_del_vq(struct virtqueue *vq)
kfree(info);
}

-static void vm_del_vqs(struct virtio_device *vdev)
+static void vm_free_irqs(struct virtio_device *vdev)
{
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+ if (vm_dev->msi_enabled)
+ vm_free_msi_irqs(vdev);
+ else
+ free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
+}
+
+static void vm_del_vqs(struct virtio_device *vdev)
+{
struct virtqueue *vq, *n;

list_for_each_entry_safe(vq, n, &vdev->vqs, list)
vm_del_vq(vq);

- free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
+ vm_free_irqs(vdev);
+}
+
+static inline void mmio_msi_set_enable(struct virtio_mmio_device *vm_dev,
+ int enable)
+{
+ u32 state;
+
+ state = readl(vm_dev->base + VIRTIO_MMIO_MSI_STATE);
+ if (enable && (state & VIRTIO_MMIO_MSI_ENABLED_MASK))
+ return;
+
+ writel(VIRTIO_MMIO_MSI_CMD_ENABLE,
+ vm_dev->base + VIRTIO_MMIO_MSI_COMMAND);
+}
+
+static void mmio_msi_config_vector(struct virtio_mmio_device *vm_dev, u32 vec)
+{
+ writel(vec, vm_dev->base + VIRTIO_MMIO_MSI_VEC_SEL);
+ writel(VIRTIO_MMIO_MSI_CMD_MAP_CONFIG, vm_dev->base +
+ VIRTIO_MMIO_MSI_COMMAND);
+}
+
+static void mmio_msi_queue_vector(struct virtio_mmio_device *vm_dev, u32 vec)
+{
+ writel(vec, vm_dev->base + VIRTIO_MMIO_MSI_VEC_SEL);
+ writel(VIRTIO_MMIO_MSI_CMD_MAP_QUEUE, vm_dev->base +
+ VIRTIO_MMIO_MSI_COMMAND);
}

static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
void (*callback)(struct virtqueue *vq),
- const char *name, bool ctx)
+ const char *name, bool ctx, u32 msi_vector)
{
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
struct virtio_mmio_vq_info *info;
@@ -440,6 +517,11 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
else
info->notify_addr = VIRTIO_MMIO_QUEUE_NOTIFY;

+ info->msi_vector = msi_vector;
+ /* Set queue event and vector mapping for MSI share mode. */
+ if (vm_dev->msi_share && msi_vector != VIRTIO_MMIO_MSI_NO_VECTOR)
+ mmio_msi_queue_vector(vm_dev, msi_vector);
+
spin_lock_irqsave(&vm_dev->lock, flags);
list_add(&info->node, &vm_dev->virtqueues);
spin_unlock_irqrestore(&vm_dev->lock, flags);
@@ -461,12 +543,11 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
return ERR_PTR(err);
}

-static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
- struct virtqueue *vqs[],
- vq_callback_t *callbacks[],
- const char * const names[],
- const bool *ctx,
- struct irq_affinity *desc)
+static int vm_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
+ struct virtqueue *vqs[],
+ vq_callback_t *callbacks[],
+ const char * const names[],
+ const bool *ctx)
{
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
int irq = platform_get_irq(vm_dev->pdev, 0);
@@ -487,8 +568,6 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,

err = request_irq(irq, vm_interrupt, IRQF_SHARED,
dev_name(&vdev->dev), vm_dev);
- if (err)
- return err;

for (i = 0; i < nvqs; ++i) {
if (!names[i]) {
@@ -497,14 +576,202 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
}

vqs[i] = vm_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
- ctx ? ctx[i] : false);
+ ctx ? ctx[i] : false,
+ VIRTIO_MMIO_MSI_NO_VECTOR);
if (IS_ERR(vqs[i])) {
vm_del_vqs(vdev);
return PTR_ERR(vqs[i]);
}
}

+ return err;
+}
+
+static int vm_find_vqs_msi(struct virtio_device *vdev, unsigned int nvqs,
+ struct virtqueue *vqs[], vq_callback_t *callbacks[],
+ const char * const names[], bool per_vq_vectors,
+ const bool *ctx, struct irq_affinity *desc)
+{
+ struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+ int i, err, allocated_vectors, nvectors;
+ u32 msi_vec;
+ u32 max_vec_num = readl(vm_dev->base + VIRTIO_MMIO_MSI_VEC_NUM);
+
+ /* For MSI non-sharing, the max vector number MUST greater than nvqs.
+ * Otherwise, go back to legacy interrupt.
+ */
+ if (per_vq_vectors && max_vec_num < (nvqs + 1))
+ return -EINVAL;
+
+ if (per_vq_vectors) {
+ nvectors = 1;
+ for (i = 0; i < nvqs; ++i)
+ if (callbacks[i])
+ ++nvectors;
+ } else {
+ nvectors = 2;
+ }
+
+ vm_dev->msi_share = !per_vq_vectors;
+
+ /* Allocate nvqs irqs for queues and one irq for configuration */
+ err = vm_request_msi_vectors(vdev, nvectors);
+ if (err != 0)
+ return err;
+
+ allocated_vectors = vm_dev->msi_used_vectors;
+ for (i = 0; i < nvqs; i++) {
+ if (!names[i]) {
+ vqs[i] = NULL;
+ continue;
+ }
+ if (!callbacks[i])
+ msi_vec = VIRTIO_MMIO_MSI_NO_VECTOR;
+ else if (per_vq_vectors)
+ msi_vec = allocated_vectors++;
+ else
+ msi_vec = 1;
+ vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i],
+ ctx ? ctx[i] : false, msi_vec);
+ if (IS_ERR(vqs[i])) {
+ err = PTR_ERR(vqs[i]);
+ goto error_find;
+ }
+
+ if (!per_vq_vectors ||
+ msi_vec == VIRTIO_MMIO_MSI_NO_VECTOR)
+ continue;
+
+ /* allocate per-vq irq if available and necessary */
+ snprintf(vm_dev->vm_vq_names[msi_vec],
+ sizeof(*vm_dev->vm_vq_names),
+ "%s-%s",
+ dev_name(&vm_dev->vdev.dev), names[i]);
+ err = request_irq(mmio_msi_irq_vector(&vqs[i]->vdev->dev,
+ msi_vec),
+ vring_interrupt, 0,
+ vm_dev->vm_vq_names[msi_vec], vqs[i]);
+
+ if (err)
+ goto error_find;
+ }
+
return 0;
+
+error_find:
+ vm_del_vqs(vdev);
+ return err;
+}
+
+static int vm_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
+ struct virtqueue *vqs[],
+ vq_callback_t *callbacks[],
+ const char * const names[],
+ const bool *ctx,
+ struct irq_affinity *desc)
+{
+ struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+ int err;
+
+ if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION)) {
+ unsigned int notify = readl(vm_dev->base +
+ VIRTIO_MMIO_QUEUE_NOTIFY);
+
+ vm_dev->notify_base = notify & 0xffff;
+ vm_dev->notify_multiplier = (notify >> 16) & 0xffff;
+ }
+
+ if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_MSI)) {
+ bool dyn_mapping = !!(readl(vm_dev->base +
+ VIRTIO_MMIO_MSI_STATE) &
+ VIRTIO_MMIO_MSI_SHARING_MASK);
+ if (!dyn_mapping)
+ err = vm_find_vqs_msi(vdev, nvqs, vqs, callbacks,
+ names, true, ctx, desc);
+ else
+ err = vm_find_vqs_msi(vdev, nvqs, vqs, callbacks,
+ names, false, ctx, desc);
+ if (!err)
+ return 0;
+ }
+
+ return vm_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, ctx);
+}
+
+static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs)
+{
+ struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+ unsigned int v;
+ int irq, err;
+
+ if (vm_dev->msi_enabled)
+ return -EINVAL;
+
+ vm_dev->vm_vq_names = kmalloc_array(nirqs, sizeof(*vm_dev->vm_vq_names),
+ GFP_KERNEL);
+ if (!vm_dev->vm_vq_names)
+ return -ENOMEM;
+
+ mmio_get_msi_domain(vdev);
+ err = mmio_msi_domain_alloc_irqs(&vdev->dev, nirqs);
+ if (err) {
+ kfree(vm_dev->vm_vq_names);
+ vm_dev->vm_vq_names = NULL;
+ return err;
+ }
+
+ mmio_msi_set_enable(vm_dev, 1);
+ vm_dev->msi_enabled = true;
+
+ v = vm_dev->msi_used_vectors;
+ /* The first MSI vector is used for configuration change event. */
+ snprintf(vm_dev->vm_vq_names[v], sizeof(*vm_dev->vm_vq_names),
+ "%s-config", dev_name(&vdev->dev));
+ irq = mmio_msi_irq_vector(&vdev->dev, v);
+ err = request_irq(irq, vm_config_changed, 0, vm_dev->vm_vq_names[v],
+ vm_dev);
+ if (err)
+ goto error_request_irq;
+
+ /* Set the configuration event mapping. */
+ if (vm_dev->msi_share)
+ mmio_msi_config_vector(vm_dev, v);
+
+ ++vm_dev->msi_used_vectors;
+
+ if (vm_dev->msi_share) {
+ v = vm_dev->msi_used_vectors;
+ snprintf(vm_dev->vm_vq_names[v], sizeof(*vm_dev->vm_vq_names),
+ "%s-virtqueues", dev_name(&vm_dev->vdev.dev));
+ err = request_irq(mmio_msi_irq_vector(&vdev->dev, v),
+ vm_vring_interrupt, 0, vm_dev->vm_vq_names[v],
+ vm_dev);
+ if (err)
+ goto error_request_irq;
+ ++vm_dev->msi_used_vectors;
+ }
+
+ return 0;
+
+error_request_irq:
+ vm_free_msi_irqs(vdev);
+
+ return err;
+}
+
+static void vm_free_msi_irqs(struct virtio_device *vdev)
+{
+ int i;
+ struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+ mmio_msi_set_enable(vm_dev, 0);
+ for (i = 0; i < vm_dev->msi_used_vectors; i++)
+ free_irq(mmio_msi_irq_vector(&vdev->dev, i), vm_dev);
+ mmio_msi_domain_free_irqs(&vdev->dev);
+ kfree(vm_dev->vm_vq_names);
+ vm_dev->vm_vq_names = NULL;
+ vm_dev->msi_enabled = false;
+ vm_dev->msi_used_vectors = 0;
}

static const char *vm_bus_name(struct virtio_device *vdev)
@@ -609,6 +876,8 @@ static int virtio_mmio_probe(struct platform_device *pdev)

platform_set_drvdata(pdev, vm_dev);

+ mmio_msi_create_irq_domain();
+
rc = register_virtio_device(&vm_dev->vdev);
if (rc)
put_device(&vm_dev->vdev.dev);
diff --git a/drivers/virtio/virtio_mmio_common.h b/drivers/virtio/virtio_mmio_common.h
index 90cb304..ccf6320 100644
--- a/drivers/virtio/virtio_mmio_common.h
+++ b/drivers/virtio/virtio_mmio_common.h
@@ -26,6 +26,14 @@ struct virtio_mmio_device {

unsigned short notify_base;
unsigned short notify_multiplier;
+
+ /* Name strings for interrupts. This size should be enough. */
+ char (*vm_vq_names)[256];
+
+ /* used vectors */
+ unsigned int msi_used_vectors;
+ bool msi_share;
+ bool msi_enabled;
};

#endif
diff --git a/drivers/virtio/virtio_mmio_msi.h b/drivers/virtio/virtio_mmio_msi.h
index 27cb2af..10038cb 100644
--- a/drivers/virtio/virtio_mmio_msi.h
+++ b/drivers/virtio/virtio_mmio_msi.h
@@ -8,6 +8,7 @@
#include <linux/irq.h>
#include <linux/irqdomain.h>
#include <linux/platform_device.h>
+#include "virtio_mmio_common.h"

static irq_hw_number_t mmio_msi_hwirq;
static struct irq_domain *mmio_msi_domain;
@@ -21,12 +22,41 @@ void __weak irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
{
}

+static void __iomem *vm_dev_base(struct msi_desc *desc)
+{
+ if (desc) {
+ struct device *dev = desc->dev;
+ struct virtio_device *vdev = dev_to_virtio(dev);
+ struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+ return vm_dev->base;
+ }
+
+ return NULL;
+}
+
+static void mmio_msi_set_mask_bit(struct irq_data *data, u32 flag)
+{
+ struct msi_desc *desc = irq_data_get_msi_desc(data);
+ void __iomem *base = vm_dev_base(desc);
+ unsigned int offset = data->irq - desc->irq;
+
+ if (base) {
+ u32 op = flag ? VIRTIO_MMIO_MSI_CMD_MASK :
+ VIRTIO_MMIO_MSI_CMD_UNMASK;
+ writel(offset, base + VIRTIO_MMIO_MSI_VEC_SEL);
+ writel(op, base + VIRTIO_MMIO_MSI_COMMAND);
+ }
+}
+
static void mmio_msi_mask_irq(struct irq_data *data)
{
+ mmio_msi_set_mask_bit(data, 1);
}

static void mmio_msi_unmask_irq(struct irq_data *data)
{
+ mmio_msi_set_mask_bit(data, 0);
}

static struct irq_chip mmio_msi_controller = {
@@ -86,8 +116,60 @@ static inline void mmio_msi_create_irq_domain(void)
irq_domain_free_fwnode(fn);
}
}
+
+static void mmio_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
+{
+ void __iomem *base = vm_dev_base(desc);
+
+ if (base) {
+ writel(desc->platform.msi_index, base +
+ VIRTIO_MMIO_MSI_VEC_SEL);
+ writel(msg->address_lo, base + VIRTIO_MMIO_MSI_ADDRESS_LOW);
+ writel(msg->address_hi, base + VIRTIO_MMIO_MSI_ADDRESS_HIGH);
+ writel(msg->data, base + VIRTIO_MMIO_MSI_DATA);
+ writel(VIRTIO_MMIO_MSI_CMD_CONFIGURE,
+ base + VIRTIO_MMIO_MSI_COMMAND);
+ }
+}
+
+static inline int mmio_msi_domain_alloc_irqs(struct device *dev,
+ unsigned int nvec)
+{
+ return platform_msi_domain_alloc_irqs(dev, nvec,
+ mmio_write_msi_msg);
+}
+
+static inline void mmio_msi_domain_free_irqs(struct device *dev)
+{
+ return platform_msi_domain_free_irqs(dev);
+}
+
+static inline void mmio_get_msi_domain(struct virtio_device *vdev)
+{
+ if (!vdev->dev.msi_domain)
+ vdev->dev.msi_domain = mmio_msi_domain;
+}
+
+static inline int mmio_msi_irq_vector(struct device *dev, unsigned int nr)
+{
+ struct msi_desc *entry = first_msi_entry(dev);
+
+ return entry->irq + nr;
+}
+
#else
static inline void mmio_msi_create_irq_domain(void) {}
+static inline int mmio_msi_irq_vector(struct device *dev, unsigned int nr)
+{
+ return -EINVAL;
+}
+static inline void mmio_get_msi_domain(struct virtio_device *vdev) {}
+static inline int mmio_msi_domain_alloc_irqs(struct device *dev,
+ unsigned int nvec)
+{
+ return -EINVAL;
+}
+static inline void mmio_msi_domain_free_irqs(struct device *dev) {}
#endif

#endif
diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 5d93c01..5eb3900 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -52,7 +52,7 @@
* rest are per-device feature bits.
*/
#define VIRTIO_TRANSPORT_F_START 28
-#define VIRTIO_TRANSPORT_F_END 40
+#define VIRTIO_TRANSPORT_F_END 41

#ifndef VIRTIO_CONFIG_NO_LEGACY
/* Do we get callbacks when the ring is completely used, even if we've
@@ -94,4 +94,9 @@
* layer.
*/
#define VIRTIO_F_MMIO_NOTIFICATION 39
+
+/*
+ * This feature indicates the MSI support on MMIO layer.
+ */
+#define VIRTIO_F_MMIO_MSI 40
#endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_mmio.h b/include/uapi/linux/virtio_mmio.h
index c4b0968..777cb0e 100644
--- a/include/uapi/linux/virtio_mmio.h
+++ b/include/uapi/linux/virtio_mmio.h
@@ -122,6 +122,21 @@
#define VIRTIO_MMIO_QUEUE_USED_LOW 0x0a0
#define VIRTIO_MMIO_QUEUE_USED_HIGH 0x0a4

+/* MSI max vector number that device supports - Read Only */
+#define VIRTIO_MMIO_MSI_VEC_NUM 0x0c0
+/* MSI state register - Read Only */
+#define VIRTIO_MMIO_MSI_STATE 0x0c4
+/* MSI command register - Write Only */
+#define VIRTIO_MMIO_MSI_COMMAND 0x0c8
+/* MSI vector selector - Write Only */
+#define VIRTIO_MMIO_MSI_VEC_SEL 0x0d0
+/* MSI low 32 bit address, 64 bits in two halves */
+#define VIRTIO_MMIO_MSI_ADDRESS_LOW 0x0d4
+/* MSI high 32 bit address, 64 bits in two halves */
+#define VIRTIO_MMIO_MSI_ADDRESS_HIGH 0x0d8
+/* MSI 32 bit data */
+#define VIRTIO_MMIO_MSI_DATA 0x0dc
+
/* Configuration atomicity value */
#define VIRTIO_MMIO_CONFIG_GENERATION 0x0fc

@@ -130,6 +145,22 @@
#define VIRTIO_MMIO_CONFIG 0x100


+/* MSI commands */
+#define VIRTIO_MMIO_MSI_CMD_ENABLE 0x1
+#define VIRTIO_MMIO_MSI_CMD_DISABLE 0x2
+#define VIRTIO_MMIO_MSI_CMD_CONFIGURE 0x3
+#define VIRTIO_MMIO_MSI_CMD_MASK 0x4
+#define VIRTIO_MMIO_MSI_CMD_UNMASK 0x5
+#define VIRTIO_MMIO_MSI_CMD_MAP_CONFIG 0x6
+#define VIRTIO_MMIO_MSI_CMD_MAP_QUEUE 0x7
+
+/* MSI NO_VECTOR */
+#define VIRTIO_MMIO_MSI_NO_VECTOR 0xffffffff
+
+/* MSI state enabled state mask */
+#define VIRTIO_MMIO_MSI_ENABLED_MASK (1 << 31)
+/* MSI state MSI sharing mask */
+#define VIRTIO_MMIO_MSI_SHARING_MASK (1 << 30)

/*
* Interrupt flags (re: interrupt status & acknowledge registers)
--
1.8.3.1

2020-02-10 09:08:21

by Zha Bin

[permalink] [raw]
Subject: [PATCH v2 3/5] virtio-mmio: create a generic MSI irq domain

From: Liu Jiang <[email protected]>

Create a generic irq domain for all architectures that support
virtio-mmio. A device offering the VIRTIO_F_MMIO_MSI feature bit
can use this irq domain.

Signed-off-by: Liu Jiang <[email protected]>
Co-developed-by: Zha Bin <[email protected]>
Signed-off-by: Zha Bin <[email protected]>
Co-developed-by: Jing Liu <[email protected]>
Signed-off-by: Jing Liu <[email protected]>
Co-developed-by: Chao Peng <[email protected]>
Signed-off-by: Chao Peng <[email protected]>
---
drivers/base/platform-msi.c | 4 +-
drivers/virtio/Kconfig | 9 ++++
drivers/virtio/virtio_mmio_msi.h | 93 ++++++++++++++++++++++++++++++++++++++++
include/linux/msi.h | 1 +
4 files changed, 105 insertions(+), 2 deletions(-)
create mode 100644 drivers/virtio/virtio_mmio_msi.h

diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index 8da314b..45752f1 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -31,12 +31,11 @@ struct platform_msi_priv_data {
/* The devid allocator */
static DEFINE_IDA(platform_msi_devid_ida);

-#ifdef GENERIC_MSI_DOMAIN_OPS
/*
* Convert an msi_desc to a globaly unique identifier (per-device
* devid + msi_desc position in the msi_list).
*/
-static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
+irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
{
u32 devid;

@@ -45,6 +44,7 @@ static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index;
}

+#ifdef GENERIC_MSI_DOMAIN_OPS
static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
{
arg->desc = desc;
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 078615c..551a9f7 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -84,6 +84,15 @@ config VIRTIO_MMIO

If unsure, say N.

+config VIRTIO_MMIO_MSI
+ bool "Memory-mapped virtio device MSI"
+ depends on VIRTIO_MMIO && GENERIC_MSI_IRQ_DOMAIN && GENERIC_MSI_IRQ
+ help
+ This allows device drivers to support msi interrupt handling for
+ virtio-mmio devices. It can improve performance greatly.
+
+ If unsure, say N.
+
config VIRTIO_MMIO_CMDLINE_DEVICES
bool "Memory mapped virtio devices parameter parsing"
depends on VIRTIO_MMIO
diff --git a/drivers/virtio/virtio_mmio_msi.h b/drivers/virtio/virtio_mmio_msi.h
new file mode 100644
index 0000000..27cb2af
--- /dev/null
+++ b/drivers/virtio/virtio_mmio_msi.h
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _DRIVERS_VIRTIO_VIRTIO_MMIO_MSI_H
+#define _DRIVERS_VIRTIO_VIRTIO_MMIO_MSI_H
+
+#ifdef CONFIG_VIRTIO_MMIO_MSI
+
+#include <linux/msi.h>
+#include <linux/irq.h>
+#include <linux/irqdomain.h>
+#include <linux/platform_device.h>
+
+static irq_hw_number_t mmio_msi_hwirq;
+static struct irq_domain *mmio_msi_domain;
+
+struct irq_domain *__weak arch_msi_root_irq_domain(void)
+{
+ return NULL;
+}
+
+void __weak irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+}
+
+static void mmio_msi_mask_irq(struct irq_data *data)
+{
+}
+
+static void mmio_msi_unmask_irq(struct irq_data *data)
+{
+}
+
+static struct irq_chip mmio_msi_controller = {
+ .name = "VIRTIO-MMIO-MSI",
+ .irq_mask = mmio_msi_mask_irq,
+ .irq_unmask = mmio_msi_unmask_irq,
+ .irq_ack = irq_chip_ack_parent,
+ .irq_retrigger = irq_chip_retrigger_hierarchy,
+ .irq_compose_msi_msg = irq_msi_compose_msg,
+ .flags = IRQCHIP_SKIP_SET_WAKE,
+};
+
+static int mmio_msi_prepare(struct irq_domain *domain, struct device *dev,
+ int nvec, msi_alloc_info_t *arg)
+{
+ memset(arg, 0, sizeof(*arg));
+ return 0;
+}
+
+static void mmio_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
+{
+ mmio_msi_hwirq = platform_msi_calc_hwirq(desc);
+}
+
+static irq_hw_number_t mmio_msi_get_hwirq(struct msi_domain_info *info,
+ msi_alloc_info_t *arg)
+{
+ return mmio_msi_hwirq;
+}
+
+static struct msi_domain_ops mmio_msi_domain_ops = {
+ .msi_prepare = mmio_msi_prepare,
+ .set_desc = mmio_msi_set_desc,
+ .get_hwirq = mmio_msi_get_hwirq,
+};
+
+static struct msi_domain_info mmio_msi_domain_info = {
+ .flags = MSI_FLAG_USE_DEF_DOM_OPS |
+ MSI_FLAG_USE_DEF_CHIP_OPS |
+ MSI_FLAG_ACTIVATE_EARLY,
+ .ops = &mmio_msi_domain_ops,
+ .chip = &mmio_msi_controller,
+ .handler = handle_edge_irq,
+ .handler_name = "edge",
+};
+
+static inline void mmio_msi_create_irq_domain(void)
+{
+ struct fwnode_handle *fn;
+ struct irq_domain *parent = arch_msi_root_irq_domain();
+
+ fn = irq_domain_alloc_named_fwnode("VIRTIO-MMIO-MSI");
+ if (fn && parent) {
+ mmio_msi_domain =
+ platform_msi_create_irq_domain(fn,
+ &mmio_msi_domain_info, parent);
+ irq_domain_free_fwnode(fn);
+ }
+}
+#else
+static inline void mmio_msi_create_irq_domain(void) {}
+#endif
+
+#endif
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 8ad679e..ee5f566 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -362,6 +362,7 @@ int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq,
unsigned int nvec);
void *platform_msi_get_host_data(struct irq_domain *domain);
+irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc);
#endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */

#ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
--
1.8.3.1

2020-02-10 11:45:29

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] virtio mmio specification enhancement

On Mon, Feb 10, 2020 at 05:05:16PM +0800, Zha Bin wrote:
> We have compared the number of files and the lines of code between
> virtio-mmio and virio-pci.
>
> Virtio-PCI Virtio-MMIO
> number of files(Linux) 161 1
> lines of code(Linux) 78237 538



Something's very wrong here. virtio PCI is 161 files?
Are you counting the whole PCI subsystem?
Sure enough:

$ find drivers/pci -name '*c' |wc -l
150

That's not reasonable, this includes a bunch of drivers that
never run on a typical hypervisor.

MMIO is also not as small as you are trying to show:

$ cloc drivers/virtio/virtio_mmio.c include/uapi/linux/virtio_mmio.h
2 text files.
2 unique files.
0 files ignored.

github.com/AlDanial/cloc v 1.82 T=0.01 s (230.7 files/s, 106126.5 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
C 1 144 100 535
C/C++ Header 1 39 66 36
-------------------------------------------------------------------------------
SUM: 2 183 166 571
-------------------------------------------------------------------------------


I don't doubt MMIO is smaller than PCI. Of course that's because it has
no features to speak of - just this patch already doubles its size. If
we keep doing that because we want the features then they will reach
the same size in about 4 iterations.


--
MST

2020-02-11 03:22:34

by Jason Wang

Subject: Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support


On 2020/2/10 5:05 PM, Zha Bin wrote:
> From: Liu Jiang<[email protected]>
>
> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
> virtio over mmio devices as a lightweight machine model for modern
> cloud. The standard virtio over MMIO transport layer only supports one
> legacy interrupt, which is much heavier than virtio over PCI transport
> layer using MSI. Legacy interrupt has long work path and causes specific
> VMExits in following cases, which would considerably slow down the
> performance:
>
> 1) read interrupt status register
> 2) update interrupt status register
> 3) write IOAPIC EOI register
>
> We proposed to add MSI support for virtio over MMIO via new feature
> bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
>
> With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
> uses msi_sharing[1] to indicate the event and vector mapping.
> Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
> Bit 1 is 1: device uses sharing mode and dynamic mapping.


I believe dynamic mapping should cover the case of fixed vector?

Thanks


2020-02-11 03:45:09

by Liu, Jing2

Subject: Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support


On 2/11/2020 11:17 AM, Jason Wang wrote:
>
> On 2020/2/10 5:05 PM, Zha Bin wrote:
>> From: Liu Jiang<[email protected]>
>>
>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
>> virtio over mmio devices as a lightweight machine model for modern
>> cloud. The standard virtio over MMIO transport layer only supports one
>> legacy interrupt, which is much heavier than virtio over PCI transport
>> layer using MSI. Legacy interrupt has long work path and causes specific
>> VMExits in following cases, which would considerably slow down the
>> performance:
>>
>> 1) read interrupt status register
>> 2) update interrupt status register
>> 3) write IOAPIC EOI register
>>
>> We proposed to add MSI support for virtio over MMIO via new feature
>> bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
>>
>> With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
>> uses msi_sharing[1] to indicate the event and vector mapping.
>> Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
>> Bit 1 is 1: device uses sharing mode and dynamic mapping.
>
>
> I believe dynamic mapping should cover the case of fixed vector?
>
Actually this bit *aims* to select msi sharing vs. msi non-sharing.

It means that when the msi sharing bit is 1, the device doesn't want a
vector per queue (it wants MSI vector sharing, as the name says) and
doesn't expect a high interrupt rate.

So the driver turns to !per_vq_vectors and has to do dynamic mapping.

So the two modes are opposites, not a superset.

Thanks!

Jing


> Thanks
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

2020-02-11 05:45:54

by Jason Wang

Subject: Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support


On 2020/2/11 11:35 AM, Liu, Jing2 wrote:
>
> On 2/11/2020 11:17 AM, Jason Wang wrote:
>>
>> On 2020/2/10 5:05 PM, Zha Bin wrote:
>>> From: Liu Jiang<[email protected]>
>>>
>>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
>>> virtio over mmio devices as a lightweight machine model for modern
>>> cloud. The standard virtio over MMIO transport layer only supports one
>>> legacy interrupt, which is much heavier than virtio over PCI transport
>>> layer using MSI. Legacy interrupt has long work path and causes
>>> specific
>>> VMExits in following cases, which would considerably slow down the
>>> performance:
>>>
>>> 1) read interrupt status register
>>> 2) update interrupt status register
>>> 3) write IOAPIC EOI register
>>>
>>> We proposed to add MSI support for virtio over MMIO via new feature
>>> bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
>>>
>>> With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
>>> uses msi_sharing[1] to indicate the event and vector mapping.
>>> Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
>>> Bit 1 is 1: device uses sharing mode and dynamic mapping.
>>
>>
>> I believe dynamic mapping should cover the case of fixed vector?
>>
> Actually this bit *aims* to select msi sharing vs. msi non-sharing.
>
> It means that when the msi sharing bit is 1, the device doesn't want a
> vector per queue (it wants MSI vector sharing, as the name says) and
> doesn't expect a high interrupt rate.
>
> So the driver turns to !per_vq_vectors and has to do dynamic mapping.
>
> So the two modes are opposites, not a superset.
>
> Thanks!
>
> Jing


I think you need to add more comments on the command.

E.g. if I want to map vector 0 to queue 1, what do I need to do?

write(1, queue_sel);
write(0, vector_sel);

?

Thanks


>
>
>> Thanks
>>
>>
>>
>>
>

2020-02-11 09:23:57

by Chao Peng

Subject: Re: [PATCH v2 0/5] virtio mmio specification enhancement

On Mon, Feb 10, 2020 at 06:44:50AM -0500, Michael S. Tsirkin wrote:
> On Mon, Feb 10, 2020 at 05:05:16PM +0800, Zha Bin wrote:
> > We have compared the number of files and the lines of code between
> > virtio-mmio and virio-pci.
> >
> > Virtio-PCI Virtio-MMIO
> > number of files(Linux) 161 1
> > lines of code(Linux) 78237 538
>
>
>
> Something's very wrong here. virtio PCI is 161 files?
> Are you counting the whole PCI subsystem?

Right, those are just rough statistics. Sure enough, some drivers will
never get enabled in a typical config.

> Sure enough:
>
> $ find drivers/pci -name '*c' |wc -l
> 150

and plus:
$ find arch/x86/pci/ -name '*c' |wc -l
22

>
> That's not reasonable, this includes a bunch of drivers that
> never run on a typical hypervisor.
>
> MMIO is also not as small as you are trying to show:
>
> $ cloc drivers/virtio/virtio_mmio.c include/uapi/linux/virtio_mmio.h
> 2 text files.
> 2 unique files.
> 0 files ignored.
>
> github.com/AlDanial/cloc v 1.82 T=0.01 s (230.7 files/s, 106126.5 lines/s)
> -------------------------------------------------------------------------------
> Language files blank comment code
> -------------------------------------------------------------------------------
> C 1 144 100 535
> C/C++ Header 1 39 66 36
> -------------------------------------------------------------------------------
> SUM: 2 183 166 571
> -------------------------------------------------------------------------------
>
>
> I don't doubt MMIO is smaller than PCI. Of course that's because it has
> no features to speak of - just this patch already doubles its size. If
> we keep doing that because we want the features then they will reach
> the same size in about 4 iterations.

Since the current virtio-mmio code is small, adding any notable
feature can easily double it. I have no objection that it may one day
reach the same size as PCI, but some of the changes in this patch are
actually generic, and for the MSI-specific code we provide the option
to configure it away.

Thanks,
Chao

>
>
> --
> MST

2020-02-11 11:39:26

by Michael S. Tsirkin

Subject: Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue

On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
> From: Liu Jiang <[email protected]>
>
> The standard virtio-mmio devices use notification register to signal
> backend. This will cause vmexits and slow down the performance when we
> passthrough the virtio-mmio devices to guest virtual machines.
> We proposed to update virtio over MMIO spec to add the per-queue
> notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
> configure notify location for each queue.
>
> [1] https://lkml.org/lkml/2020/1/21/31
>
> Signed-off-by: Liu Jiang <[email protected]>
> Co-developed-by: Zha Bin <[email protected]>
> Signed-off-by: Zha Bin <[email protected]>
> Co-developed-by: Jing Liu <[email protected]>
> Signed-off-by: Jing Liu <[email protected]>
> Co-developed-by: Chao Peng <[email protected]>
> Signed-off-by: Chao Peng <[email protected]>

So this is for pass-through for nested virt?
OK but let's split this out please, and benchmark separately.


> ---
> drivers/virtio/virtio_mmio.c | 37 +++++++++++++++++++++++++++++++++++--
> include/uapi/linux/virtio_config.h | 8 +++++++-
> 2 files changed, 42 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 97d5725..1733ab97 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -90,6 +90,9 @@ struct virtio_mmio_device {
> /* a list of queues so we can dispatch IRQs */
> spinlock_t lock;
> struct list_head virtqueues;
> +
> + unsigned short notify_base;
> + unsigned short notify_multiplier;
> };
>
> struct virtio_mmio_vq_info {
> @@ -98,6 +101,9 @@ struct virtio_mmio_vq_info {
>
> /* the list node for the virtqueues list */
> struct list_head node;
> +
> + /* Notify Address */
> + unsigned int notify_addr;
> };
>
>
> @@ -119,13 +125,23 @@ static u64 vm_get_features(struct virtio_device *vdev)
> return features;
> }
>
> +static void vm_transport_features(struct virtio_device *vdev, u64 features)
> +{
> + if (features & BIT_ULL(VIRTIO_F_MMIO_NOTIFICATION))
> + __virtio_set_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION);
> +}
> +
> static int vm_finalize_features(struct virtio_device *vdev)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + u64 features = vdev->features;
>
> /* Give virtio_ring a chance to accept features. */
> vring_transport_features(vdev);
>
> + /* Give virtio_mmio a chance to accept features. */
> + vm_transport_features(vdev, features);
> +
> /* Make sure there is are no mixed devices */
> if (vm_dev->version == 2 &&
> !__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
> @@ -272,10 +288,13 @@ static void vm_reset(struct virtio_device *vdev)
> static bool vm_notify(struct virtqueue *vq)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
> + struct virtio_mmio_vq_info *info = vq->priv;
>
> - /* We write the queue's selector into the notification register to
> + /* We write the queue's selector into the Notify Address to
> * signal the other end */
> - writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> + if (info)
> + writel(vq->index, vm_dev->base + info->notify_addr);
> +
> return true;
> }
>
> @@ -434,6 +453,12 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
> vq->priv = info;
> info->vq = vq;
>
> + if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION))
> + info->notify_addr = vm_dev->notify_base +
> + vm_dev->notify_multiplier * vq->index;
> + else
> + info->notify_addr = VIRTIO_MMIO_QUEUE_NOTIFY;
> +
> spin_lock_irqsave(&vm_dev->lock, flags);
> list_add(&info->node, &vm_dev->virtqueues);
> spin_unlock_irqrestore(&vm_dev->lock, flags);
> @@ -471,6 +496,14 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> return irq;
> }
>
> + if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION)) {
> + unsigned int notify = readl(vm_dev->base +
> + VIRTIO_MMIO_QUEUE_NOTIFY);
> +
> + vm_dev->notify_base = notify & 0xffff;
> + vm_dev->notify_multiplier = (notify >> 16) & 0xffff;
> + }
> +
> err = request_irq(irq, vm_interrupt, IRQF_SHARED,
> dev_name(&vdev->dev), vm_dev);
> if (err)
> diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> index ff8e7dc..5d93c01 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -52,7 +52,7 @@
> * rest are per-device feature bits.
> */
> #define VIRTIO_TRANSPORT_F_START 28
> -#define VIRTIO_TRANSPORT_F_END 38
> +#define VIRTIO_TRANSPORT_F_END 40
>
> #ifndef VIRTIO_CONFIG_NO_LEGACY
> /* Do we get callbacks when the ring is completely used, even if we've
> @@ -88,4 +88,10 @@
> * Does the device support Single Root I/O Virtualization?
> */
> #define VIRTIO_F_SR_IOV 37
> +
> +/*
> + * This feature indicates the enhanced notification support on MMIO transport
> + * layer.
> + */
> +#define VIRTIO_F_MMIO_NOTIFICATION 39
> #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> --
> 1.8.3.1

2020-02-11 11:44:15

by Michael S. Tsirkin

Subject: Re: [PATCH v2 0/5] virtio mmio specification enhancement

On Tue, Feb 11, 2020 at 04:05:41PM +0000, Chao Peng wrote:
> On Mon, Feb 10, 2020 at 06:44:50AM -0500, Michael S. Tsirkin wrote:
> > On Mon, Feb 10, 2020 at 05:05:16PM +0800, Zha Bin wrote:
> > > We have compared the number of files and the lines of code between
> > > virtio-mmio and virio-pci.
> > >
> > > Virtio-PCI Virtio-MMIO
> > > number of files(Linux) 161 1
> > > lines of code(Linux) 78237 538
> >
> >
> >
> > Something's very wrong here. virtio PCI is 161 files?
> > Are you counting the whole PCI subsystem?
>
> Right, those are just rough statistics.

Please try not to make them look so wrong then.
E.g. you don't include drivers/base/platform-msi.c for
mmio do you? Your patch brings a bunch of code in there.

> Sure enough, some drivers will
> never get enabled in a typical config.
>
> > Sure enough:
> >
> > $ find drivers/pci -name '*c' |wc -l
> > 150
>
> and plus:
> $ find arch/x86/pci/ -name '*c' |wc -l
> 22

But what's the point? This is code that is maintained by PCI core
people anyway.

> >
> > That's not reasonable, this includes a bunch of drivers that
> > never run on a typical hypervisor.
> >
> > MMIO is also not as small as you are trying to show:
> >
> > $ cloc drivers/virtio/virtio_mmio.c include/uapi/linux/virtio_mmio.h
> > 2 text files.
> > 2 unique files.
> > 0 files ignored.
> >
> > github.com/AlDanial/cloc v 1.82 T=0.01 s (230.7 files/s, 106126.5 lines/s)
> > -------------------------------------------------------------------------------
> > Language files blank comment code
> > -------------------------------------------------------------------------------
> > C 1 144 100 535
> > C/C++ Header 1 39 66 36
> > -------------------------------------------------------------------------------
> > SUM: 2 183 166 571
> > -------------------------------------------------------------------------------
> >
> >
> > I don't doubt MMIO is smaller than PCI. Of course that's because it has
> > no features to speak of - just this patch already doubles its size. If
> > we keep doing that because we want the features then they will reach
> > the same size in about 4 iterations.
>
> Since the current virtio-mmio code is small, adding any notable
> feature can easily double it.

But really unlike PCI this is just PV stuff that is not reused by
anyone. We end up maintaining all this by ourselves.

> I have no objection that it may one day
> reach the same size as PCI, but some of the changes in this patch are
> actually generic, and for the MSI-specific code we provide the option
> to configure it away.
>
> Thanks,
> Chao

The option will make it fall down at runtime but
it does not actually seem to remove all of the overhead.



> >
> >
> > --
> > MST

2020-02-11 11:49:30

by Michael S. Tsirkin

Subject: Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support

On Mon, Feb 10, 2020 at 05:05:20PM +0800, Zha Bin wrote:
> From: Liu Jiang <[email protected]>
>
> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
> virtio over mmio devices as a lightweight machine model for modern
> cloud. The standard virtio over MMIO transport layer only supports one
> legacy interrupt, which is much heavier than virtio over PCI transport
> layer using MSI. Legacy interrupt has long work path and causes specific
> VMExits in following cases, which would considerably slow down the
> performance:
>
> 1) read interrupt status register
> 2) update interrupt status register
> 3) write IOAPIC EOI register
>
> We proposed to add MSI support for virtio over MMIO via new feature
> bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
>
> With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
> uses msi_sharing[1] to indicate the event and vector mapping.
> Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
> Bit 1 is 1: device uses sharing mode and dynamic mapping.
>
> [1] https://lkml.org/lkml/2020/1/21/31
>
> Signed-off-by: Liu Jiang <[email protected]>
> Co-developed-by: Zha Bin <[email protected]>
> Signed-off-by: Zha Bin <[email protected]>
> Co-developed-by: Jing Liu <[email protected]>
> Signed-off-by: Jing Liu <[email protected]>
> Co-developed-by: Chao Peng <[email protected]>
> Signed-off-by: Chao Peng <[email protected]>
> ---
> drivers/virtio/virtio_mmio.c | 299 ++++++++++++++++++++++++++++++++++--
> drivers/virtio/virtio_mmio_common.h | 8 +
> drivers/virtio/virtio_mmio_msi.h | 82 ++++++++++
> include/uapi/linux/virtio_config.h | 7 +-
> include/uapi/linux/virtio_mmio.h | 31 ++++
> 5 files changed, 411 insertions(+), 16 deletions(-)

So that's > 100 extra lines in headers that you for some reason
don't count when comparing code size :).


I think an effort can be made to reduce the complexity here.
Otherwise you will end up with a clone of PCI
sooner than you think. In fact, disabling all legacy:

$ wc -l drivers/virtio/virtio_pci_modern.c
734 drivers/virtio/virtio_pci_modern.c
$ wc -l drivers/virtio/virtio_pci_common.c
635 drivers/virtio/virtio_pci_common.c

So you have almost matched that with your patch.


>
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 41e1c93..b24ce21 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -67,8 +67,7 @@
> #include <uapi/linux/virtio_mmio.h>
> #include <linux/virtio_ring.h>
> #include "virtio_mmio_common.h"
> -
> -
> +#include "virtio_mmio_msi.h"
>
> /* The alignment to use between consumer and producer parts of vring.
> * Currently hardcoded to the page size. */
> @@ -85,9 +84,13 @@ struct virtio_mmio_vq_info {
>
> + /* Notify Address */
> unsigned int notify_addr;
> -};
>
> + /* MSI vector (or none) */
> + unsigned int msi_vector;
> +};
>
> +static void vm_free_msi_irqs(struct virtio_device *vdev);
> +static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs);
>
> /* Configuration interface */
>
> @@ -110,6 +113,9 @@ static void vm_transport_features(struct virtio_device *vdev, u64 features)
> {
> if (features & BIT_ULL(VIRTIO_F_MMIO_NOTIFICATION))
> __virtio_set_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION);
> +
> + if (features & BIT_ULL(VIRTIO_F_MMIO_MSI))
> + __virtio_set_bit(vdev, VIRTIO_F_MMIO_MSI);
> }
>
> static int vm_finalize_features(struct virtio_device *vdev)
> @@ -307,7 +313,33 @@ static irqreturn_t vm_interrupt(int irq, void *opaque)
> return ret;
> }
>
> +static irqreturn_t vm_vring_interrupt(int irq, void *opaque)
> +{
> + struct virtio_mmio_device *vm_dev = opaque;
> + struct virtio_mmio_vq_info *info;
> + irqreturn_t ret = IRQ_NONE;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&vm_dev->lock, flags);
> + list_for_each_entry(info, &vm_dev->virtqueues, node) {
> + if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
> + ret = IRQ_HANDLED;
> + }
> + spin_unlock_irqrestore(&vm_dev->lock, flags);
> +
> + return ret;
> +}
> +
> +
> +/* Handle a configuration change */
> +static irqreturn_t vm_config_changed(int irq, void *opaque)
> +{
> + struct virtio_mmio_device *vm_dev = opaque;
> +
> + virtio_config_changed(&vm_dev->vdev);
>
> + return IRQ_HANDLED;
> +}
>
> static void vm_del_vq(struct virtqueue *vq)
> {
> @@ -316,6 +348,15 @@ static void vm_del_vq(struct virtqueue *vq)
> unsigned long flags;
> unsigned int index = vq->index;
>
> + if (vm_dev->msi_enabled && !vm_dev->msi_share) {
> + if (info->msi_vector != VIRTIO_MMIO_MSI_NO_VECTOR) {
> + int irq = mmio_msi_irq_vector(&vq->vdev->dev,
> + info->msi_vector);
> +
> + free_irq(irq, vq);
> + }
> + }
> +
> spin_lock_irqsave(&vm_dev->lock, flags);
> list_del(&info->node);
> spin_unlock_irqrestore(&vm_dev->lock, flags);
> @@ -334,20 +375,56 @@ static void vm_del_vq(struct virtqueue *vq)
> kfree(info);
> }
>
> -static void vm_del_vqs(struct virtio_device *vdev)
> +static void vm_free_irqs(struct virtio_device *vdev)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> + if (vm_dev->msi_enabled)
> + vm_free_msi_irqs(vdev);
> + else
> + free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
> +}
> +
> +static void vm_del_vqs(struct virtio_device *vdev)
> +{
> struct virtqueue *vq, *n;
>
> list_for_each_entry_safe(vq, n, &vdev->vqs, list)
> vm_del_vq(vq);
>
> - free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
> + vm_free_irqs(vdev);
> +}
> +
> +static inline void mmio_msi_set_enable(struct virtio_mmio_device *vm_dev,
> + int enable)
> +{
> + u32 state;
> +
> + state = readl(vm_dev->base + VIRTIO_MMIO_MSI_STATE);
> + if (enable && (state & VIRTIO_MMIO_MSI_ENABLED_MASK))
> + return;
> +
> + writel(VIRTIO_MMIO_MSI_CMD_ENABLE,
> + vm_dev->base + VIRTIO_MMIO_MSI_COMMAND);
> +}
> +
> +static void mmio_msi_config_vector(struct virtio_mmio_device *vm_dev, u32 vec)
> +{
> + writel(vec, vm_dev->base + VIRTIO_MMIO_MSI_VEC_SEL);
> + writel(VIRTIO_MMIO_MSI_CMD_MAP_CONFIG, vm_dev->base +
> + VIRTIO_MMIO_MSI_COMMAND);
> +}
> +
> +static void mmio_msi_queue_vector(struct virtio_mmio_device *vm_dev, u32 vec)
> +{
> + writel(vec, vm_dev->base + VIRTIO_MMIO_MSI_VEC_SEL);
> + writel(VIRTIO_MMIO_MSI_CMD_MAP_QUEUE, vm_dev->base +
> + VIRTIO_MMIO_MSI_COMMAND);
> }
>
> static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
> void (*callback)(struct virtqueue *vq),
> - const char *name, bool ctx)
> + const char *name, bool ctx, u32 msi_vector)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> struct virtio_mmio_vq_info *info;
> @@ -440,6 +517,11 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
> else
> info->notify_addr = VIRTIO_MMIO_QUEUE_NOTIFY;
>
> + info->msi_vector = msi_vector;
> + /* Set queue event and vector mapping for MSI share mode. */
> + if (vm_dev->msi_share && msi_vector != VIRTIO_MMIO_MSI_NO_VECTOR)
> + mmio_msi_queue_vector(vm_dev, msi_vector);
> +
> spin_lock_irqsave(&vm_dev->lock, flags);
> list_add(&info->node, &vm_dev->virtqueues);
> spin_unlock_irqrestore(&vm_dev->lock, flags);
> @@ -461,12 +543,11 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
> return ERR_PTR(err);
> }
>
> -static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> - struct virtqueue *vqs[],
> - vq_callback_t *callbacks[],
> - const char * const names[],
> - const bool *ctx,
> - struct irq_affinity *desc)
> +static int vm_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
> + struct virtqueue *vqs[],
> + vq_callback_t *callbacks[],
> + const char * const names[],
> + const bool *ctx)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> int irq = platform_get_irq(vm_dev->pdev, 0);
> @@ -487,8 +568,6 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
>
> err = request_irq(irq, vm_interrupt, IRQF_SHARED,
> dev_name(&vdev->dev), vm_dev);
> - if (err)
> - return err;
>
> for (i = 0; i < nvqs; ++i) {
> if (!names[i]) {
> @@ -497,14 +576,202 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> }
>
> vqs[i] = vm_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
> - ctx ? ctx[i] : false);
> + ctx ? ctx[i] : false,
> + VIRTIO_MMIO_MSI_NO_VECTOR);
> if (IS_ERR(vqs[i])) {
> vm_del_vqs(vdev);
> return PTR_ERR(vqs[i]);
> }
> }
>
> + return err;
> +}
> +
> +static int vm_find_vqs_msi(struct virtio_device *vdev, unsigned int nvqs,
> + struct virtqueue *vqs[], vq_callback_t *callbacks[],
> + const char * const names[], bool per_vq_vectors,
> + const bool *ctx, struct irq_affinity *desc)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + int i, err, allocated_vectors, nvectors;
> + u32 msi_vec;
> + u32 max_vec_num = readl(vm_dev->base + VIRTIO_MMIO_MSI_VEC_NUM);
> +
> + /* For MSI non-sharing, the max vector number MUST be greater than nvqs.
> + * Otherwise, go back to legacy interrupt.
> + */
> + if (per_vq_vectors && max_vec_num < (nvqs + 1))
> + return -EINVAL;
> +
> + if (per_vq_vectors) {
> + nvectors = 1;
> + for (i = 0; i < nvqs; ++i)
> + if (callbacks[i])
> + ++nvectors;
> + } else {
> + nvectors = 2;
> + }
> +
> + vm_dev->msi_share = !per_vq_vectors;
> +
> + /* Allocate nvqs irqs for queues and one irq for configuration */
> + err = vm_request_msi_vectors(vdev, nvectors);
> + if (err != 0)
> + return err;
> +
> + allocated_vectors = vm_dev->msi_used_vectors;
> + for (i = 0; i < nvqs; i++) {
> + if (!names[i]) {
> + vqs[i] = NULL;
> + continue;
> + }
> + if (!callbacks[i])
> + msi_vec = VIRTIO_MMIO_MSI_NO_VECTOR;
> + else if (per_vq_vectors)
> + msi_vec = allocated_vectors++;
> + else
> + msi_vec = 1;
> + vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i],
> + ctx ? ctx[i] : false, msi_vec);
> + if (IS_ERR(vqs[i])) {
> + err = PTR_ERR(vqs[i]);
> + goto error_find;
> + }
> +
> + if (!per_vq_vectors ||
> + msi_vec == VIRTIO_MMIO_MSI_NO_VECTOR)
> + continue;
> +
> + /* allocate per-vq irq if available and necessary */
> + snprintf(vm_dev->vm_vq_names[msi_vec],
> + sizeof(*vm_dev->vm_vq_names),
> + "%s-%s",
> + dev_name(&vm_dev->vdev.dev), names[i]);
> + err = request_irq(mmio_msi_irq_vector(&vqs[i]->vdev->dev,
> + msi_vec),
> + vring_interrupt, 0,
> + vm_dev->vm_vq_names[msi_vec], vqs[i]);
> +
> + if (err)
> + goto error_find;
> + }
> +
> return 0;
> +
> +error_find:
> + vm_del_vqs(vdev);
> + return err;
> +}
> +
> +static int vm_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
> + struct virtqueue *vqs[],
> + vq_callback_t *callbacks[],
> + const char * const names[],
> + const bool *ctx,
> + struct irq_affinity *desc)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + int err;
> +
> + if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION)) {
> + unsigned int notify = readl(vm_dev->base +
> + VIRTIO_MMIO_QUEUE_NOTIFY);
> +
> + vm_dev->notify_base = notify & 0xffff;
> + vm_dev->notify_multiplier = (notify >> 16) & 0xffff;
> + }
> +
> + if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_MSI)) {
> + bool dyn_mapping = !!(readl(vm_dev->base +
> + VIRTIO_MMIO_MSI_STATE) &
> + VIRTIO_MMIO_MSI_SHARING_MASK);
> + if (!dyn_mapping)
> + err = vm_find_vqs_msi(vdev, nvqs, vqs, callbacks,
> + names, true, ctx, desc);
> + else
> + err = vm_find_vqs_msi(vdev, nvqs, vqs, callbacks,
> + names, false, ctx, desc);
> + if (!err)
> + return 0;
> + }
> +
> + return vm_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, ctx);
> +}
> +
> +static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + unsigned int v;
> + int irq, err;
> +
> + if (vm_dev->msi_enabled)
> + return -EINVAL;
> +
> + vm_dev->vm_vq_names = kmalloc_array(nirqs, sizeof(*vm_dev->vm_vq_names),
> + GFP_KERNEL);
> + if (!vm_dev->vm_vq_names)
> + return -ENOMEM;
> +
> + mmio_get_msi_domain(vdev);
> + err = mmio_msi_domain_alloc_irqs(&vdev->dev, nirqs);
> + if (err) {
> + kfree(vm_dev->vm_vq_names);
> + vm_dev->vm_vq_names = NULL;
> + return err;
> + }
> +
> + mmio_msi_set_enable(vm_dev, 1);
> + vm_dev->msi_enabled = true;
> +
> + v = vm_dev->msi_used_vectors;
> + /* The first MSI vector is used for configuration change event. */
> + snprintf(vm_dev->vm_vq_names[v], sizeof(*vm_dev->vm_vq_names),
> + "%s-config", dev_name(&vdev->dev));
> + irq = mmio_msi_irq_vector(&vdev->dev, v);
> + err = request_irq(irq, vm_config_changed, 0, vm_dev->vm_vq_names[v],
> + vm_dev);
> + if (err)
> + goto error_request_irq;
> +
> + /* Set the configuration event mapping. */
> + if (vm_dev->msi_share)
> + mmio_msi_config_vector(vm_dev, v);
> +
> + ++vm_dev->msi_used_vectors;
> +
> + if (vm_dev->msi_share) {
> + v = vm_dev->msi_used_vectors;
> + snprintf(vm_dev->vm_vq_names[v], sizeof(*vm_dev->vm_vq_names),
> + "%s-virtqueues", dev_name(&vm_dev->vdev.dev));
> + err = request_irq(mmio_msi_irq_vector(&vdev->dev, v),
> + vm_vring_interrupt, 0, vm_dev->vm_vq_names[v],
> + vm_dev);
> + if (err)
> + goto error_request_irq;
> + ++vm_dev->msi_used_vectors;
> + }
> +
> + return 0;
> +
> +error_request_irq:
> + vm_free_msi_irqs(vdev);
> +
> + return err;
> +}
> +
> +static void vm_free_msi_irqs(struct virtio_device *vdev)
> +{
> + int i;
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> + mmio_msi_set_enable(vm_dev, 0);
> + for (i = 0; i < vm_dev->msi_used_vectors; i++)
> + free_irq(mmio_msi_irq_vector(&vdev->dev, i), vm_dev);
> + mmio_msi_domain_free_irqs(&vdev->dev);
> + kfree(vm_dev->vm_vq_names);
> + vm_dev->vm_vq_names = NULL;
> + vm_dev->msi_enabled = false;
> + vm_dev->msi_used_vectors = 0;
> }
>
> static const char *vm_bus_name(struct virtio_device *vdev)
> @@ -609,6 +876,8 @@ static int virtio_mmio_probe(struct platform_device *pdev)
>
> platform_set_drvdata(pdev, vm_dev);
>
> + mmio_msi_create_irq_domain();
> +
> rc = register_virtio_device(&vm_dev->vdev);
> if (rc)
> put_device(&vm_dev->vdev.dev);
> diff --git a/drivers/virtio/virtio_mmio_common.h b/drivers/virtio/virtio_mmio_common.h
> index 90cb304..ccf6320 100644
> --- a/drivers/virtio/virtio_mmio_common.h
> +++ b/drivers/virtio/virtio_mmio_common.h
> @@ -26,6 +26,14 @@ struct virtio_mmio_device {
>
> unsigned short notify_base;
> unsigned short notify_multiplier;
> +
> + /* Name strings for interrupts. This size should be enough. */
> + char (*vm_vq_names)[256];
> +
> + /* used vectors */
> + unsigned int msi_used_vectors;
> + bool msi_share;
> + bool msi_enabled;
> };
>
> #endif
> diff --git a/drivers/virtio/virtio_mmio_msi.h b/drivers/virtio/virtio_mmio_msi.h
> index 27cb2af..10038cb 100644
> --- a/drivers/virtio/virtio_mmio_msi.h
> +++ b/drivers/virtio/virtio_mmio_msi.h
> @@ -8,6 +8,7 @@
> #include <linux/irq.h>
> #include <linux/irqdomain.h>
> #include <linux/platform_device.h>
> +#include "virtio_mmio_common.h"
>
> static irq_hw_number_t mmio_msi_hwirq;
> static struct irq_domain *mmio_msi_domain;
> @@ -21,12 +22,41 @@ void __weak irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
> {
> }
>
> +static void __iomem *vm_dev_base(struct msi_desc *desc)
> +{
> + if (desc) {
> + struct device *dev = desc->dev;
> + struct virtio_device *vdev = dev_to_virtio(dev);
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> + return vm_dev->base;
> + }
> +
> + return NULL;
> +}
> +
> +static void mmio_msi_set_mask_bit(struct irq_data *data, u32 flag)
> +{
> + struct msi_desc *desc = irq_data_get_msi_desc(data);
> + void __iomem *base = vm_dev_base(desc);
> + unsigned int offset = data->irq - desc->irq;
> +
> + if (base) {
> + u32 op = flag ? VIRTIO_MMIO_MSI_CMD_MASK :
> + VIRTIO_MMIO_MSI_CMD_UNMASK;
> + writel(offset, base + VIRTIO_MMIO_MSI_VEC_SEL);
> + writel(op, base + VIRTIO_MMIO_MSI_COMMAND);
> + }
> +}
> +
> static void mmio_msi_mask_irq(struct irq_data *data)
> {
> + mmio_msi_set_mask_bit(data, 1);
> }
>
> static void mmio_msi_unmask_irq(struct irq_data *data)
> {
> + mmio_msi_set_mask_bit(data, 0);
> }
>
> static struct irq_chip mmio_msi_controller = {
> @@ -86,8 +116,60 @@ static inline void mmio_msi_create_irq_domain(void)
> irq_domain_free_fwnode(fn);
> }
> }
> +
> +static void mmio_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
> +{
> + void __iomem *base = vm_dev_base(desc);
> +
> + if (base) {
> + writel(desc->platform.msi_index, base +
> + VIRTIO_MMIO_MSI_VEC_SEL);
> + writel(msg->address_lo, base + VIRTIO_MMIO_MSI_ADDRESS_LOW);
> + writel(msg->address_hi, base + VIRTIO_MMIO_MSI_ADDRESS_HIGH);
> + writel(msg->data, base + VIRTIO_MMIO_MSI_DATA);
> + writel(VIRTIO_MMIO_MSI_CMD_CONFIGURE,
> + base + VIRTIO_MMIO_MSI_COMMAND);
> + }
> +}
> +
> +static inline int mmio_msi_domain_alloc_irqs(struct device *dev,
> + unsigned int nvec)
> +{
> + return platform_msi_domain_alloc_irqs(dev, nvec,
> + mmio_write_msi_msg);
> +}
> +
> +static inline void mmio_msi_domain_free_irqs(struct device *dev)
> +{
> + return platform_msi_domain_free_irqs(dev);
> +}
> +
> +static inline void mmio_get_msi_domain(struct virtio_device *vdev)
> +{
> + if (!vdev->dev.msi_domain)
> + vdev->dev.msi_domain = mmio_msi_domain;
> +}
> +
> +static inline int mmio_msi_irq_vector(struct device *dev, unsigned int nr)
> +{
> + struct msi_desc *entry = first_msi_entry(dev);
> +
> + return entry->irq + nr;
> +}
> +
> #else
> static inline void mmio_msi_create_irq_domain(void) {}
> +static inline int mmio_msi_irq_vector(struct device *dev, unsigned int nr)
> +{
> + return -EINVAL;
> +}
> +static inline void mmio_get_msi_domain(struct virtio_device *vdev) {}
> +static inline int mmio_msi_domain_alloc_irqs(struct device *dev,
> + unsigned int nvec)
> +{
> + return -EINVAL;
> +}
> +static inline void mmio_msi_domain_free_irqs(struct device *dev) {}
> #endif
>
> #endif
> diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> index 5d93c01..5eb3900 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -52,7 +52,7 @@
> * rest are per-device feature bits.
> */
> #define VIRTIO_TRANSPORT_F_START 28
> -#define VIRTIO_TRANSPORT_F_END 40
> +#define VIRTIO_TRANSPORT_F_END 41
>
> #ifndef VIRTIO_CONFIG_NO_LEGACY
> /* Do we get callbacks when the ring is completely used, even if we've
> @@ -94,4 +94,9 @@
> * layer.
> */
> #define VIRTIO_F_MMIO_NOTIFICATION 39
> +
> +/*
> + * This feature indicates the MSI support on MMIO layer.
> + */
> +#define VIRTIO_F_MMIO_MSI 40
> #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> diff --git a/include/uapi/linux/virtio_mmio.h b/include/uapi/linux/virtio_mmio.h
> index c4b0968..777cb0e 100644
> --- a/include/uapi/linux/virtio_mmio.h
> +++ b/include/uapi/linux/virtio_mmio.h
> @@ -122,6 +122,21 @@
> #define VIRTIO_MMIO_QUEUE_USED_LOW 0x0a0
> #define VIRTIO_MMIO_QUEUE_USED_HIGH 0x0a4
>
> +/* MSI max vector number that device supports - Read Only */
> +#define VIRTIO_MMIO_MSI_VEC_NUM 0x0c0
> +/* MSI state register - Read Only */
> +#define VIRTIO_MMIO_MSI_STATE 0x0c4
> +/* MSI command register - Write Only */
> +#define VIRTIO_MMIO_MSI_COMMAND 0x0c8
> +/* MSI vector selector - Write Only */
> +#define VIRTIO_MMIO_MSI_VEC_SEL 0x0d0
> +/* MSI low 32 bit address, 64 bits in two halves */
> +#define VIRTIO_MMIO_MSI_ADDRESS_LOW 0x0d4
> +/* MSI high 32 bit address, 64 bits in two halves */
> +#define VIRTIO_MMIO_MSI_ADDRESS_HIGH 0x0d8
> +/* MSI 32 bit data */
> +#define VIRTIO_MMIO_MSI_DATA 0x0dc
> +
> /* Configuration atomicity value */
> #define VIRTIO_MMIO_CONFIG_GENERATION 0x0fc
>
> @@ -130,6 +145,22 @@
> #define VIRTIO_MMIO_CONFIG 0x100
>
>
> +/* MSI commands */
> +#define VIRTIO_MMIO_MSI_CMD_ENABLE 0x1
> +#define VIRTIO_MMIO_MSI_CMD_DISABLE 0x2
> +#define VIRTIO_MMIO_MSI_CMD_CONFIGURE 0x3
> +#define VIRTIO_MMIO_MSI_CMD_MASK 0x4
> +#define VIRTIO_MMIO_MSI_CMD_UNMASK 0x5
> +#define VIRTIO_MMIO_MSI_CMD_MAP_CONFIG 0x6
> +#define VIRTIO_MMIO_MSI_CMD_MAP_QUEUE 0x7
> +
> +/* MSI NO_VECTOR */
> +#define VIRTIO_MMIO_MSI_NO_VECTOR 0xffffffff
> +
> +/* MSI state enabled state mask */
> +#define VIRTIO_MMIO_MSI_ENABLED_MASK (1 << 31)
> +/* MSI state MSI sharing mask */
> +#define VIRTIO_MMIO_MSI_SHARING_MASK (1 << 30)
>

You should be able to drop the MAP commands and
replace the vector number with the vq number.

I also don't really see a burning need for separate
enable/disable/configure/mask/unmask commands.
Just two registers, mask and enable, should be enough?
That would remove the need for the NO_VECTOR hack too.

What is sharing, such that the hypervisor needs to worry about it?
Just make all interrupts shared, or all unshared.
Does it matter for performance?

> /*
> * Interrupt flags (re: interrupt status & acknowledge registers)
> --
> 1.8.3.1

2020-02-11 11:51:07

by Michael S. Tsirkin

Subject: Re: [PATCH v2 3/5] virtio-mmio: create a generic MSI irq domain

On Mon, Feb 10, 2020 at 05:05:19PM +0800, Zha Bin wrote:
> From: Liu Jiang <[email protected]>
>
> Create a generic irq domain for all architectures which
> supports virtio-mmio. The device offering VIRTIO_F_MMIO_MSI
> feature bit can use this irq domain.
>
> Signed-off-by: Liu Jiang <[email protected]>
> Co-developed-by: Zha Bin <[email protected]>
> Signed-off-by: Zha Bin <[email protected]>
> Co-developed-by: Jing Liu <[email protected]>
> Signed-off-by: Jing Liu <[email protected]>
> Co-developed-by: Chao Peng <[email protected]>
> Signed-off-by: Chao Peng <[email protected]>
> ---
> drivers/base/platform-msi.c | 4 +-
> drivers/virtio/Kconfig | 9 ++++
> drivers/virtio/virtio_mmio_msi.h | 93 ++++++++++++++++++++++++++++++++++++++++
> include/linux/msi.h | 1 +
> 4 files changed, 105 insertions(+), 2 deletions(-)
> create mode 100644 drivers/virtio/virtio_mmio_msi.h


This patch needs to copy maintainers for drivers/base/platform-msi.c and
include/linux/msi.h

> diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
> index 8da314b..45752f1 100644
> --- a/drivers/base/platform-msi.c
> +++ b/drivers/base/platform-msi.c
> @@ -31,12 +31,11 @@ struct platform_msi_priv_data {
> /* The devid allocator */
> static DEFINE_IDA(platform_msi_devid_ida);
>
> -#ifdef GENERIC_MSI_DOMAIN_OPS
> /*
> * Convert an msi_desc to a globally unique identifier (per-device
> * devid + msi_desc position in the msi_list).
> */
> -static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
> +irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
> {
> u32 devid;
>
> @@ -45,6 +44,7 @@ static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
> return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index;
> }
>
> +#ifdef GENERIC_MSI_DOMAIN_OPS
> static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
> {
> arg->desc = desc;
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 078615c..551a9f7 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -84,6 +84,15 @@ config VIRTIO_MMIO
>
> If unsure, say N.
>
> +config VIRTIO_MMIO_MSI
> + bool "Memory-mapped virtio device MSI"
> + depends on VIRTIO_MMIO && GENERIC_MSI_IRQ_DOMAIN && GENERIC_MSI_IRQ
> + help
> + This allows device drivers to support MSI interrupt handling for
> + virtio-mmio devices. It can improve performance greatly.
> +
> + If unsure, say N.
> +
> config VIRTIO_MMIO_CMDLINE_DEVICES
> bool "Memory mapped virtio devices parameter parsing"
> depends on VIRTIO_MMIO
> diff --git a/drivers/virtio/virtio_mmio_msi.h b/drivers/virtio/virtio_mmio_msi.h
> new file mode 100644
> index 0000000..27cb2af
> --- /dev/null
> +++ b/drivers/virtio/virtio_mmio_msi.h
> @@ -0,0 +1,93 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _DRIVERS_VIRTIO_VIRTIO_MMIO_MSI_H
> +#define _DRIVERS_VIRTIO_VIRTIO_MMIO_MSI_H
> +
> +#ifdef CONFIG_VIRTIO_MMIO_MSI
> +
> +#include <linux/msi.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/platform_device.h>
> +
> +static irq_hw_number_t mmio_msi_hwirq;
> +static struct irq_domain *mmio_msi_domain;
> +
> +struct irq_domain *__weak arch_msi_root_irq_domain(void)
> +{
> + return NULL;
> +}
> +
> +void __weak irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
> +{
> +}
> +
> +static void mmio_msi_mask_irq(struct irq_data *data)
> +{
> +}
> +
> +static void mmio_msi_unmask_irq(struct irq_data *data)
> +{
> +}
> +
> +static struct irq_chip mmio_msi_controller = {
> + .name = "VIRTIO-MMIO-MSI",
> + .irq_mask = mmio_msi_mask_irq,
> + .irq_unmask = mmio_msi_unmask_irq,
> + .irq_ack = irq_chip_ack_parent,
> + .irq_retrigger = irq_chip_retrigger_hierarchy,
> + .irq_compose_msi_msg = irq_msi_compose_msg,
> + .flags = IRQCHIP_SKIP_SET_WAKE,
> +};
> +
> +static int mmio_msi_prepare(struct irq_domain *domain, struct device *dev,
> + int nvec, msi_alloc_info_t *arg)
> +{
> + memset(arg, 0, sizeof(*arg));
> + return 0;
> +}
> +
> +static void mmio_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
> +{
> + mmio_msi_hwirq = platform_msi_calc_hwirq(desc);
> +}


This call isn't exported to modules. How will it work when virtio is
modular?

> +
> +static irq_hw_number_t mmio_msi_get_hwirq(struct msi_domain_info *info,
> + msi_alloc_info_t *arg)
> +{
> + return mmio_msi_hwirq;
> +}
> +
> +static struct msi_domain_ops mmio_msi_domain_ops = {
> + .msi_prepare = mmio_msi_prepare,
> + .set_desc = mmio_msi_set_desc,
> + .get_hwirq = mmio_msi_get_hwirq,
> +};
> +
> +static struct msi_domain_info mmio_msi_domain_info = {
> + .flags = MSI_FLAG_USE_DEF_DOM_OPS |
> + MSI_FLAG_USE_DEF_CHIP_OPS |
> + MSI_FLAG_ACTIVATE_EARLY,
> + .ops = &mmio_msi_domain_ops,
> + .chip = &mmio_msi_controller,
> + .handler = handle_edge_irq,
> + .handler_name = "edge",
> +};
> +
> +static inline void mmio_msi_create_irq_domain(void)
> +{
> + struct fwnode_handle *fn;
> + struct irq_domain *parent = arch_msi_root_irq_domain();
> +
> + fn = irq_domain_alloc_named_fwnode("VIRTIO-MMIO-MSI");
> + if (fn && parent) {
> + mmio_msi_domain =
> + platform_msi_create_irq_domain(fn,
> + &mmio_msi_domain_info, parent);
> + irq_domain_free_fwnode(fn);
> + }
> +}
> +#else
> +static inline void mmio_msi_create_irq_domain(void) {}
> +#endif
> +
> +#endif
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 8ad679e..ee5f566 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -362,6 +362,7 @@ int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
> void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq,
> unsigned int nvec);
> void *platform_msi_get_host_data(struct irq_domain *domain);
> +irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc);
> #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
>
> #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
> --
> 1.8.3.1

2020-02-11 12:17:57

by Michael S. Tsirkin

Subject: Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support

On Tue, Feb 11, 2020 at 11:35:43AM +0800, Liu, Jing2 wrote:
>
> On 2/11/2020 11:17 AM, Jason Wang wrote:
> >
> > On 2020/2/10 5:05 PM, Zha Bin wrote:
> > > From: Liu Jiang<[email protected]>
> > >
> > > Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
> > > virtio over mmio devices as a lightweight machine model for modern
> > > cloud. The standard virtio over MMIO transport layer only supports one
> > > legacy interrupt, which is much heavier than virtio over PCI transport
> > > layer using MSI. Legacy interrupt has long work path and causes specific
> > > VMExits in following cases, which would considerably slow down the
> > > performance:
> > >
> > > 1) read interrupt status register
> > > 2) update interrupt status register
> > > 3) write IOAPIC EOI register
> > >
> > > We proposed to add MSI support for virtio over MMIO via new feature
> > > bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.
> > >
> > > With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
> > > uses msi_sharing[1] to indicate the event and vector mapping.
> > > Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
> > > Bit 1 is 1: device uses sharing mode and dynamic mapping.
> >
> >
> > I believe dynamic mapping should cover the case of fixed vector?
> >
> Actually this bit selects between MSI sharing and MSI non-sharing.
>
> It means that when the MSI sharing bit is 1, the device doesn't want a
> vector per queue (it wants MSI vector sharing, as the name says) and
> doesn't expect a high interrupt rate.
>
> So the driver turns to !per_vq_vectors and has to do dynamic mapping.
>
> So they are opposites, not a superset.
>
> Thanks!
>
> Jing

What's the point of all this flexibility? The cover letter
complains about the code size of PCI, yet then you go back
and add a ton of options with no justification.



>
> > Thanks
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >

2020-02-11 12:19:01

by Michael S. Tsirkin

Subject: Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue

On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
> From: Liu Jiang <[email protected]>
>
> The standard virtio-mmio devices use notification register to signal
> backend. This will cause vmexits and slow down the performance when we
> passthrough the virtio-mmio devices to guest virtual machines.
> We proposed to update virtio over MMIO spec to add the per-queue
> notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
> configure notify location for each queue.
>
> [1] https://lkml.org/lkml/2020/1/21/31
>
> Signed-off-by: Liu Jiang <[email protected]>
> Co-developed-by: Zha Bin <[email protected]>
> Signed-off-by: Zha Bin <[email protected]>
> Co-developed-by: Jing Liu <[email protected]>
> Signed-off-by: Jing Liu <[email protected]>
> Co-developed-by: Chao Peng <[email protected]>
> Signed-off-by: Chao Peng <[email protected]>


Hmm. Any way to make this static so we don't need
base and multiplier?

> ---
> drivers/virtio/virtio_mmio.c | 37 +++++++++++++++++++++++++++++++++++--
> include/uapi/linux/virtio_config.h | 8 +++++++-
> 2 files changed, 42 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 97d5725..1733ab97 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -90,6 +90,9 @@ struct virtio_mmio_device {
> /* a list of queues so we can dispatch IRQs */
> spinlock_t lock;
> struct list_head virtqueues;
> +
> + unsigned short notify_base;
> + unsigned short notify_multiplier;
> };
>
> struct virtio_mmio_vq_info {
> @@ -98,6 +101,9 @@ struct virtio_mmio_vq_info {
>
> /* the list node for the virtqueues list */
> struct list_head node;
> +
> + /* Notify address */
> + unsigned int notify_addr;
> };
>
>
> @@ -119,13 +125,23 @@ static u64 vm_get_features(struct virtio_device *vdev)
> return features;
> }
>
> +static void vm_transport_features(struct virtio_device *vdev, u64 features)
> +{
> + if (features & BIT_ULL(VIRTIO_F_MMIO_NOTIFICATION))
> + __virtio_set_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION);
> +}
> +
> static int vm_finalize_features(struct virtio_device *vdev)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + u64 features = vdev->features;
>
> /* Give virtio_ring a chance to accept features. */
> vring_transport_features(vdev);
>
> + /* Give virtio_mmio a chance to accept features. */
> + vm_transport_features(vdev, features);
> +
> /* Make sure there are no mixed devices */
> if (vm_dev->version == 2 &&
> !__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
> @@ -272,10 +288,13 @@ static void vm_reset(struct virtio_device *vdev)
> static bool vm_notify(struct virtqueue *vq)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
> + struct virtio_mmio_vq_info *info = vq->priv;
>
> - /* We write the queue's selector into the notification register to
> + /* We write the queue's selector into the Notify Address to
> * signal the other end */
> - writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> + if (info)
> + writel(vq->index, vm_dev->base + info->notify_addr);
> +
> return true;
> }
>
> @@ -434,6 +453,12 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
> vq->priv = info;
> info->vq = vq;
>
> + if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION))
> + info->notify_addr = vm_dev->notify_base +
> + vm_dev->notify_multiplier * vq->index;
> + else
> + info->notify_addr = VIRTIO_MMIO_QUEUE_NOTIFY;
> +
> spin_lock_irqsave(&vm_dev->lock, flags);
> list_add(&info->node, &vm_dev->virtqueues);
> spin_unlock_irqrestore(&vm_dev->lock, flags);
> @@ -471,6 +496,14 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> return irq;
> }
>
> + if (__virtio_test_bit(vdev, VIRTIO_F_MMIO_NOTIFICATION)) {
> + unsigned int notify = readl(vm_dev->base +
> + VIRTIO_MMIO_QUEUE_NOTIFY);


that register is documented as:

/* Queue notifier - Write Only */
#define VIRTIO_MMIO_QUEUE_NOTIFY 0x050

so at least you need to update the doc.
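
A revised header comment might look like this (a sketch of what the updated doc could say, given the read semantics patch 1 introduces; wording is the editor's, not from the series):

```c
/* Queue notifier - Write Only to notify the device.
 * With VIRTIO_F_MMIO_NOTIFICATION negotiated, a read returns the
 * notify base (bits 0-15) and the per-queue multiplier (bits 16-31). */
#define VIRTIO_MMIO_QUEUE_NOTIFY	0x050
```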

> +
> + vm_dev->notify_base = notify & 0xffff;
> + vm_dev->notify_multiplier = (notify >> 16) & 0xffff;

Are 16-bit base/multiplier values always enough?
In fact, won't we run out of the 16-bit address space
rather quickly if queues use up a page
of space at a time?


> + }
> +
> err = request_irq(irq, vm_interrupt, IRQF_SHARED,
> dev_name(&vdev->dev), vm_dev);
> if (err)
> diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> index ff8e7dc..5d93c01 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -52,7 +52,7 @@
> * rest are per-device feature bits.
> */
> #define VIRTIO_TRANSPORT_F_START 28
> -#define VIRTIO_TRANSPORT_F_END 38
> +#define VIRTIO_TRANSPORT_F_END 40
>
> #ifndef VIRTIO_CONFIG_NO_LEGACY
> /* Do we get callbacks when the ring is completely used, even if we've
> @@ -88,4 +88,10 @@
> * Does the device support Single Root I/O Virtualization?
> */
> #define VIRTIO_F_SR_IOV 37
> +
> +/*
> + * This feature indicates the enhanced notification support on MMIO transport
> + * layer.

Let's replace this with an actual description of the enhancement, please,
otherwise it will not make sense in a couple of months.

E.g. "Per-queue notification address"?


> + */
> +#define VIRTIO_F_MMIO_NOTIFICATION 39
> #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> --
> 1.8.3.1

2020-02-12 03:41:31

by Jason Wang

Subject: Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue


On 2020/2/11 7:33 PM, Michael S. Tsirkin wrote:
> On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
>> From: Liu Jiang<[email protected]>
>>
>> The standard virtio-mmio devices use notification register to signal
>> backend. This will cause vmexits and slow down the performance when we
>> passthrough the virtio-mmio devices to guest virtual machines.
>> We proposed to update virtio over MMIO spec to add the per-queue
>> notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
>> configure notify location for each queue.
>>
>> [1]https://lkml.org/lkml/2020/1/21/31
>>
>> Signed-off-by: Liu Jiang<[email protected]>
>> Co-developed-by: Zha Bin<[email protected]>
>> Signed-off-by: Zha Bin<[email protected]>
>> Co-developed-by: Jing Liu<[email protected]>
>> Signed-off-by: Jing Liu<[email protected]>
>> Co-developed-by: Chao Peng<[email protected]>
>> Signed-off-by: Chao Peng<[email protected]>
> Hmm. Any way to make this static so we don't need
> base and multiplier?


E.g. a page per vq?

Thanks

2020-02-12 07:41:35

by Michael S. Tsirkin

Subject: Re: [PATCH v2 3/5] virtio-mmio: create a generic MSI irq domain

On Mon, Feb 10, 2020 at 05:05:19PM +0800, Zha Bin wrote:
> From: Liu Jiang <[email protected]>
>
> Create a generic irq domain for all architectures which
> supports virtio-mmio. The device offering VIRTIO_F_MMIO_MSI
> feature bit can use this irq domain.
>
> Signed-off-by: Liu Jiang <[email protected]>
> Co-developed-by: Zha Bin <[email protected]>
> Signed-off-by: Zha Bin <[email protected]>
> Co-developed-by: Jing Liu <[email protected]>
> Signed-off-by: Jing Liu <[email protected]>
> Co-developed-by: Chao Peng <[email protected]>
> Signed-off-by: Chao Peng <[email protected]>
> ---
> drivers/base/platform-msi.c | 4 +-
> drivers/virtio/Kconfig | 9 ++++
> drivers/virtio/virtio_mmio_msi.h | 93 ++++++++++++++++++++++++++++++++++++++++
> include/linux/msi.h | 1 +
> 4 files changed, 105 insertions(+), 2 deletions(-)
> create mode 100644 drivers/virtio/virtio_mmio_msi.h
>
> diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
> index 8da314b..45752f1 100644
> --- a/drivers/base/platform-msi.c
> +++ b/drivers/base/platform-msi.c
> @@ -31,12 +31,11 @@ struct platform_msi_priv_data {
> /* The devid allocator */
> static DEFINE_IDA(platform_msi_devid_ida);
>
> -#ifdef GENERIC_MSI_DOMAIN_OPS
> /*
> * Convert an msi_desc to a globally unique identifier (per-device
> * devid + msi_desc position in the msi_list).
> */
> -static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
> +irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
> {
> u32 devid;
>
> @@ -45,6 +44,7 @@ static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
> return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index;
> }
>
> +#ifdef GENERIC_MSI_DOMAIN_OPS
> static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
> {
> arg->desc = desc;
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 078615c..551a9f7 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -84,6 +84,15 @@ config VIRTIO_MMIO
>
> If unsure, say N.
>
> +config VIRTIO_MMIO_MSI
> + bool "Memory-mapped virtio device MSI"
> + depends on VIRTIO_MMIO && GENERIC_MSI_IRQ_DOMAIN && GENERIC_MSI_IRQ
> + help
> + This allows device drivers to support MSI interrupt handling for
> + virtio-mmio devices. It can improve performance greatly.
> +
> + If unsure, say N.
> +
> config VIRTIO_MMIO_CMDLINE_DEVICES
> bool "Memory mapped virtio devices parameter parsing"
> depends on VIRTIO_MMIO
> diff --git a/drivers/virtio/virtio_mmio_msi.h b/drivers/virtio/virtio_mmio_msi.h
> new file mode 100644
> index 0000000..27cb2af
> --- /dev/null
> +++ b/drivers/virtio/virtio_mmio_msi.h
> @@ -0,0 +1,93 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _DRIVERS_VIRTIO_VIRTIO_MMIO_MSI_H
> +#define _DRIVERS_VIRTIO_VIRTIO_MMIO_MSI_H
> +
> +#ifdef CONFIG_VIRTIO_MMIO_MSI
> +
> +#include <linux/msi.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/platform_device.h>
> +
> +static irq_hw_number_t mmio_msi_hwirq;
> +static struct irq_domain *mmio_msi_domain;

So each instance of including this header will create its own
copy of the variables. That won't do the right thing.
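
One way to avoid the duplicated statics (an editorial sketch, not part of the posted series) is to move the definitions into a single translation unit and leave only declarations in the header:

/* virtio_mmio_msi.h - declarations only, shared by all includers */
extern irq_hw_number_t mmio_msi_hwirq;
extern struct irq_domain *mmio_msi_domain;

/* virtio_mmio_msi.c - one definition for the whole driver */
irq_hw_number_t mmio_msi_hwirq;
struct irq_domain *mmio_msi_domain;

The same goes for the non-trivial function bodies currently defined in the header: with multiple includers they would either be duplicated per translation unit or, without static, collide at link time.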


> +
> +struct irq_domain *__weak arch_msi_root_irq_domain(void)
> +{
> + return NULL;
> +}
> +
> +void __weak irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
> +{
> +}
> +
> +static void mmio_msi_mask_irq(struct irq_data *data)
> +{
> +}
> +
> +static void mmio_msi_unmask_irq(struct irq_data *data)
> +{
> +}
> +
> +static struct irq_chip mmio_msi_controller = {
> + .name = "VIRTIO-MMIO-MSI",
> + .irq_mask = mmio_msi_mask_irq,
> + .irq_unmask = mmio_msi_unmask_irq,
> + .irq_ack = irq_chip_ack_parent,
> + .irq_retrigger = irq_chip_retrigger_hierarchy,
> + .irq_compose_msi_msg = irq_msi_compose_msg,
> + .flags = IRQCHIP_SKIP_SET_WAKE,
> +};
> +
> +static int mmio_msi_prepare(struct irq_domain *domain, struct device *dev,
> + int nvec, msi_alloc_info_t *arg)
> +{
> + memset(arg, 0, sizeof(*arg));
> + return 0;
> +}
> +
> +static void mmio_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
> +{
> + mmio_msi_hwirq = platform_msi_calc_hwirq(desc);
> +}
> +
> +static irq_hw_number_t mmio_msi_get_hwirq(struct msi_domain_info *info,
> + msi_alloc_info_t *arg)
> +{
> + return mmio_msi_hwirq;
> +}
> +
> +static struct msi_domain_ops mmio_msi_domain_ops = {
> + .msi_prepare = mmio_msi_prepare,
> + .set_desc = mmio_msi_set_desc,
> + .get_hwirq = mmio_msi_get_hwirq,
> +};
> +
> +static struct msi_domain_info mmio_msi_domain_info = {
> + .flags = MSI_FLAG_USE_DEF_DOM_OPS |
> + MSI_FLAG_USE_DEF_CHIP_OPS |
> + MSI_FLAG_ACTIVATE_EARLY,
> + .ops = &mmio_msi_domain_ops,
> + .chip = &mmio_msi_controller,
> + .handler = handle_edge_irq,
> + .handler_name = "edge",
> +};
> +
> +static inline void mmio_msi_create_irq_domain(void)
> +{
> + struct fwnode_handle *fn;
> + struct irq_domain *parent = arch_msi_root_irq_domain();
> +
> + fn = irq_domain_alloc_named_fwnode("VIRTIO-MMIO-MSI");
> + if (fn && parent) {
> + mmio_msi_domain =
> + platform_msi_create_irq_domain(fn,
> + &mmio_msi_domain_info, parent);
> + irq_domain_free_fwnode(fn);
> + }
> +}

It does not look like anyone cleans up this domain.
So how does this work with a modular virtio mmio?
what happens when you unload the module?
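
A teardown counterpart (hypothetical, mirroring mmio_msi_create_irq_domain) could be called from the driver's module exit path, e.g.:

static inline void mmio_msi_destroy_irq_domain(void)
{
	if (mmio_msi_domain) {
		irq_domain_remove(mmio_msi_domain);
		mmio_msi_domain = NULL;
	}
}

This still leaves the question of when it is safe to call it, since the domain is created from probe and may be shared by several device instances.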


> +#else
> +static inline void mmio_msi_create_irq_domain(void) {}
> +#endif
> +
> +#endif
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 8ad679e..ee5f566 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -362,6 +362,7 @@ int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
> void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq,
> unsigned int nvec);
> void *platform_msi_get_host_data(struct irq_domain *domain);
> +irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc);
> #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
>
> #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
> --
> 1.8.3.1

2020-02-12 08:20:24

by Michael S. Tsirkin

Subject: Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue

On Wed, Feb 12, 2020 at 11:39:54AM +0800, Jason Wang wrote:
>
> On 2020/2/11 7:33 PM, Michael S. Tsirkin wrote:
> > On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
> > > From: Liu Jiang<[email protected]>
> > >
> > > The standard virtio-mmio devices use notification register to signal
> > > backend. This will cause vmexits and slow down the performance when we
> > > passthrough the virtio-mmio devices to guest virtual machines.
> > > We proposed to update virtio over MMIO spec to add the per-queue
> > > notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
> > > configure notify location for each queue.
> > >
> > > [1]https://lkml.org/lkml/2020/1/21/31
> > >
> > > Signed-off-by: Liu Jiang<[email protected]>
> > > Co-developed-by: Zha Bin<[email protected]>
> > > Signed-off-by: Zha Bin<[email protected]>
> > > Co-developed-by: Jing Liu<[email protected]>
> > > Signed-off-by: Jing Liu<[email protected]>
> > > Co-developed-by: Chao Peng<[email protected]>
> > > Signed-off-by: Chao Peng<[email protected]>
> > Hmm. Any way to make this static so we don't need
> > base and multiplier?
>
>
> E.g page per vq?
>
> Thanks

The problem is: is the page size well defined enough?
Are there cases where guest and host page sizes differ?
I suspect there might be.

But I also think this whole patch is unproven. Is someone actually
working on QEMU code to support pass-through of virtio-pci
as virtio-mmio for nested guests? What's the performance
gain like?

--
MST

2020-02-12 08:56:11

by Jason Wang

Subject: Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue


On 2020/2/12 4:18 PM, Michael S. Tsirkin wrote:
> On Wed, Feb 12, 2020 at 11:39:54AM +0800, Jason Wang wrote:
>>> On 2020/2/11 7:33 PM, Michael S. Tsirkin wrote:
>>> On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
>>>> From: Liu Jiang<[email protected]>
>>>>
>>>> The standard virtio-mmio devices use notification register to signal
>>>> backend. This will cause vmexits and slow down the performance when we
>>>> passthrough the virtio-mmio devices to guest virtual machines.
>>>> We proposed to update virtio over MMIO spec to add the per-queue
>>>> notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
>>>> configure notify location for each queue.
>>>>
>>>> [1]https://lkml.org/lkml/2020/1/21/31
>>>>
>>>> Signed-off-by: Liu Jiang<[email protected]>
>>>> Co-developed-by: Zha Bin<[email protected]>
>>>> Signed-off-by: Zha Bin<[email protected]>
>>>> Co-developed-by: Jing Liu<[email protected]>
>>>> Signed-off-by: Jing Liu<[email protected]>
>>>> Co-developed-by: Chao Peng<[email protected]>
>>>> Signed-off-by: Chao Peng<[email protected]>
>>> Hmm. Any way to make this static so we don't need
>>> base and multiplier?
>> E.g page per vq?
>>
>> Thanks
> Problem is, is page size well defined enough?
> Are there cases where guest and host page sizes differ?
> I suspect there might be.


Right, so it looks better to keep base and multiplier, e.g. for vDPA.


>
> But I also think this whole patch is unproven. Is someone actually
> working on QEMU code to support pass-through of virtio-pci
> as virtio-mmio for nested guests? What's the performance
> gain like?


I don't know.

Thanks


>
> -- MST

2020-02-12 09:35:11

by Jason Wang

Subject: Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue


On 2020/2/12 4:53 PM, Jason Wang wrote:
>
> On 2020/2/12 4:18 PM, Michael S. Tsirkin wrote:
>> On Wed, Feb 12, 2020 at 11:39:54AM +0800, Jason Wang wrote:
>>> On 2020/2/11 7:33 PM, Michael S. Tsirkin wrote:
>>>> On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
>>>>> From: Liu Jiang<[email protected]>
>>>>>
>>>>> The standard virtio-mmio devices use notification register to signal
>>>>> backend. This will cause vmexits and slow down the performance
>>>>> when we
>>>>> passthrough the virtio-mmio devices to guest virtual machines.
>>>>> We proposed to update virtio over MMIO spec to add the per-queue
>>>>> notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
>>>>> configure notify location for each queue.
>>>>>
>>>>> [1]https://lkml.org/lkml/2020/1/21/31
>>>>>
>>>>> Signed-off-by: Liu Jiang<[email protected]>
>>>>> Co-developed-by: Zha Bin<[email protected]>
>>>>> Signed-off-by: Zha Bin<[email protected]>
>>>>> Co-developed-by: Jing Liu<[email protected]>
>>>>> Signed-off-by: Jing Liu<[email protected]>
>>>>> Co-developed-by: Chao Peng<[email protected]>
>>>>> Signed-off-by: Chao Peng<[email protected]>
>>>> Hmm. Any way to make this static so we don't need
>>>> base and multiplier?
>>> E.g page per vq?
>>>
>>> Thanks
>> Problem is, is page size well defined enough?
>> Are there cases where guest and host page sizes differ?
>> I suspect there might be.
>
>
> Right, so it looks better to keep base and multiplier, e.g for vDPA.
>
>
>>
>> But I also think this whole patch is unproven. Is someone actually
>> working on QEMU code to support pass-through of virtio-pci
>> as virtio-mmio for nested guests? What's the performance
>> gain like?
>
>
> I don't know.
>
> Thanks


Btw, I think there's no need for a nested environment to test this. The
current eventfd hook to MSI-X should still work for MMIO.

Thanks


2020-02-12 09:56:31

by Michael S. Tsirkin

Subject: Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue

On Wed, Feb 12, 2020 at 05:33:06PM +0800, Jason Wang wrote:
>
> On 2020/2/12 4:53 PM, Jason Wang wrote:
> >
> > On 2020/2/12 4:18 PM, Michael S. Tsirkin wrote:
> > > On Wed, Feb 12, 2020 at 11:39:54AM +0800, Jason Wang wrote:
> > > > On 2020/2/11 7:33 PM, Michael S. Tsirkin wrote:
> > > > > On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
> > > > > > From: Liu Jiang<[email protected]>
> > > > > >
> > > > > > The standard virtio-mmio devices use notification register to signal
> > > > > > backend. This will cause vmexits and slow down the
> > > > > > performance when we
> > > > > > passthrough the virtio-mmio devices to guest virtual machines.
> > > > > > We proposed to update virtio over MMIO spec to add the per-queue
> > > > > > notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
> > > > > > configure notify location for each queue.
> > > > > >
> > > > > > [1]https://lkml.org/lkml/2020/1/21/31
> > > > > >
> > > > > > Signed-off-by: Liu Jiang<[email protected]>
> > > > > > Co-developed-by: Zha Bin<[email protected]>
> > > > > > Signed-off-by: Zha Bin<[email protected]>
> > > > > > Co-developed-by: Jing Liu<[email protected]>
> > > > > > Signed-off-by: Jing Liu<[email protected]>
> > > > > > Co-developed-by: Chao Peng<[email protected]>
> > > > > > Signed-off-by: Chao Peng<[email protected]>
> > > > > Hmm. Any way to make this static so we don't need
> > > > > base and multiplier?
> > > > E.g page per vq?
> > > >
> > > > Thanks
> > > Problem is, is page size well defined enough?
> > > Are there cases where guest and host page sizes differ?
> > > I suspect there might be.
> >
> >
> > Right, so it looks better to keep base and multiplier, e.g for vDPA.
> >
> >
> > >
> > > But I also think this whole patch is unproven. Is someone actually
> > > working on QEMU code to support pass-through of virtio-pci
> > > as virtio-mmio for nested guests? What's the performance
> > > gain like?
> >
> >
> > I don't know.
> >
> > Thanks
>
>
> Btw, I think there's no need for a nested environment to test. Current
> eventfd hook to MSIX should still work for MMIO.
>
> Thanks


Oh yes, it's the wildcard thingy, but how much extra performance does one get
from it with MMIO? A couple of percent might not be worth the trouble for MMIO.

--
MST
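[Editor's note: the "eventfd hook" and "wildcard thingy" discussed here refer to KVM's ioeventfd mechanism, where the VMM registers an eventfd against the guest-physical doorbell address so a guest write signals the eventfd in the kernel without exiting to userspace. The sketch below is self-contained: the struct locally mirrors the shape of Linux's `struct kvm_ioeventfd` (UAPI `linux/kvm.h`) rather than including the real header, and the flag name is hypothetical; actual registration would be the KVM_IOEVENTFD ioctl on the VM fd.]

```c
#include <stdint.h>

/* Illustrative stand-in for Linux's struct kvm_ioeventfd; redefined
 * locally so the sketch is self-contained. */
struct ioeventfd_req {
    uint64_t datamatch;  /* value to match, if matching is enabled  */
    uint64_t addr;       /* guest-physical address of the doorbell  */
    uint32_t len;        /* width of access to trap (0 = any width) */
    int32_t  fd;         /* eventfd to signal                       */
    uint32_t flags;      /* datamatch flag; 0 = wildcard            */
};

#define IOEVENTFD_FLAG_DATAMATCH (1u << 0)  /* hypothetical name */

/*
 * Wildcard registration: signal the eventfd on *any* write to the
 * queue's notify address, regardless of the value written.  With
 * per-queue notify addresses, the written value (the queue index in
 * standard virtio-mmio) no longer needs to be matched at all.
 */
static struct ioeventfd_req wildcard_doorbell(uint64_t notify_addr, int fd)
{
    struct ioeventfd_req req = {
        .datamatch = 0,
        .addr      = notify_addr,
        .len       = 0,        /* any access width */
        .fd        = fd,
        .flags     = 0,        /* no DATAMATCH => wildcard */
    };
    return req;
}
```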

2020-02-13 03:40:48

by Jason Wang

Subject: Re: [PATCH v2 1/5] virtio-mmio: add notify feature for per-queue


On 2020/2/12 下午5:55, Michael S. Tsirkin wrote:
> On Wed, Feb 12, 2020 at 05:33:06PM +0800, Jason Wang wrote:
>> On 2020/2/12 下午4:53, Jason Wang wrote:
>>> On 2020/2/12 下午4:18, Michael S. Tsirkin wrote:
>>>> On Wed, Feb 12, 2020 at 11:39:54AM +0800, Jason Wang wrote:
>>>>> On 2020/2/11 下午7:33, Michael S. Tsirkin wrote:
>>>>>> On Mon, Feb 10, 2020 at 05:05:17PM +0800, Zha Bin wrote:
>>>>>>> From: Liu Jiang<[email protected]>
>>>>>>>
>>>>>>> The standard virtio-mmio devices use notification register to signal
>>>>>>> backend. This will cause vmexits and slow down the
>>>>>>> performance when we
>>>>>>> passthrough the virtio-mmio devices to guest virtual machines.
>>>>>>> We proposed to update virtio over MMIO spec to add the per-queue
>>>>>>> notify feature VIRTIO_F_MMIO_NOTIFICATION[1]. It can allow the VMM to
>>>>>>> configure notify location for each queue.
>>>>>>>
>>>>>>> [1]https://lkml.org/lkml/2020/1/21/31
>>>>>>>
>>>>>>> Signed-off-by: Liu Jiang<[email protected]>
>>>>>>> Co-developed-by: Zha Bin<[email protected]>
>>>>>>> Signed-off-by: Zha Bin<[email protected]>
>>>>>>> Co-developed-by: Jing Liu<[email protected]>
>>>>>>> Signed-off-by: Jing Liu<[email protected]>
>>>>>>> Co-developed-by: Chao Peng<[email protected]>
>>>>>>> Signed-off-by: Chao Peng<[email protected]>
>>>>>> Hmm. Any way to make this static so we don't need
>>>>>> base and multiplier?
>>>>> E.g page per vq?
>>>>>
>>>>> Thanks
>>>> Problem is, is page size well defined enough?
>>>> Are there cases where guest and host page sizes differ?
>>>> I suspect there might be.
>>>
>>> Right, so it looks better to keep base and multiplier, e.g for vDPA.
>>>
>>>
>>>> But I also think this whole patch is unproven. Is someone actually
>>>> working on QEMU code to support pass-through of virtio-pci
>>>> as virtio-mmio for nested guests? What's the performance
>>>> gain like?
>>>
>>> I don't know.
>>>
>>> Thanks
>>
>> Btw, I think there's no need for a nested environment to test. Current
>> eventfd hook to MSIX should still work for MMIO.
>>
>> Thanks
>
> Oh yes, it's the wildcard thingy, but how much extra performance does one get
> from it with MMIO? A couple of percent might not be worth the trouble for MMIO.


The cover letter has some numbers, but I'm not sure whether they were
measured with vhost or something else; that needs some clarification.

Thanks


>

2020-07-31 10:16:45

by Stefan Hajnoczi

Subject: Re: [PATCH v2 0/5] virtio mmio specification enhancement

On Thu, Jul 30, 2020 at 08:15:12PM +0000, Pincus, Josh wrote:
> We were looking into a similar enhancement for the Virt I/O MMIO transport and came across this project.
> This enhancement would be perfect for us.
>
> Has there been any progress since Feb, 2020? It looks like the effort might have stalled?

CCing Alex Bennee. I think he recently asked the same question.

Stefan



2020-07-31 15:45:44

by Alex Bennée

Subject: Re: [PATCH v2 0/5] virtio mmio specification enhancement


Pincus, Josh <[email protected]> writes:

> Hi,
>
>
>
> We were looking into a similar enhancement for the Virt I/O MMIO transport and came across this project.
>
> This enhancement would be perfect for us.

So there is certainly interest in optimising MMIO-based virtio, and the
current read/ack cycle adds round-trip time for any trap-and-emulate
hypervisor. However, I think there is some resistance to making MMIO a
re-implementation of what PCI already gives us for "free".

I believe the current questions that need to be addressed are:

- Clear definitions in the spec on doorbells/notifications

The current virtio spec uses different terms in some places so it
would be nice to clarify the language and formalise what the
standard expects from transports w.r.t the capabilities of
notifications and doorbells.

- Quantifying the memory foot-print difference between PCI/MMIO

PCI gives a lot for free including a discovery and IRQ model already
designed to handle MSI/MSI-X. There is a claim that this brings in a
lot of bloat but I think there was some debate around the numbers.
My rough initial experiment with a PCI and non-PCI build with
otherwise identical VIRTIO configs results in the following:

16:40:15 c.282% [alex@zen:~/l/l/builds] review/rpmb|… + ls -l arm64/vmlinux arm64.nopci/vmlinux
-rwxr-xr-x 1 alex alex 83914728 Jul 31 16:39 arm64.nopci/vmlinux*
-rwxr-xr-x 1 alex alex 86368080 Jul 31 16:33 arm64/vmlinux*

which certainly implies there could be a fair amount of headroom for
an MMIO version to implement some features. However I don't know if
it's fully apples to apples, as there may be unneeded PCI bloat that a
virtio-only kernel doesn't need.


What are the features you are most interested in?

> Has there been any progress since Feb, 2020? It looks like the effort
> might have stalled?

I can't speak for the original posters, but there is certainly interest
from others.


--
Alex Bennée

2020-08-03 16:21:42

by Alex Bennée

Subject: Re: [PATCH v2 0/5] virtio mmio specification enhancement


Alex Bennée <[email protected]> writes:

> Pincus, Josh <[email protected]> writes:
>
>> Hi,
>>
>>
>>
>> We were looking into a similar enhancement for the Virt I/O MMIO transport and came across this project.
>>
>> This enhancement would be perfect for us.
>
> So there is certainly an interest in optimising MMIO based virtio and
> the current read/ack cycle adds additional round trip time for any trap
> and emulate hypervisor. However I think there is some resistance to
> making MMIO a re-implementation of what PCI already gives us for "free".
<snip>
>
> - Quantifying the memory foot-print difference between PCI/MMIO
>
> PCI gives a lot for free including a discovery and IRQ model already
> designed to handle MSI/MSI-X. There is a claim that this brings in a
> lot of bloat but I think there was some debate around the numbers.
> My rough initial experiment with a PCI and non-PCI build with
> otherwise identical VIRTIO configs results in the following:
>
> 16:40:15 c.282% [alex@zen:~/l/l/builds] review/rpmb|… + ls -l arm64/vmlinux arm64.nopci/vmlinux
> -rwxr-xr-x 1 alex alex 83914728 Jul 31 16:39 arm64.nopci/vmlinux*
> -rwxr-xr-x 1 alex alex 86368080 Jul 31 16:33 arm64/vmlinux*
>
> which certainly implies there could be a fair amount of headroom for
> an MMIO version to implement some features. However I don't know if
> it's fully apples to apples, as there may be unneeded PCI bloat that a
> virtio-only kernel doesn't need.

Just following up: after cutting the Xgene and ThunderX PCI bloat from
the kernel, the margin is a little smaller:

-rwxr-xr-x 1 alex alex 83914728 Jul 31 16:39 arm64.nopci/vmlinux*
-rwxr-xr-x 1 alex alex 85639808 Aug 3 17:12 arm64/vmlinux*

--
Alex Bennée

2020-08-03 23:34:28

by Pincus, Josh

Subject: RE: [PATCH v2 0/5] virtio mmio specification enhancement

Hi Alex,

Thank you for the reply.

Please see my inline response below.

-----Original Message-----
From: Alex Bennée <[email protected]>
Sent: Friday, July 31, 2020 8:45 AM
To: Pincus, Josh <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH v2 0/5] virtio mmio specification enhancement


Pincus, Josh <[email protected]> writes:

> Hi,
>
>
>
> We were looking into a similar enhancement for the Virt I/O MMIO transport and came across this project.
>
> This enhancement would be perfect for us.

So there is certainly interest in optimising MMIO-based virtio, and the current read/ack cycle adds round-trip time for any trap-and-emulate hypervisor. However, I think there is some resistance to making MMIO a re-implementation of what PCI already gives us for "free".

I believe the current questions that need to be addressed are:

- Clear definitions in the spec on doorbells/notifications

The current virtio spec uses different terms in some places so it
would be nice to clarify the language and formalise what the
standard expects from transports w.r.t the capabilities of
notifications and doorbells.

[JP] The read/ack cycle not only adds to the round-trip time for any trap-and-emulate HYP, it also precludes an environment where one might want to avoid emulation completely. We're interested in using the MMIO transport combined with an augmented device node in the DTB, so that device features, reserved memory for queues, and specific MSI interrupts per queue are conveyed to the guest statically.

In this kind of restricted environment, feature negotiation might be completely disabled: you see what the device node describes, and you either support those features or not. Likewise, the standard list of state-machine transitions for communicating driver and device state would be skipped. A driver in a guest comes up, reads the device node info, uses the queues as described, and assigns the MSI vectors per queue and for the config-has-changed service.

When an interrupt comes in, there's no need to ack it beyond the normal way in which one conveys an EOI to hardware. It also means that, with one dedicated interrupt per queue, we won't have to select the queue in question and test which one got updated. In short, we are experimenting with getting rid of the emulation entirely, if we can.
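[Editor's note: a rough C sketch of the shape of the static, negotiation-free model described above, where everything the guest needs is conveyed up front (e.g. via an augmented DTB node) instead of being probed through emulated registers. Every name and field here is hypothetical; none of this corresponds to an existing binding.]

```c
#include <stdint.h>

/* Hypothetical per-queue description, conveyed statically to the guest. */
struct static_vq_desc {
    uint64_t ring_base;   /* guest-physical base of this queue's ring */
    uint16_t ring_size;   /* number of descriptors                    */
    uint32_t msi_vector;  /* dedicated MSI vector for this queue      */
};

/* Hypothetical static device description: fixed features, no state
 * machine, one MSI vector per queue plus one for config changes. */
struct static_virtio_dev {
    uint32_t device_id;   /* virtio device type                       */
    uint64_t features;    /* fixed feature set, take it or leave it   */
    uint32_t config_msi;  /* vector for config-has-changed events     */
    uint32_t num_queues;
    const struct static_vq_desc *queues;
};

/* With negotiation disabled, the driver either supports every advertised
 * feature or refuses to bind; no writes to emulated registers occur. */
static int static_dev_probe(const struct static_virtio_dev *dev,
                            uint64_t driver_supported)
{
    return (dev->features & ~driver_supported) == 0 ? 0 : -1;
}
```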

- Quantifying the memory foot-print difference between PCI/MMIO

PCI gives a lot for free including a discovery and IRQ model already
designed to handle MSI/MSI-X. There is a claim that this brings in a
lot of bloat but I think there was some debate around the numbers.
My rough initial experiment with a PCI and non-PCI build with
otherwise identical VIRTIO configs results in the following:

16:40:15 c.282% [alex@zen:~/l/l/builds] review/rpmb|… + ls -l arm64/vmlinux arm64.nopci/vmlinux
-rwxr-xr-x 1 alex alex 83914728 Jul 31 16:39 arm64.nopci/vmlinux*
-rwxr-xr-x 1 alex alex 86368080 Jul 31 16:33 arm64/vmlinux*

which certainly implies there could be a fair amount of headroom for
an MMIO version to implement some features. However I don't know if
it's fully apples to apples, as there may be unneeded PCI bloat that a
virtio-only kernel doesn't need.

[JP] Apropos of your subsequent email on this topic, the PCI bloat isn't terrible. The major stumbling block in our case is that we would like to see if there's a restricted model in which the emulation can be removed completely. Case in point: Virt I/O RPMsgs in OpenAMP only use the queues to transfer data back and forth (unless I'm mistaken?).

We'd like to see if that model can be generalized a bit, so that other kinds of drivers can be constructed that similarly don't rely on emulation for handling interrupt read/ack, feature negotiation, queue selection, etc. Memory is mapped into the guest for queues and R/O device registers, interrupts are assigned in the DTB for each queue, and features are, essentially, non-negotiable.

What are the features you are most interested in?

[JP] See above. The restricted environment in question is for very simple applications that don't have any kind of PCI infrastructure, and for virtual environments with no HYP or a very restricted HYP.

> Has there been any progress since Feb, 2020? It looks like the effort
> might have stalled?

I can't speak for the original posters, but there is certainly interest from others.

[JP] Maybe we can restart the thread/discussion and see where it goes from here.

--
Alex Bennée