2019-12-25 02:52:55

by Zha Bin

Subject: [PATCH v1 0/2] support virtio mmio specification Version 3

In a cloud native environment, we need a lightweight and secure system that
combines the speed of containers with the security of VMs, commonly classified
as secure containers. The traditional cloud VM solution is Qemu, but in fact we
don't need to pay for all of its legacy devices. Currently, more userspace VMMs,
e.g. Qemu, Firecracker, Cloud Hypervisor and the Alibaba Cloud VMM called
Dragonball, have begun to focus on a lightweight solution.

A lightweight VMM is well suited to cloud native infrastructure, which is
designed to create secure sandboxes that address multi-tenant requirements.
Meanwhile, its faster startup time and lower memory overhead make it possible
to launch thousands of microVMs on the same machine. Such a VMM minimizes the
emulated devices and uses virtio-mmio to get a more lightweight transport
layer. The virtio-mmio devices have less code than virtio-pci, which can
decrease boot time and increase deployment density when the kernel is
customized, e.g. with pci=off. From another point of view, the minimal device
model also reduces the attack surface.

We have compared the number of files and the lines of code between
virtio-mmio and virtio-pci.

                          Virtio-PCI    Virtio-MMIO
number of files (Linux)          161              1
lines of code (Linux)          78237            538
number of files (Qemu)            24              1
lines of code (Qemu)            8952            421

But the current standard virtio-mmio spec has a limitation: it only supports
the legacy interrupt, which causes performance penalties.

To address this limitation, we propose updating the virtio-mmio spec to
version 3 to add MSI interrupt and doorbell notify features[1], which can
achieve the same performance as virtio-pci devices with only around 400
lines of code.
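
To illustrate the doorbell notify feature, here is a minimal sketch condensed
from the driver changes in patch 2/2 (register names follow the proposed spec
and the patch; this is a sketch, not a complete driver):

	/* With spec version 3 the QUEUE_NOTIFY register reports the doorbell
	 * layout, and each queue gets its own doorbell, so a notification is
	 * a single MMIO write with no register read-back. */
	u32 db = readl(base + VIRTIO_MMIO_QUEUE_NOTIFY);
	u32 doorbell_base  = db & 0xffff;         /* offset of first doorbell */
	u32 doorbell_scale = (db >> 16) & 0xffff; /* stride between doorbells */

	/* notify virtqueue 'index' */
	writel(index, base + doorbell_base + doorbell_scale * index);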

Here is the performance gain of the MSI interrupt in virtio-mmio. Each case
was run three times.

netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024

          Virtio-PCI    Virtio-MMIO    Virtio-MMIO(MSI)
trans/s         9536           6939                9500
trans/s         9734           7029                9749
trans/s         9894           7095                9318

With the virtio spec proposal[1], other VMMs (e.g. Qemu) can also make use
of the new features to get enhanced performance.

[1] https://lkml.org/lkml/2019/12/20/113

Liu Jiang (2):
x86/msi: Enhance x86 to support platform_msi
virtio-mmio: add features for virtio-mmio specification version 3

arch/x86/Kconfig | 6 ++
arch/x86/include/asm/hw_irq.h | 5 +
arch/x86/include/asm/irqdomain.h | 9 ++
arch/x86/kernel/apic/msi.c | 74 +++++++++++++
drivers/base/platform-msi.c | 4 +-
drivers/virtio/virtio_mmio.c | 226 ++++++++++++++++++++++++++++++++++++---
include/linux/msi.h | 1 +
include/uapi/linux/virtio_mmio.h | 28 +++++
8 files changed, 337 insertions(+), 16 deletions(-)

--
1.8.3.1


2019-12-25 02:52:55

by Zha Bin

Subject: [PATCH v1 1/2] x86/msi: Enhance x86 to support platform_msi

From: Liu Jiang <[email protected]>

The x86 platform currently only supports MSI/MSI-X for PCI
devices. To enable MSI for platform devices such as the virtio-mmio
device, this patch enhances x86 with platform MSI support by leveraging the
already built-in platform-msi driver (drivers/base/platform-msi.c)
and makes it a configurable option.
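
With this in place, a platform device driver can allocate MSI vectors roughly
as follows (a condensed sketch of the virtio-mmio usage in patch 2/2, error
handling omitted; write_msi_msg stands for the driver callback that programs
the address/data pair into the device):

	/* Point the device at the default platform MSI irqdomain added by
	 * this patch, then allocate 'nvec' vectors for it. */
	if (!dev->msi_domain)
		dev->msi_domain = platform_msi_get_def_irq_domain();
	err = platform_msi_domain_alloc_irqs(dev, nvec, write_msi_msg);
	if (!err)
		irq = first_msi_entry(dev)->irq; /* first allocated Linux irq */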

Signed-off-by: Liu Jiang <[email protected]>
Signed-off-by: Zha Bin <[email protected]>
Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Jing Liu <[email protected]>
---
arch/x86/Kconfig | 6 ++++
arch/x86/include/asm/hw_irq.h | 5 +++
arch/x86/include/asm/irqdomain.h | 9 +++++
arch/x86/kernel/apic/msi.c | 74 ++++++++++++++++++++++++++++++++++++++++
drivers/base/platform-msi.c | 4 +--
include/linux/msi.h | 1 +
6 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5e89499..c1178f2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1074,6 +1074,12 @@ config X86_IO_APIC
def_bool y
depends on X86_LOCAL_APIC || X86_UP_IOAPIC

+config X86_PLATFORM_MSI
+ def_bool y
+ depends on X86_LOCAL_APIC && X86_64
+ select GENERIC_MSI_IRQ
+ select GENERIC_MSI_IRQ_DOMAIN
+
config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
bool "Reroute for broken boot IRQs"
depends on X86_IO_APIC
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 4154bc5..5c96b49 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -113,6 +113,11 @@ struct irq_alloc_info {
struct msi_desc *desc;
};
#endif
+#ifdef CONFIG_X86_PLATFORM_MSI
+ struct {
+ irq_hw_number_t platform_msi_hwirq;
+ };
+#endif
};
};

diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
index c066ffa..b53f51f 100644
--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -56,4 +56,13 @@ extern void mp_irqdomain_deactivate(struct irq_domain *domain,
static inline void arch_init_msi_domain(struct irq_domain *domain) { }
#endif

+#ifdef CONFIG_X86_PLATFORM_MSI
+extern struct irq_domain *platform_msi_get_def_irq_domain(void);
+#else
+static inline struct irq_domain *platform_msi_get_def_irq_domain(void)
+{
+ return NULL;
+}
+#endif
+
#endif
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 7f75334..ef558c7 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -22,6 +22,10 @@
#include <asm/irq_remapping.h>

static struct irq_domain *msi_default_domain;
+#ifdef CONFIG_X86_PLATFORM_MSI
+static struct irq_domain *platform_msi_default_domain;
+static struct msi_domain_info platform_msi_domain_info;
+#endif

static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
{
@@ -146,6 +150,17 @@ void __init arch_init_msi_domain(struct irq_domain *parent)
}
if (!msi_default_domain)
pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
+#ifdef CONFIG_X86_PLATFORM_MSI
+ fn = irq_domain_alloc_named_fwnode("PLATFORM-MSI");
+ if (fn) {
+ platform_msi_default_domain =
+ platform_msi_create_irq_domain(fn,
+ &platform_msi_domain_info, parent);
+ irq_domain_free_fwnode(fn);
+ }
+ if (!platform_msi_default_domain)
+ pr_warn("failed to initialize irqdomain for PLATFORM-MSI.\n");
+#endif
}

#ifdef CONFIG_IRQ_REMAP
@@ -384,3 +399,62 @@ int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
}
#endif
+
+#ifdef CONFIG_X86_PLATFORM_MSI
+static void platform_msi_mask_irq(struct irq_data *data)
+{
+}
+
+static void platform_msi_unmask_irq(struct irq_data *data)
+{
+}
+
+static struct irq_chip platform_msi_controller = {
+ .name = "PLATFORM-MSI",
+ .irq_mask = platform_msi_mask_irq,
+ .irq_unmask = platform_msi_unmask_irq,
+ .irq_ack = irq_chip_ack_parent,
+ .irq_retrigger = irq_chip_retrigger_hierarchy,
+ .irq_compose_msi_msg = irq_msi_compose_msg,
+ .flags = IRQCHIP_SKIP_SET_WAKE,
+};
+
+static int platform_msi_prepare(struct irq_domain *domain, struct device *dev,
+ int nvec, msi_alloc_info_t *arg)
+{
+ memset(arg, 0, sizeof(*arg));
+ return 0;
+}
+
+static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
+{
+ arg->platform_msi_hwirq = platform_msi_calc_hwirq(desc);
+}
+
+static irq_hw_number_t platform_msi_get_hwirq(struct msi_domain_info *info,
+ msi_alloc_info_t *arg)
+{
+ return arg->platform_msi_hwirq;
+}
+
+static struct msi_domain_ops platform_msi_domain_ops = {
+ .msi_prepare = platform_msi_prepare,
+ .set_desc = platform_msi_set_desc,
+ .get_hwirq = platform_msi_get_hwirq,
+};
+
+static struct msi_domain_info platform_msi_domain_info = {
+ .flags = MSI_FLAG_USE_DEF_DOM_OPS |
+ MSI_FLAG_USE_DEF_CHIP_OPS |
+ MSI_FLAG_ACTIVATE_EARLY,
+ .ops = &platform_msi_domain_ops,
+ .chip = &platform_msi_controller,
+ .handler = handle_edge_irq,
+ .handler_name = "edge",
+};
+
+struct irq_domain *platform_msi_get_def_irq_domain(void)
+{
+ return platform_msi_default_domain;
+}
+#endif
diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index 8da314b..45752f1 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -31,12 +31,11 @@ struct platform_msi_priv_data {
/* The devid allocator */
static DEFINE_IDA(platform_msi_devid_ida);

-#ifdef GENERIC_MSI_DOMAIN_OPS
/*
* Convert an msi_desc to a globaly unique identifier (per-device
* devid + msi_desc position in the msi_list).
*/
-static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
+irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
{
u32 devid;

@@ -45,6 +44,7 @@ static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index;
}

+#ifdef GENERIC_MSI_DOMAIN_OPS
static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
{
arg->desc = desc;
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 8ad679e..ee5f566 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -362,6 +362,7 @@ int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq,
unsigned int nvec);
void *platform_msi_get_host_data(struct irq_domain *domain);
+irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc);
#endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */

#ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
--
1.8.3.1

2019-12-25 02:53:02

by Zha Bin

Subject: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

From: Liu Jiang <[email protected]>

Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of virtio
over mmio devices as a lightweight machine model for the modern cloud. The
standard virtio over MMIO transport layer only supports one legacy
interrupt, which is much heavier than the virtio over PCI transport layer
using MSI. The legacy interrupt has a long work path and causes extra
VMExits in the following cases, which considerably slows down performance:

1) read interrupt status register
2) update interrupt status register
3) write IOAPIC EOI register
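
For reference, the per-interrupt register accesses look roughly like this
(condensed from vm_interrupt() in drivers/virtio/virtio_mmio.c; steps 1) and
2) are MMIO accesses that trap to the VMM, and step 3) happens in the APIC
code when the handler completes):

	/* 1) read interrupt status register */
	status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
	/* 2) update (ack) interrupt status register */
	writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
	if (status & VIRTIO_MMIO_INT_CONFIG)
		virtio_config_changed(&vm_dev->vdev);
	if (status & VIRTIO_MMIO_INT_VRING)
		ret = vring_interrupt(irq, vq); /* for each vq of the device */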

We propose updating virtio over MMIO to version 3[1] to add the
following new features and enhance performance:

1) Support Message Signaled Interrupts (MSI), which increases the
interrupt performance for virtio multi-queue devices.
2) Support a per-queue doorbell, so the guest kernel may directly write
to the doorbells provided by virtio devices.
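
The MSI vectors are laid out as follows (see vm_request_msi_vectors() and
vm_find_vqs_msi() in the patch): vector 0 carries configuration change
events and vector i+1 carries virtqueue i. A condensed sketch:

	/* one vector per virtqueue plus one for configuration changes */
	err = vm_request_msi_vectors(vdev, nvqs + 1);
	irq = vm_msi_irq_vector(&vdev->dev, 0);             /* config changes */
	irq = vm_msi_irq_vector(&vdev->dev, vq->index + 1); /* virtqueue vq   */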

The following is the network tcp_rr performance testing report, tested
with virtio-pci device, vanilla virtio-mmio device and patched
virtio-mmio device (run test 3 times for each case):

netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024

          Virtio-PCI    Virtio-MMIO    Virtio-MMIO(MSI)
trans/s         9536           6939                9500
trans/s         9734           7029                9749
trans/s         9894           7095                9318

[1] https://lkml.org/lkml/2019/12/20/113

Signed-off-by: Liu Jiang <[email protected]>
Signed-off-by: Zha Bin <[email protected]>
Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Jing Liu <[email protected]>
---
drivers/virtio/virtio_mmio.c | 226 ++++++++++++++++++++++++++++++++++++---
include/uapi/linux/virtio_mmio.h | 28 +++++
2 files changed, 240 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index e09edb5..065c083 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -68,8 +68,8 @@
#include <linux/virtio_config.h>
#include <uapi/linux/virtio_mmio.h>
#include <linux/virtio_ring.h>
-
-
+#include <linux/msi.h>
+#include <asm/irqdomain.h>

/* The alignment to use between consumer and producer parts of vring.
* Currently hardcoded to the page size. */
@@ -90,6 +90,12 @@ struct virtio_mmio_device {
/* a list of queues so we can dispatch IRQs */
spinlock_t lock;
struct list_head virtqueues;
+
+ int doorbell_base;
+ int doorbell_scale;
+ bool msi_enabled;
+ /* Name strings for interrupts. */
+ char (*vm_vq_names)[256];
};

struct virtio_mmio_vq_info {
@@ -101,6 +107,8 @@ struct virtio_mmio_vq_info {
};


+static void vm_free_msi_irqs(struct virtio_device *vdev);
+static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs);

/* Configuration interface */

@@ -273,12 +281,28 @@ static bool vm_notify(struct virtqueue *vq)
{
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);

+ if (vm_dev->version == 3) {
+ int offset = vm_dev->doorbell_base +
+ vm_dev->doorbell_scale * vq->index;
+ writel(vq->index, vm_dev->base + offset);
+ } else
/* We write the queue's selector into the notification register to
* signal the other end */
- writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
+ writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
+
return true;
}

+/* Handle a configuration change */
+static irqreturn_t vm_config_changed(int irq, void *opaque)
+{
+ struct virtio_mmio_device *vm_dev = opaque;
+
+ virtio_config_changed(&vm_dev->vdev);
+
+ return IRQ_HANDLED;
+}
+
/* Notify all virtqueues on an interrupt. */
static irqreturn_t vm_interrupt(int irq, void *opaque)
{
@@ -307,7 +331,12 @@ static irqreturn_t vm_interrupt(int irq, void *opaque)
return ret;
}

+static int vm_msi_irq_vector(struct device *dev, unsigned int nr)
+{
+ struct msi_desc *entry = first_msi_entry(dev);

+ return entry->irq + nr;
+}

static void vm_del_vq(struct virtqueue *vq)
{
@@ -316,6 +345,12 @@ static void vm_del_vq(struct virtqueue *vq)
unsigned long flags;
unsigned int index = vq->index;

+ if (vm_dev->version == 3 && vm_dev->msi_enabled) {
+ int irq = vm_msi_irq_vector(&vq->vdev->dev, index + 1);
+
+ free_irq(irq, vq);
+ }
+
spin_lock_irqsave(&vm_dev->lock, flags);
list_del(&info->node);
spin_unlock_irqrestore(&vm_dev->lock, flags);
@@ -334,15 +369,41 @@ static void vm_del_vq(struct virtqueue *vq)
kfree(info);
}

-static void vm_del_vqs(struct virtio_device *vdev)
+static void vm_free_irqs(struct virtio_device *vdev)
{
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+ if (vm_dev->version == 3)
+ vm_free_msi_irqs(vdev);
+ else
+ free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
+}
+
+static void vm_del_vqs(struct virtio_device *vdev)
+{
struct virtqueue *vq, *n;

list_for_each_entry_safe(vq, n, &vdev->vqs, list)
vm_del_vq(vq);

- free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
+ vm_free_irqs(vdev);
+}
+
+static int vm_request_single_irq(struct virtio_device *vdev)
+{
+ struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+ int irq = platform_get_irq(vm_dev->pdev, 0);
+ int err;
+
+ if (irq < 0) {
+ dev_err(&vdev->dev, "Cannot get IRQ resource\n");
+ return irq;
+ }
+
+ err = request_irq(irq, vm_interrupt, IRQF_SHARED,
+ dev_name(&vdev->dev), vm_dev);
+
+ return err;
}

static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
@@ -455,6 +516,72 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
return ERR_PTR(err);
}

+static inline void mmio_msi_set_enable(struct virtio_mmio_device *vm_dev,
+ int enable)
+{
+ u16 status;
+
+ status = readw(vm_dev->base + VIRTIO_MMIO_MSI_STATUS);
+ if (enable)
+ status |= VIRTIO_MMIO_MSI_ENABLE_MASK;
+ else
+ status &= ~VIRTIO_MMIO_MSI_ENABLE_MASK;
+ writew(status, vm_dev->base + VIRTIO_MMIO_MSI_STATUS);
+}
+
+static int vm_find_vqs_msi(struct virtio_device *vdev, unsigned int nvqs,
+ struct virtqueue *vqs[], vq_callback_t *callbacks[],
+ const char * const names[], const bool *ctx,
+ struct irq_affinity *desc)
+{
+ int i, err, irq;
+ struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+ /* Allocate nvqs irqs for queues and one irq for configuration */
+ err = vm_request_msi_vectors(vdev, nvqs + 1);
+ if (err != 0)
+ return err;
+
+ for (i = 0; i < nvqs; i++) {
+ int index = i + 1;
+
+ if (!names[i]) {
+ vqs[i] = NULL;
+ continue;
+ }
+ vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i],
+ ctx ? ctx[i] : false);
+ if (IS_ERR(vqs[i])) {
+ err = PTR_ERR(vqs[i]);
+ goto error_find;
+ }
+
+ if (!callbacks[i])
+ continue;
+
+ irq = vm_msi_irq_vector(&vqs[i]->vdev->dev, index);
+ /* allocate per-vq irq if available and necessary */
+ snprintf(vm_dev->vm_vq_names[index],
+ sizeof(*vm_dev->vm_vq_names),
+ "%s-%s",
+ dev_name(&vm_dev->vdev.dev), names[i]);
+ err = request_irq(irq, vring_interrupt, 0,
+ vm_dev->vm_vq_names[index], vqs[i]);
+
+ if (err)
+ goto error_find;
+ }
+
+ mmio_msi_set_enable(vm_dev, 1);
+ vm_dev->msi_enabled = true;
+
+ return 0;
+
+error_find:
+ vm_del_vqs(vdev);
+ return err;
+}
+
static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
struct virtqueue *vqs[],
vq_callback_t *callbacks[],
@@ -463,16 +590,16 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
struct irq_affinity *desc)
{
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
- int irq = platform_get_irq(vm_dev->pdev, 0);
int i, err, queue_idx = 0;

- if (irq < 0) {
- dev_err(&vdev->dev, "Cannot get IRQ resource\n");
- return irq;
+ if (vm_dev->version == 3) {
+ err = vm_find_vqs_msi(vdev, nvqs, vqs, callbacks,
+ names, ctx, desc);
+ if (!err)
+ return 0;
}

- err = request_irq(irq, vm_interrupt, IRQF_SHARED,
- dev_name(&vdev->dev), vm_dev);
+ err = vm_request_single_irq(vdev);
if (err)
return err;

@@ -493,6 +620,73 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
return 0;
}

+static void mmio_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
+{
+ struct device *dev = desc->dev;
+ struct virtio_device *vdev = dev_to_virtio(dev);
+ struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+ void __iomem *pos = vm_dev->base;
+ uint16_t cmd = VIRTIO_MMIO_MSI_CMD(VIRTIO_MMIO_MSI_CMD_UPDATE,
+ desc->platform.msi_index);
+
+ writel(msg->address_lo, pos + VIRTIO_MMIO_MSI_ADDRESS_LOW);
+ writel(msg->address_hi, pos + VIRTIO_MMIO_MSI_ADDRESS_HIGH);
+ writel(msg->data, pos + VIRTIO_MMIO_MSI_DATA);
+ writew(cmd, pos + VIRTIO_MMIO_MSI_COMMAND);
+}
+
+static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs)
+{
+ struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+ int irq, err;
+ u16 csr = readw(vm_dev->base + VIRTIO_MMIO_MSI_STATUS);
+
+ if (vm_dev->msi_enabled || (csr & VIRTIO_MMIO_MSI_ENABLE_MASK) == 0)
+ return -EINVAL;
+
+ vm_dev->vm_vq_names = kmalloc_array(nirqs, sizeof(*vm_dev->vm_vq_names),
+ GFP_KERNEL);
+ if (!vm_dev->vm_vq_names)
+ return -ENOMEM;
+
+ if (!vdev->dev.msi_domain)
+ vdev->dev.msi_domain = platform_msi_get_def_irq_domain();
+ err = platform_msi_domain_alloc_irqs(&vdev->dev, nirqs,
+ mmio_write_msi_msg);
+ if (err)
+ goto error_free_mem;
+
+ /* The first MSI vector is used for configuration change events. */
+ snprintf(vm_dev->vm_vq_names[0], sizeof(*vm_dev->vm_vq_names),
+ "%s-config", dev_name(&vdev->dev));
+ irq = vm_msi_irq_vector(&vdev->dev, 0);
+ err = request_irq(irq, vm_config_changed, 0, vm_dev->vm_vq_names[0],
+ vm_dev);
+ if (err)
+ goto error_free_mem;
+
+ return 0;
+
+error_free_mem:
+ kfree(vm_dev->vm_vq_names);
+ vm_dev->vm_vq_names = NULL;
+ return err;
+}
+
+static void vm_free_msi_irqs(struct virtio_device *vdev)
+{
+ struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+ if (vm_dev->msi_enabled) {
+ mmio_msi_set_enable(vm_dev, 0);
+ free_irq(vm_msi_irq_vector(&vdev->dev, 0), vm_dev);
+ platform_msi_domain_free_irqs(&vdev->dev);
+ kfree(vm_dev->vm_vq_names);
+ vm_dev->vm_vq_names = NULL;
+ vm_dev->msi_enabled = false;
+ }
+}
+
static const char *vm_bus_name(struct virtio_device *vdev)
{
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
@@ -567,7 +761,7 @@ static int virtio_mmio_probe(struct platform_device *pdev)

/* Check device version */
vm_dev->version = readl(vm_dev->base + VIRTIO_MMIO_VERSION);
- if (vm_dev->version < 1 || vm_dev->version > 2) {
+ if (vm_dev->version < 1 || vm_dev->version > 3) {
dev_err(&pdev->dev, "Version %ld not supported!\n",
vm_dev->version);
return -ENXIO;
@@ -603,6 +797,12 @@ static int virtio_mmio_probe(struct platform_device *pdev)
dev_warn(&pdev->dev, "Failed to enable 64-bit or 32-bit DMA. Trying to continue, but this might not work.\n");

platform_set_drvdata(pdev, vm_dev);
+ if (vm_dev->version == 3) {
+ u32 db = readl(vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
+
+ vm_dev->doorbell_base = db & 0xffff;
+ vm_dev->doorbell_scale = (db >> 16) & 0xffff;
+ }

rc = register_virtio_device(&vm_dev->vdev);
if (rc)
@@ -619,8 +819,6 @@ static int virtio_mmio_remove(struct platform_device *pdev)
return 0;
}

-
-
/* Devices list parameter */

#if defined(CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES)
diff --git a/include/uapi/linux/virtio_mmio.h b/include/uapi/linux/virtio_mmio.h
index c4b0968..148895e 100644
--- a/include/uapi/linux/virtio_mmio.h
+++ b/include/uapi/linux/virtio_mmio.h
@@ -122,6 +122,34 @@
#define VIRTIO_MMIO_QUEUE_USED_LOW 0x0a0
#define VIRTIO_MMIO_QUEUE_USED_HIGH 0x0a4

+/* MSI related registers */
+
+/* MSI status register */
+#define VIRTIO_MMIO_MSI_STATUS 0x0c0
+/* MSI command register */
+#define VIRTIO_MMIO_MSI_COMMAND 0x0c2
+/* MSI low 32 bit address, 64 bits in two halves */
+#define VIRTIO_MMIO_MSI_ADDRESS_LOW 0x0c4
+/* MSI high 32 bit address, 64 bits in two halves */
+#define VIRTIO_MMIO_MSI_ADDRESS_HIGH 0x0c8
+/* MSI data */
+#define VIRTIO_MMIO_MSI_DATA 0x0cc
+
+/* RO: MSI feature enabled mask */
+#define VIRTIO_MMIO_MSI_ENABLE_MASK 0x8000
+/* RO: Maximum queue size available */
+#define VIRTIO_MMIO_MSI_STATUS_QMASK 0x07ff
+/* Reserved */
+#define VIRTIO_MMIO_MSI_STATUS_RESERVED 0x7800
+
+#define VIRTIO_MMIO_MSI_CMD_UPDATE 0x1
+#define VIRTIO_MMIO_MSI_CMD_OP_MASK 0xf000
+#define VIRTIO_MMIO_MSI_CMD_ARG_MASK 0x0fff
+#define VIRTIO_MMIO_MSI_CMD(op, arg) ((op) << 12 | (arg))
+#define VIRITO_MMIO_MSI_CMD_ARG(cmd) ((cmd) & VIRTIO_MMIO_MSI_CMD_ARG_MASK)
+#define VIRTIO_MMIO_MSI_CMD_OP(cmd) \
+ (((cmd) & VIRTIO_MMIO_MSI_CMD_OP_MASK) >> 12)
+
/* Configuration atomicity value */
#define VIRTIO_MMIO_CONFIG_GENERATION 0x0fc

--
1.8.3.1

2019-12-25 10:24:36

by Jason Wang

Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3


On 2019/12/25 10:50 AM, Zha Bin wrote:
> From: Liu Jiang <[email protected]>
>
> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
> virtio over mmio devices as a lightweight machine model for modern
> cloud. The standard virtio over MMIO transport layer only supports one
> legacy interrupt, which is much heavier than virtio over PCI transport
> layer using MSI. Legacy interrupt has long work path and causes specific
> VMExits in following cases, which would considerably slow down the
> performance:
>
> 1) read interrupt status register
> 2) update interrupt status register
> 3) write IOAPIC EOI register
>
> We proposed to update virtio over MMIO to version 3[1] to add the
> following new features and enhance the performance.
>
> 1) Support Message Signaled Interrupt(MSI), which increases the
> interrupt performance for virtio multi-queue devices
> 2) Support per-queue doorbell, so the guest kernel may directly write
> to the doorbells provided by virtio devices.
>
> The following is the network tcp_rr performance testing report, tested
> with virtio-pci device, vanilla virtio-mmio device and patched
> virtio-mmio device (run test 3 times for each case):
>
> netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024
>
> Virtio-PCI Virtio-MMIO Virtio-MMIO(MSI)
> trans/s 9536 6939 9500
> trans/s 9734 7029 9749
> trans/s 9894 7095 9318
>
> [1] https://lkml.org/lkml/2019/12/20/113


Thanks for the patch. Two questions after a quick glance:

1) In PCI we chose to support MSI-X instead of MSI for the extra
flexibility like aliasing and independent data and address (e.g. for
affinity). Any reason for not starting from MSI-X? E.g. having an MSI-X
table and PBA (both of which look pretty independent).
2) It's better to split notify_multiplexer out of the MSI support to ease
review (applies to the spec patch as well).

Thanks

2019-12-25 15:24:46

by Liu, Jiang

Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3



> On Dec 25, 2019, at 6:20 PM, Jason Wang <[email protected]> wrote:
>
>
> On 2019/12/25 10:50 AM, Zha Bin wrote:
>> From: Liu Jiang <[email protected]>
>>
>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
>> virtio over mmio devices as a lightweight machine model for modern
>> cloud. The standard virtio over MMIO transport layer only supports one
>> legacy interrupt, which is much heavier than virtio over PCI transport
>> layer using MSI. Legacy interrupt has long work path and causes specific
>> VMExits in following cases, which would considerably slow down the
>> performance:
>>
>> 1) read interrupt status register
>> 2) update interrupt status register
>> 3) write IOAPIC EOI register
>>
>> We proposed to update virtio over MMIO to version 3[1] to add the
>> following new features and enhance the performance.
>>
>> 1) Support Message Signaled Interrupt(MSI), which increases the
>> interrupt performance for virtio multi-queue devices
>> 2) Support per-queue doorbell, so the guest kernel may directly write
>> to the doorbells provided by virtio devices.
>>
>> The following is the network tcp_rr performance testing report, tested
>> with virtio-pci device, vanilla virtio-mmio device and patched
>> virtio-mmio device (run test 3 times for each case):
>>
>> netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024
>>
>> Virtio-PCI Virtio-MMIO Virtio-MMIO(MSI)
>> trans/s 9536 6939 9500
>> trans/s 9734 7029 9749
>> trans/s 9894 7095 9318
>>
>> [1] https://lkml.org/lkml/2019/12/20/113
>
>
> Thanks for the patch. Two questions after a quick glance:
>
> 1) In PCI we choose to support MSI-X instead of MSI for having extra flexibility like alias, independent data and address (e.g for affinity) . Any reason for not start from MSI-X? E.g having MSI-X table and PBA (both of which looks pretty independent).
Hi Jason,
Thanks for reviewing patches on Christmas Day:)
PCI MSI-X has several advantages over PCI MSI, mainly:
1) support for 2048 vectors, much more than the 32 vectors supported by MSI;
2) dedicated address/data for each vector;
3) per-vector mask/pending bits.
The proposed MMIO MSI extension supports both 1) and 2), but it doesn’t support 3) because
we noticed that the Linux virtio subsystem doesn’t really make use of interrupt masking/unmasking.

On the other hand, we want to keep VMM implementations as simple as possible, and mimicking PCI MSI-X
would add some complexity to VMM implementations.

> 2) It's better to split notify_multiplexer out of MSI support to ease the reviewers (apply to spec patch as well)
Great suggestion, we will try to split the patch.

Thanks,
Gerry

>
> Thanks

2019-12-25 22:32:22

by kernel test robot

Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

Hi Zha,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/x86/core]
[cannot apply to tip/auto-latest linux/master linus/master v5.5-rc3 next-20191220]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Zha-Bin/support-virtio-mmio-specification-Version-3/20191226-041757
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 004e8dce9c5595697951f7cd0e9f66b35c92265e
config: alpha-randconfig-a001-20191225 (attached as .config)
compiler: alpha-linux-gcc (GCC) 7.5.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.5.0 make.cross ARCH=alpha

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

All errors (new ones prefixed by >>):

>> drivers//virtio/virtio_mmio.c:72:10: fatal error: asm/irqdomain.h: No such file or directory
#include <asm/irqdomain.h>
^~~~~~~~~~~~~~~~~
compilation terminated.

vim +72 drivers//virtio/virtio_mmio.c

56
57 #include <linux/acpi.h>
58 #include <linux/dma-mapping.h>
59 #include <linux/highmem.h>
60 #include <linux/interrupt.h>
61 #include <linux/io.h>
62 #include <linux/list.h>
63 #include <linux/module.h>
64 #include <linux/platform_device.h>
65 #include <linux/slab.h>
66 #include <linux/spinlock.h>
67 #include <linux/virtio.h>
68 #include <linux/virtio_config.h>
69 #include <uapi/linux/virtio_mmio.h>
70 #include <linux/virtio_ring.h>
71 #include <linux/msi.h>
> 72 #include <asm/irqdomain.h>
73

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation



2019-12-26 01:45:42

by kernel test robot

Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

Hi Zha,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/x86/core]
[cannot apply to tip/auto-latest linux/master linus/master v5.5-rc3 next-20191219]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Zha-Bin/support-virtio-mmio-specification-Version-3/20191226-041757
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 004e8dce9c5595697951f7cd0e9f66b35c92265e
config: i386-randconfig-c001-20191225 (attached as .config)
compiler: gcc-7 (Debian 7.5.0-3) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

All error/warnings (new ones prefixed by >>):

In file included from include/linux/list.h:9:0,
from include/linux/kobject.h:19,
from include/linux/of.h:17,
from include/linux/irqdomain.h:35,
from include/linux/acpi.h:13,
from drivers//virtio/virtio_mmio.c:57:
drivers//virtio/virtio_mmio.c: In function 'vm_msi_irq_vector':
>> include/linux/msi.h:135:38: error: 'struct device' has no member named 'msi_list'
#define dev_to_msi_list(dev) (&(dev)->msi_list)
^
include/linux/kernel.h:993:26: note: in definition of macro 'container_of'
void *__mptr = (void *)(ptr); \
^~~
>> include/linux/list.h:490:2: note: in expansion of macro 'list_entry'
list_entry((ptr)->next, type, member)
^~~~~~~~~~
>> include/linux/msi.h:137:2: note: in expansion of macro 'list_first_entry'
list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
^~~~~~~~~~~~~~~~
>> include/linux/msi.h:137:19: note: in expansion of macro 'dev_to_msi_list'
list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
^~~~~~~~~~~~~~~
>> drivers//virtio/virtio_mmio.c:336:27: note: in expansion of macro 'first_msi_entry'
struct msi_desc *entry = first_msi_entry(dev);
^~~~~~~~~~~~~~~
In file included from include/linux/ioport.h:13:0,
from include/linux/acpi.h:12,
from drivers//virtio/virtio_mmio.c:57:
>> include/linux/msi.h:135:38: error: 'struct device' has no member named 'msi_list'
#define dev_to_msi_list(dev) (&(dev)->msi_list)
^
include/linux/compiler.h:330:9: note: in definition of macro '__compiletime_assert'
if (!(condition)) \
^~~~~~~~~
include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
include/linux/kernel.h:994:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
^~~~~~~~~~~~~~~~
include/linux/kernel.h:994:20: note: in expansion of macro '__same_type'
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
^~~~~~~~~~~
>> include/linux/list.h:479:2: note: in expansion of macro 'container_of'
container_of(ptr, type, member)
^~~~~~~~~~~~
>> include/linux/list.h:490:2: note: in expansion of macro 'list_entry'
list_entry((ptr)->next, type, member)
^~~~~~~~~~
>> include/linux/msi.h:137:2: note: in expansion of macro 'list_first_entry'
list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
^~~~~~~~~~~~~~~~
>> include/linux/msi.h:137:19: note: in expansion of macro 'dev_to_msi_list'
list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
^~~~~~~~~~~~~~~
>> drivers//virtio/virtio_mmio.c:336:27: note: in expansion of macro 'first_msi_entry'
struct msi_desc *entry = first_msi_entry(dev);
^~~~~~~~~~~~~~~
>> include/linux/msi.h:135:38: error: 'struct device' has no member named 'msi_list'
#define dev_to_msi_list(dev) (&(dev)->msi_list)
^
include/linux/compiler.h:330:9: note: in definition of macro '__compiletime_assert'
if (!(condition)) \
^~~~~~~~~
include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
include/linux/kernel.h:994:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
^~~~~~~~~~~~~~~~
include/linux/kernel.h:995:6: note: in expansion of macro '__same_type'
!__same_type(*(ptr), void), \
^~~~~~~~~~~
>> include/linux/list.h:479:2: note: in expansion of macro 'container_of'
container_of(ptr, type, member)
^~~~~~~~~~~~
>> include/linux/list.h:490:2: note: in expansion of macro 'list_entry'
list_entry((ptr)->next, type, member)
^~~~~~~~~~
>> include/linux/msi.h:137:2: note: in expansion of macro 'list_first_entry'
list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
^~~~~~~~~~~~~~~~
>> include/linux/msi.h:137:19: note: in expansion of macro 'dev_to_msi_list'
list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
^~~~~~~~~~~~~~~
>> drivers//virtio/virtio_mmio.c:336:27: note: in expansion of macro 'first_msi_entry'
struct msi_desc *entry = first_msi_entry(dev);
^~~~~~~~~~~~~~~
drivers//virtio/virtio_mmio.c: In function 'vm_request_msi_vectors':
>> drivers//virtio/virtio_mmio.c:652:17: error: 'struct device' has no member named 'msi_domain'; did you mean 'pm_domain'?
if (!vdev->dev.msi_domain)
^~~~~~~~~~
pm_domain
drivers//virtio/virtio_mmio.c:653:13: error: 'struct device' has no member named 'msi_domain'; did you mean 'pm_domain'?
vdev->dev.msi_domain = platform_msi_get_def_irq_domain();
^~~~~~~~~~
pm_domain
>> drivers//virtio/virtio_mmio.c:654:8: error: implicit declaration of function 'platform_msi_domain_alloc_irqs'; did you mean 'irq_domain_alloc_irqs'? [-Werror=implicit-function-declaration]
err = platform_msi_domain_alloc_irqs(&vdev->dev, nirqs,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
irq_domain_alloc_irqs
drivers//virtio/virtio_mmio.c: In function 'vm_free_msi_irqs':
>> drivers//virtio/virtio_mmio.c:683:3: error: implicit declaration of function 'platform_msi_domain_free_irqs'; did you mean 'irq_domain_free_irqs'? [-Werror=implicit-function-declaration]
platform_msi_domain_free_irqs(&vdev->dev);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
irq_domain_free_irqs
cc1: some warnings being treated as errors
--
In file included from include/linux/list.h:9:0,
from include/linux/kobject.h:19,
from include/linux/of.h:17,
from include/linux/irqdomain.h:35,
from include/linux/acpi.h:13,
from drivers/virtio/virtio_mmio.c:57:
drivers/virtio/virtio_mmio.c: In function 'vm_msi_irq_vector':
>> include/linux/msi.h:135:38: error: 'struct device' has no member named 'msi_list'
#define dev_to_msi_list(dev) (&(dev)->msi_list)
^
include/linux/kernel.h:993:26: note: in definition of macro 'container_of'
void *__mptr = (void *)(ptr); \
^~~
>> include/linux/list.h:490:2: note: in expansion of macro 'list_entry'
list_entry((ptr)->next, type, member)
^~~~~~~~~~
>> include/linux/msi.h:137:2: note: in expansion of macro 'list_first_entry'
list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
^~~~~~~~~~~~~~~~
>> include/linux/msi.h:137:19: note: in expansion of macro 'dev_to_msi_list'
list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
^~~~~~~~~~~~~~~
drivers/virtio/virtio_mmio.c:336:27: note: in expansion of macro 'first_msi_entry'
struct msi_desc *entry = first_msi_entry(dev);
^~~~~~~~~~~~~~~
In file included from include/linux/ioport.h:13:0,
from include/linux/acpi.h:12,
from drivers/virtio/virtio_mmio.c:57:
>> include/linux/msi.h:135:38: error: 'struct device' has no member named 'msi_list'
#define dev_to_msi_list(dev) (&(dev)->msi_list)
^
include/linux/compiler.h:330:9: note: in definition of macro '__compiletime_assert'
if (!(condition)) \
^~~~~~~~~
include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
include/linux/kernel.h:994:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
^~~~~~~~~~~~~~~~
include/linux/kernel.h:994:20: note: in expansion of macro '__same_type'
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
^~~~~~~~~~~
>> include/linux/list.h:479:2: note: in expansion of macro 'container_of'
container_of(ptr, type, member)
^~~~~~~~~~~~
>> include/linux/list.h:490:2: note: in expansion of macro 'list_entry'
list_entry((ptr)->next, type, member)
^~~~~~~~~~
>> include/linux/msi.h:137:2: note: in expansion of macro 'list_first_entry'
list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
^~~~~~~~~~~~~~~~
>> include/linux/msi.h:137:19: note: in expansion of macro 'dev_to_msi_list'
list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
^~~~~~~~~~~~~~~
drivers/virtio/virtio_mmio.c:336:27: note: in expansion of macro 'first_msi_entry'
struct msi_desc *entry = first_msi_entry(dev);
^~~~~~~~~~~~~~~
>> include/linux/msi.h:135:38: error: 'struct device' has no member named 'msi_list'
#define dev_to_msi_list(dev) (&(dev)->msi_list)
^
include/linux/compiler.h:330:9: note: in definition of macro '__compiletime_assert'
if (!(condition)) \
^~~~~~~~~
include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
include/linux/kernel.h:994:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
^~~~~~~~~~~~~~~~
include/linux/kernel.h:995:6: note: in expansion of macro '__same_type'
!__same_type(*(ptr), void), \
^~~~~~~~~~~
>> include/linux/list.h:479:2: note: in expansion of macro 'container_of'
container_of(ptr, type, member)
^~~~~~~~~~~~
>> include/linux/list.h:490:2: note: in expansion of macro 'list_entry'
list_entry((ptr)->next, type, member)
^~~~~~~~~~
>> include/linux/msi.h:137:2: note: in expansion of macro 'list_first_entry'
list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
^~~~~~~~~~~~~~~~
>> include/linux/msi.h:137:19: note: in expansion of macro 'dev_to_msi_list'
list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
^~~~~~~~~~~~~~~
drivers/virtio/virtio_mmio.c:336:27: note: in expansion of macro 'first_msi_entry'
struct msi_desc *entry = first_msi_entry(dev);
^~~~~~~~~~~~~~~
drivers/virtio/virtio_mmio.c: In function 'vm_request_msi_vectors':
drivers/virtio/virtio_mmio.c:652:17: error: 'struct device' has no member named 'msi_domain'; did you mean 'pm_domain'?
if (!vdev->dev.msi_domain)
^~~~~~~~~~
pm_domain
drivers/virtio/virtio_mmio.c:653:13: error: 'struct device' has no member named 'msi_domain'; did you mean 'pm_domain'?
vdev->dev.msi_domain = platform_msi_get_def_irq_domain();
^~~~~~~~~~
pm_domain
drivers/virtio/virtio_mmio.c:654:8: error: implicit declaration of function 'platform_msi_domain_alloc_irqs'; did you mean 'irq_domain_alloc_irqs'? [-Werror=implicit-function-declaration]
err = platform_msi_domain_alloc_irqs(&vdev->dev, nirqs,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
irq_domain_alloc_irqs
drivers/virtio/virtio_mmio.c: In function 'vm_free_msi_irqs':
drivers/virtio/virtio_mmio.c:683:3: error: implicit declaration of function 'platform_msi_domain_free_irqs'; did you mean 'irq_domain_free_irqs'? [-Werror=implicit-function-declaration]
platform_msi_domain_free_irqs(&vdev->dev);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
irq_domain_free_irqs
cc1: some warnings being treated as errors

vim +135 include/linux/msi.h

3b7d1921f4cdd6 Eric W. Biederman 2006-10-04 132
d31eb342409b24 Jiang Liu 2014-11-15 133 /* Helpers to hide struct msi_desc implementation details */
25a98bd4ff9355 Jiang Liu 2015-07-09 134 #define msi_desc_to_dev(desc) ((desc)->dev)
4a7cc831670550 Jiang Liu 2015-07-09 @135 #define dev_to_msi_list(dev) (&(dev)->msi_list)
d31eb342409b24 Jiang Liu 2014-11-15 136 #define first_msi_entry(dev) \
d31eb342409b24 Jiang Liu 2014-11-15 @137 list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
d31eb342409b24 Jiang Liu 2014-11-15 138 #define for_each_msi_entry(desc, dev) \
d31eb342409b24 Jiang Liu 2014-11-15 139 list_for_each_entry((desc), dev_to_msi_list((dev)), list)
81b1e6e6a8590a Miquel Raynal 2018-10-11 140 #define for_each_msi_entry_safe(desc, tmp, dev) \
81b1e6e6a8590a Miquel Raynal 2018-10-11 141 list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list)
d31eb342409b24 Jiang Liu 2014-11-15 142

:::::: The code at line 135 was first introduced by commit
:::::: 4a7cc831670550e6b48ef5760e7213f89935ff0d genirq/MSI: Move msi_list from struct pci_dev to struct device

:::::: TO: Jiang Liu <[email protected]>
:::::: CC: Thomas Gleixner <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation



2019-12-26 08:16:18

by Jason Wang

Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3


On 2019/12/25 11:20 PM, Liu, Jiang wrote:
>
>> On Dec 25, 2019, at 6:20 PM, Jason Wang <[email protected]> wrote:
>>
>>
>> On 2019/12/25 10:50 AM, Zha Bin wrote:
>>> From: Liu Jiang <[email protected]>
>>>
>>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
>>> virtio over mmio devices as a lightweight machine model for modern
>>> cloud. The standard virtio over MMIO transport layer only supports one
>>> legacy interrupt, which is much heavier than virtio over PCI transport
>>> layer using MSI. Legacy interrupt has long work path and causes specific
>>> VMExits in following cases, which would considerably slow down the
>>> performance:
>>>
>>> 1) read interrupt status register
>>> 2) update interrupt status register
>>> 3) write IOAPIC EOI register
>>>
>>> We proposed to update virtio over MMIO to version 3[1] to add the
>>> following new features and enhance the performance.
>>>
>>> 1) Support Message Signaled Interrupt(MSI), which increases the
>>> interrupt performance for virtio multi-queue devices
>>> 2) Support per-queue doorbell, so the guest kernel may directly write
>>> to the doorbells provided by virtio devices.
>>>
>>> The following is the network tcp_rr performance testing report, tested
>>> with virtio-pci device, vanilla virtio-mmio device and patched
>>> virtio-mmio device (run test 3 times for each case):
>>>
>>> netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024
>>>
>>> Virtio-PCI Virtio-MMIO Virtio-MMIO(MSI)
>>> trans/s 9536 6939 9500
>>> trans/s 9734 7029 9749
>>> trans/s 9894 7095 9318
>>>
>>> [1] https://lkml.org/lkml/2019/12/20/113
>>
>> Thanks for the patch. Two questions after a quick glance:
>>
>> 1) In PCI we choose to support MSI-X instead of MSI for having extra flexibility like alias, independent data and address (e.g for affinity) . Any reason for not start from MSI-X? E.g having MSI-X table and PBA (both of which looks pretty independent).
> Hi Jason,
> Thanks for reviewing patches on Christmas Day:)
> The PCI MSI-x has several advantages over PCI MSI, mainly
> 1) support 2048 vectors, much more than 32 vectors supported by MSI.
> 2) dedicated address/data for each vector,
> 3) per vector mask/pending bits.
> The proposed MMIO MSI extension supports both 1) and 2),


Aha right, I misread the patch. But more questions come:

1) The association between vq and MSI-X vector is fixed. This means it
can't work for a device that has more than 2047 queues. We probably
need something similar to virtio-pci to allow a dynamic association.
2) The mask and unmask control is missing.


> but the extension doesn’t support 3) because
> we noticed that the Linux virtio subsystem doesn’t really make use of interrupt masking/unmasking.


Not directly used, but masking/unmasking is widely used in the irq subsystem,
which allows lots of optimizations.


>
> On the other hand, we want to simplify VMM implementations as simple as possible, and mimicking the PCI MSI-x
> will cause some complexity to VMM implementations.


I agree with simplifying the VMM implementation, but it looks to me that
introducing masking/pending won't cost too much code in the VMM. It would
just be a new type of command for VIRTIO_MMIO_MSI_COMMAND.
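
For instance (a purely hypothetical sketch, not something defined by the
proposed spec), a mask/unmask pair could reuse the VIRTIO_MMIO_MSI_CMD(op, arg)
encoding the patch already defines, with the vector index in the arg field:

	/* hypothetical opcodes alongside VIRTIO_MMIO_MSI_CMD_UPDATE (0x1) */
	#define VIRTIO_MMIO_MSI_CMD_MASK   0x2
	#define VIRTIO_MMIO_MSI_CMD_UNMASK 0x3

	writew(VIRTIO_MMIO_MSI_CMD(VIRTIO_MMIO_MSI_CMD_MASK, vector),
	       base + VIRTIO_MMIO_MSI_COMMAND);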

Thanks


>
>> 2) It's better to split notify_multiplexer out of MSI support to ease the reviewers (apply to spec patch as well)
> Great suggestion, we will try to split the patch.
>
> Thanks,
> Gerry
>
>> Thanks

2019-12-26 08:23:17

by Jason Wang

Subject: Re: [PATCH v1 1/2] x86/msi: Enhance x86 to support platform_msi


On 2019/12/25 10:50 AM, Zha Bin wrote:
> From: Liu Jiang <[email protected]>
>
> The x86 platform currently only supports specific MSI/MSI-x for PCI
> devices. To enable MSI to the platform devices such as virtio-mmio
> device, this patch enhances x86 with platform MSI by leveraging the
> already built-in platform-msi driver (drivers/base/platform-msi.c)
> and makes it as a configurable option.
>
> Signed-off-by: Liu Jiang <[email protected]>
> Signed-off-by: Zha Bin <[email protected]>
> Signed-off-by: Chao Peng <[email protected]>
> Signed-off-by: Jing Liu <[email protected]>
> ---
> arch/x86/Kconfig | 6 ++++
> arch/x86/include/asm/hw_irq.h | 5 +++
> arch/x86/include/asm/irqdomain.h | 9 +++++
> arch/x86/kernel/apic/msi.c | 74 ++++++++++++++++++++++++++++++++++++++++
> drivers/base/platform-msi.c | 4 +--
> include/linux/msi.h | 1 +
> 6 files changed, 97 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5e89499..c1178f2 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1074,6 +1074,12 @@ config X86_IO_APIC
> def_bool y
> depends on X86_LOCAL_APIC || X86_UP_IOAPIC
>
> +config X86_PLATFORM_MSI
> + def_bool y
> + depends on X86_LOCAL_APIC && X86_64
> + select GENERIC_MSI_IRQ
> + select GENERIC_MSI_IRQ_DOMAIN
> +
> config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
> bool "Reroute for broken boot IRQs"
> depends on X86_IO_APIC
> diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
> index 4154bc5..5c96b49 100644
> --- a/arch/x86/include/asm/hw_irq.h
> +++ b/arch/x86/include/asm/hw_irq.h
> @@ -113,6 +113,11 @@ struct irq_alloc_info {
> struct msi_desc *desc;
> };
> #endif
> +#ifdef CONFIG_X86_PLATFORM_MSI
> + struct {
> + irq_hw_number_t platform_msi_hwirq;
> + };
> +#endif
> };
> };
>
> diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
> index c066ffa..b53f51f 100644
> --- a/arch/x86/include/asm/irqdomain.h
> +++ b/arch/x86/include/asm/irqdomain.h
> @@ -56,4 +56,13 @@ extern void mp_irqdomain_deactivate(struct irq_domain *domain,
> static inline void arch_init_msi_domain(struct irq_domain *domain) { }
> #endif
>
> +#ifdef CONFIG_X86_PLATFORM_MSI
> +extern struct irq_domain *platform_msi_get_def_irq_domain(void);
> +#else
> +static inline struct irq_domain *platform_msi_get_def_irq_domain(void)
> +{
> + return NULL;
> +}
> +#endif
> +
> #endif
> diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> index 7f75334..ef558c7 100644
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -22,6 +22,10 @@
> #include <asm/irq_remapping.h>
>
> static struct irq_domain *msi_default_domain;
> +#ifdef CONFIG_X86_PLATFORM_MSI
> +static struct irq_domain *platform_msi_default_domain;
> +static struct msi_domain_info platform_msi_domain_info;
> +#endif
>
> static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
> {
> @@ -146,6 +150,17 @@ void __init arch_init_msi_domain(struct irq_domain *parent)
> }
> if (!msi_default_domain)
> pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
> +#ifdef CONFIG_X86_PLATFORM_MSI
> + fn = irq_domain_alloc_named_fwnode("PLATFORM-MSI");
> + if (fn) {
> + platform_msi_default_domain =
> + platform_msi_create_irq_domain(fn,
> + &platform_msi_domain_info, parent);
> + irq_domain_free_fwnode(fn);
> + }
> + if (!platform_msi_default_domain)
> + pr_warn("failed to initialize irqdomain for PLATFORM-MSI.\n");
> +#endif
> }
>
> #ifdef CONFIG_IRQ_REMAP
> @@ -384,3 +399,62 @@ int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
> return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
> }
> #endif
> +
> +#ifdef CONFIG_X86_PLATFORM_MSI
> +static void platform_msi_mask_irq(struct irq_data *data)
> +{
> +}
> +
> +static void platform_msi_unmask_irq(struct irq_data *data)
> +{
> +}
> +
> +static struct irq_chip platform_msi_controller = {
> + .name = "PLATFORM-MSI",
> + .irq_mask = platform_msi_mask_irq,
> + .irq_unmask = platform_msi_unmask_irq,
> + .irq_ack = irq_chip_ack_parent,
> + .irq_retrigger = irq_chip_retrigger_hierarchy,
> + .irq_compose_msi_msg = irq_msi_compose_msg,
> + .flags = IRQCHIP_SKIP_SET_WAKE,


Except for irq_msi_compose_msg, all the rest is arch independent. Can
we allow the arch to specify irq_compose_msi_msg so we can unify the
rest for all archs?


> +};
> +
> +static int platform_msi_prepare(struct irq_domain *domain, struct device *dev,
> + int nvec, msi_alloc_info_t *arg)
> +{
> + memset(arg, 0, sizeof(*arg));
> + return 0;
> +}
> +
> +static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
> +{
> + arg->platform_msi_hwirq = platform_msi_calc_hwirq(desc);
> +}
> +
> +static irq_hw_number_t platform_msi_get_hwirq(struct msi_domain_info *info,
> + msi_alloc_info_t *arg)
> +{
> + return arg->platform_msi_hwirq;
> +}
> +
> +static struct msi_domain_ops platform_msi_domain_ops = {
> + .msi_prepare = platform_msi_prepare,
> + .set_desc = platform_msi_set_desc,
> + .get_hwirq = platform_msi_get_hwirq,
> +};
> +
> +static struct msi_domain_info platform_msi_domain_info = {
> + .flags = MSI_FLAG_USE_DEF_DOM_OPS |
> + MSI_FLAG_USE_DEF_CHIP_OPS |
> + MSI_FLAG_ACTIVATE_EARLY,
> + .ops = &platform_msi_domain_ops,
> + .chip = &platform_msi_controller,
> + .handler = handle_edge_irq,
> + .handler_name = "edge",
> +};


Would it be better to have a virtio-mmio specific domain and irq_chip? It
looks to me the advantages are:

- works for all archs
- allows virtio-mmio specific methods to do e.g. mask/unmask.

Thanks


> +
> +struct irq_domain *platform_msi_get_def_irq_domain(void)
> +{
> + return platform_msi_default_domain;
> +}
> +#endif
> diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
> index 8da314b..45752f1 100644
> --- a/drivers/base/platform-msi.c
> +++ b/drivers/base/platform-msi.c
> @@ -31,12 +31,11 @@ struct platform_msi_priv_data {
> /* The devid allocator */
> static DEFINE_IDA(platform_msi_devid_ida);
>
> -#ifdef GENERIC_MSI_DOMAIN_OPS
> /*
> * Convert an msi_desc to a globaly unique identifier (per-device
> * devid + msi_desc position in the msi_list).
> */
> -static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
> +irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
> {
> u32 devid;
>
> @@ -45,6 +44,7 @@ static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
> return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index;
> }
>
> +#ifdef GENERIC_MSI_DOMAIN_OPS
> static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
> {
> arg->desc = desc;
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 8ad679e..ee5f566 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -362,6 +362,7 @@ int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
> void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq,
> unsigned int nvec);
> void *platform_msi_get_host_data(struct irq_domain *domain);
> +irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc);
> #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
>
> #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN

2019-12-26 08:41:48

by Jason Wang

Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3


On 2019/12/25 10:50 AM, Zha Bin wrote:
> From: Liu Jiang <[email protected]>
>
> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
> virtio over mmio devices as a lightweight machine model for modern
> cloud. The standard virtio over MMIO transport layer only supports one
> legacy interrupt, which is much heavier than virtio over PCI transport
> layer using MSI. Legacy interrupt has long work path and causes specific
> VMExits in following cases, which would considerably slow down the
> performance:
>
> 1) read interrupt status register
> 2) update interrupt status register
> 3) write IOAPIC EOI register
>
> We proposed to update virtio over MMIO to version 3[1] to add the
> following new features and enhance the performance.
>
> 1) Support Message Signaled Interrupt(MSI), which increases the
> interrupt performance for virtio multi-queue devices
> 2) Support per-queue doorbell, so the guest kernel may directly write
> to the doorbells provided by virtio devices.
>
> The following is the network tcp_rr performance testing report, tested
> with virtio-pci device, vanilla virtio-mmio device and patched
> virtio-mmio device (run test 3 times for each case):
>
> netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024
>
> Virtio-PCI Virtio-MMIO Virtio-MMIO(MSI)
> trans/s 9536 6939 9500
> trans/s 9734 7029 9749
> trans/s 9894 7095 9318
>
> [1] https://lkml.org/lkml/2019/12/20/113
>
> Signed-off-by: Liu Jiang <[email protected]>
> Signed-off-by: Zha Bin <[email protected]>
> Signed-off-by: Chao Peng <[email protected]>
> Signed-off-by: Jing Liu <[email protected]>
> ---
> drivers/virtio/virtio_mmio.c | 226 ++++++++++++++++++++++++++++++++++++---
> include/uapi/linux/virtio_mmio.h | 28 +++++
> 2 files changed, 240 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index e09edb5..065c083 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -68,8 +68,8 @@
> #include <linux/virtio_config.h>
> #include <uapi/linux/virtio_mmio.h>
> #include <linux/virtio_ring.h>
> -
> -
> +#include <linux/msi.h>
> +#include <asm/irqdomain.h>
>
> /* The alignment to use between consumer and producer parts of vring.
> * Currently hardcoded to the page size. */
> @@ -90,6 +90,12 @@ struct virtio_mmio_device {
> /* a list of queues so we can dispatch IRQs */
> spinlock_t lock;
> struct list_head virtqueues;
> +
> + int doorbell_base;
> + int doorbell_scale;


It's better to use the terminology defined by the spec, e.g.
notify_base/notify_multiplexer etc.

And we usually use an unsigned type for offsets.


> + bool msi_enabled;
> + /* Name strings for interrupts. */
> + char (*vm_vq_names)[256];
> };
>
> struct virtio_mmio_vq_info {
> @@ -101,6 +107,8 @@ struct virtio_mmio_vq_info {
> };
>
>
> +static void vm_free_msi_irqs(struct virtio_device *vdev);
> +static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs);
>
> /* Configuration interface */
>
> @@ -273,12 +281,28 @@ static bool vm_notify(struct virtqueue *vq)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
>
> + if (vm_dev->version == 3) {
> + int offset = vm_dev->doorbell_base +
> + vm_dev->doorbell_scale * vq->index;
> + writel(vq->index, vm_dev->base + offset);


In virtio-pci we store the doorbell address in vq->priv to avoid doing
the multiplication in the fast path.
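
A sketch of that approach applied here (a hypothetical adaptation;
virtio_pci_modern.c caches the notify address in vq->priv in the same way):

	/* at setup time (version 3 only), cache the queue's doorbell address */
	vq->priv = vm_dev->base + vm_dev->doorbell_base +
		   vm_dev->doorbell_scale * vq->index;

	/* fast path: a single write, no multiplication */
	static bool vm_notify_v3(struct virtqueue *vq)
	{
		writel(vq->index, (void __iomem *)vq->priv);
		return true;
	}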


> + } else
> /* We write the queue's selector into the notification register to
> * signal the other end */
> - writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> + writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> +
> return true;
> }
>
> +/* Handle a configuration change */
> +static irqreturn_t vm_config_changed(int irq, void *opaque)
> +{
> + struct virtio_mmio_device *vm_dev = opaque;
> +
> + virtio_config_changed(&vm_dev->vdev);
> +
> + return IRQ_HANDLED;
> +}
> +
> /* Notify all virtqueues on an interrupt. */
> static irqreturn_t vm_interrupt(int irq, void *opaque)
> {
> @@ -307,7 +331,12 @@ static irqreturn_t vm_interrupt(int irq, void *opaque)
> return ret;
> }
>
> +static int vm_msi_irq_vector(struct device *dev, unsigned int nr)
> +{
> + struct msi_desc *entry = first_msi_entry(dev);
>
> + return entry->irq + nr;
> +}
>
> static void vm_del_vq(struct virtqueue *vq)
> {
> @@ -316,6 +345,12 @@ static void vm_del_vq(struct virtqueue *vq)
> unsigned long flags;
> unsigned int index = vq->index;
>
> + if (vm_dev->version == 3 && vm_dev->msi_enabled) {
> + int irq = vm_msi_irq_vector(&vq->vdev->dev, index + 1);
> +
> + free_irq(irq, vq);
> + }
> +
> spin_lock_irqsave(&vm_dev->lock, flags);
> list_del(&info->node);
> spin_unlock_irqrestore(&vm_dev->lock, flags);
> @@ -334,15 +369,41 @@ static void vm_del_vq(struct virtqueue *vq)
> kfree(info);
> }
>
> -static void vm_del_vqs(struct virtio_device *vdev)
> +static void vm_free_irqs(struct virtio_device *vdev)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> + if (vm_dev->version == 3)
> + vm_free_msi_irqs(vdev);
> + else
> + free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
> +}
> +
> +static void vm_del_vqs(struct virtio_device *vdev)
> +{
> struct virtqueue *vq, *n;
>
> list_for_each_entry_safe(vq, n, &vdev->vqs, list)
> vm_del_vq(vq);
>
> - free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
> + vm_free_irqs(vdev);
> +}
> +
> +static int vm_request_single_irq(struct virtio_device *vdev)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + int irq = platform_get_irq(vm_dev->pdev, 0);
> + int err;
> +
> + if (irq < 0) {
> + dev_err(&vdev->dev, "Cannot get IRQ resource\n");
> + return irq;
> + }
> +
> + err = request_irq(irq, vm_interrupt, IRQF_SHARED,
> + dev_name(&vdev->dev), vm_dev);
> +
> + return err;
> }
>
> static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
> @@ -455,6 +516,72 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
> return ERR_PTR(err);
> }
>
> +static inline void mmio_msi_set_enable(struct virtio_mmio_device *vm_dev,
> + int enable)
> +{
> + u16 status;
> +
> + status = readw(vm_dev->base + VIRTIO_MMIO_MSI_STATUS);
> + if (enable)
> + status |= VIRTIO_MMIO_MSI_ENABLE_MASK;
> + else
> + status &= ~VIRTIO_MMIO_MSI_ENABLE_MASK;
> + writew(status, vm_dev->base + VIRTIO_MMIO_MSI_STATUS);
> +}
> +
> +static int vm_find_vqs_msi(struct virtio_device *vdev, unsigned int nvqs,
> + struct virtqueue *vqs[], vq_callback_t *callbacks[],
> + const char * const names[], const bool *ctx,
> + struct irq_affinity *desc)
> +{
> + int i, err, irq;
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> + /* Allocate nvqs irqs for queues and one irq for configuration */
> + err = vm_request_msi_vectors(vdev, nvqs + 1);
> + if (err != 0)
> + return err;
> +
> + for (i = 0; i < nvqs; i++) {
> + int index = i + 1;
> +
> + if (!names[i]) {
> + vqs[i] = NULL;
> + continue;
> + }
> + vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i],
> + ctx ? ctx[i] : false);
> + if (IS_ERR(vqs[i])) {
> + err = PTR_ERR(vqs[i]);
> + goto error_find;
> + }
> +
> + if (!callbacks[i])
> + continue;
> +
> + irq = vm_msi_irq_vector(&vqs[i]->vdev->dev, index);
> + /* allocate per-vq irq if available and necessary */
> + snprintf(vm_dev->vm_vq_names[index],
> + sizeof(*vm_dev->vm_vq_names),
> + "%s-%s",
> + dev_name(&vm_dev->vdev.dev), names[i]);
> + err = request_irq(irq, vring_interrupt, 0,
> + vm_dev->vm_vq_names[index], vqs[i]);
> +
> + if (err)
> + goto error_find;
> + }
> +
> + mmio_msi_set_enable(vm_dev, 1);
> + vm_dev->msi_enabled = true;
> +
> + return 0;
> +
> +error_find:
> + vm_del_vqs(vdev);
> + return err;
> +}
> +
> static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> struct virtqueue *vqs[],
> vq_callback_t *callbacks[],
> @@ -463,16 +590,16 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> struct irq_affinity *desc)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> - int irq = platform_get_irq(vm_dev->pdev, 0);
> int i, err, queue_idx = 0;
>
> - if (irq < 0) {
> - dev_err(&vdev->dev, "Cannot get IRQ resource\n");
> - return irq;
> + if (vm_dev->version == 3) {
> + err = vm_find_vqs_msi(vdev, nvqs, vqs, callbacks,
> + names, ctx, desc);
> + if (!err)
> + return 0;
> }
>
> - err = request_irq(irq, vm_interrupt, IRQF_SHARED,
> - dev_name(&vdev->dev), vm_dev);
> + err = vm_request_single_irq(vdev);
> if (err)
> return err;
>
> @@ -493,6 +620,73 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> return 0;
> }
>
> +static void mmio_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
> +{
> + struct device *dev = desc->dev;
> + struct virtio_device *vdev = dev_to_virtio(dev);
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + void __iomem *pos = vm_dev->base;
> + uint16_t cmd = VIRTIO_MMIO_MSI_CMD(VIRTIO_MMIO_MSI_CMD_UPDATE,
> + desc->platform.msi_index);
> +
> + writel(msg->address_lo, pos + VIRTIO_MMIO_MSI_ADDRESS_LOW);
> + writel(msg->address_hi, pos + VIRTIO_MMIO_MSI_ADDRESS_HIGH);
> + writel(msg->data, pos + VIRTIO_MMIO_MSI_DATA);
> + writew(cmd, pos + VIRTIO_MMIO_MSI_COMMAND);
> +}
> +
> +static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + int irq, err;
> + u16 csr = readw(vm_dev->base + VIRTIO_MMIO_MSI_STATUS);
> +
> + if (vm_dev->msi_enabled || (csr & VIRTIO_MMIO_MSI_ENABLE_MASK) == 0)
> + return -EINVAL;
> +
> + vm_dev->vm_vq_names = kmalloc_array(nirqs, sizeof(*vm_dev->vm_vq_names),
> + GFP_KERNEL);
> + if (!vm_dev->vm_vq_names)
> + return -ENOMEM;
> +
> + if (!vdev->dev.msi_domain)
> + vdev->dev.msi_domain = platform_msi_get_def_irq_domain();
> + err = platform_msi_domain_alloc_irqs(&vdev->dev, nirqs,
> + mmio_write_msi_msg);


Can platform_msi_domain_alloc_irqs() check msi_domain against NULL?


> + if (err)
> + goto error_free_mem;
> +
> + /* The first MSI vector is used for configuration change events. */
> + snprintf(vm_dev->vm_vq_names[0], sizeof(*vm_dev->vm_vq_names),
> + "%s-config", dev_name(&vdev->dev));
> + irq = vm_msi_irq_vector(&vdev->dev, 0);
> + err = request_irq(irq, vm_config_changed, 0, vm_dev->vm_vq_names[0],
> + vm_dev);
> + if (err)
> + goto error_free_mem;
> +
> + return 0;
> +
> +error_free_mem:
> + kfree(vm_dev->vm_vq_names);
> + vm_dev->vm_vq_names = NULL;
> + return err;
> +}
> +
> +static void vm_free_msi_irqs(struct virtio_device *vdev)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> + if (vm_dev->msi_enabled) {
> + mmio_msi_set_enable(vm_dev, 0);
> + free_irq(vm_msi_irq_vector(&vdev->dev, 0), vm_dev);
> + platform_msi_domain_free_irqs(&vdev->dev);
> + kfree(vm_dev->vm_vq_names);
> + vm_dev->vm_vq_names = NULL;
> + vm_dev->msi_enabled = false;
> + }
> +}
> +
> static const char *vm_bus_name(struct virtio_device *vdev)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> @@ -567,7 +761,7 @@ static int virtio_mmio_probe(struct platform_device *pdev)
>
> /* Check device version */
> vm_dev->version = readl(vm_dev->base + VIRTIO_MMIO_VERSION);
> - if (vm_dev->version < 1 || vm_dev->version > 2) {
> + if (vm_dev->version < 1 || vm_dev->version > 3) {
> dev_err(&pdev->dev, "Version %ld not supported!\n",
> vm_dev->version);
> return -ENXIO;
> @@ -603,6 +797,12 @@ static int virtio_mmio_probe(struct platform_device *pdev)
> dev_warn(&pdev->dev, "Failed to enable 64-bit or 32-bit DMA. Trying to continue, but this might not work.\n");
>
> platform_set_drvdata(pdev, vm_dev);
> + if (vm_dev->version == 3) {
> + u32 db = readl(vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> +
> + vm_dev->doorbell_base = db & 0xffff;
> + vm_dev->doorbell_scale = (db >> 16) & 0xffff;
> + }
>
> rc = register_virtio_device(&vm_dev->vdev);
> if (rc)
> @@ -619,8 +819,6 @@ static int virtio_mmio_remove(struct platform_device *pdev)
> return 0;
> }
>
> -
> -


Unnecessary changes.


> /* Devices list parameter */
>
> #if defined(CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES)
> diff --git a/include/uapi/linux/virtio_mmio.h b/include/uapi/linux/virtio_mmio.h
> index c4b0968..148895e 100644
> --- a/include/uapi/linux/virtio_mmio.h
> +++ b/include/uapi/linux/virtio_mmio.h
> @@ -122,6 +122,34 @@
> #define VIRTIO_MMIO_QUEUE_USED_LOW 0x0a0
> #define VIRTIO_MMIO_QUEUE_USED_HIGH 0x0a4
>
> +/* MSI related registers */
> +
> +/* MSI status register */
> +#define VIRTIO_MMIO_MSI_STATUS 0x0c0
> +/* MSI command register */
> +#define VIRTIO_MMIO_MSI_COMMAND 0x0c2
> +/* MSI low 32 bit address, 64 bits in two halves */
> +#define VIRTIO_MMIO_MSI_ADDRESS_LOW 0x0c4
> +/* MSI high 32 bit address, 64 bits in two halves */
> +#define VIRTIO_MMIO_MSI_ADDRESS_HIGH 0x0c8
> +/* MSI data */
> +#define VIRTIO_MMIO_MSI_DATA 0x0cc


I wonder what's the advantage of using registers instead of memory-mapped
tables as PCI did. Is this because MMIO doesn't have a capability list,
which makes it hard to extend if we have variable-size tables?
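
For comparison, the PCI-style alternative would be a memory-mapped vector
table roughly like this (layout purely illustrative, not part of either
proposal):

struct virtio_mmio_msix_entry {
	__le32 address_lo;	/* message address, low 32 bits */
	__le32 address_hi;	/* message address, high 32 bits */
	__le32 data;		/* message data */
	__le32 ctrl;		/* bit 0: per-vector mask */
};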


> +
> +/* RO: MSI feature enabled mask */
> +#define VIRTIO_MMIO_MSI_ENABLE_MASK 0x8000
> +/* RO: Maximum queue size available */
> +#define VIRTIO_MMIO_MSI_STATUS_QMASK 0x07ff
> +/* Reserved */
> +#define VIRTIO_MMIO_MSI_STATUS_RESERVED 0x7800
> +
> +#define VIRTIO_MMIO_MSI_CMD_UPDATE 0x1


I believe we need a command to read the number of vectors supported by
the device, or is 2048 assumed to be a fixed size here?

Thanks


> +#define VIRTIO_MMIO_MSI_CMD_OP_MASK 0xf000
> +#define VIRTIO_MMIO_MSI_CMD_ARG_MASK 0x0fff
> +#define VIRTIO_MMIO_MSI_CMD(op, arg) ((op) << 12 | (arg))
> +#define VIRITO_MMIO_MSI_CMD_ARG(cmd) ((cmd) & VIRTIO_MMIO_MSI_CMD_ARG_MASK)
> +#define VIRTIO_MMIO_MSI_CMD_OP(cmd) \
> + (((cmd) & VIRTIO_MMIO_MSI_CMD_OP_MASK) >> 12)
> +
> /* Configuration atomicity value */
> #define VIRTIO_MMIO_CONFIG_GENERATION 0x0fc
>


2019-12-26 13:19:24

by Liu, Jiang

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3



> On Dec 26, 2019, at 4:09 PM, Jason Wang <[email protected]> wrote:
>
>
> On 2019/12/25 下午11:20, Liu, Jiang wrote:
>>
>>> On Dec 25, 2019, at 6:20 PM, Jason Wang <[email protected]> wrote:
>>>
>>>
>>> On 2019/12/25 上午10:50, Zha Bin wrote:
>>>> From: Liu Jiang <[email protected]>
>>>>
>>>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
>>>> virtio over mmio devices as a lightweight machine model for modern
>>>> cloud. The standard virtio over MMIO transport layer only supports one
>>>> legacy interrupt, which is much heavier than virtio over PCI transport
>>>> layer using MSI. Legacy interrupt has long work path and causes specific
>>>> VMExits in following cases, which would considerably slow down the
>>>> performance:
>>>>
>>>> 1) read interrupt status register
>>>> 2) update interrupt status register
>>>> 3) write IOAPIC EOI register
>>>>
>>>> We proposed to update virtio over MMIO to version 3[1] to add the
>>>> following new features and enhance the performance.
>>>>
>>>> 1) Support Message Signaled Interrupt(MSI), which increases the
>>>> interrupt performance for virtio multi-queue devices
>>>> 2) Support per-queue doorbell, so the guest kernel may directly write
>>>> to the doorbells provided by virtio devices.
>>>>
>>>> The following is the network tcp_rr performance testing report, tested
>>>> with virtio-pci device, vanilla virtio-mmio device and patched
>>>> virtio-mmio device (run test 3 times for each case):
>>>>
>>>> netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024
>>>>
>>>> Virtio-PCI Virtio-MMIO Virtio-MMIO(MSI)
>>>> trans/s 9536 6939 9500
>>>> trans/s 9734 7029 9749
>>>> trans/s 9894 7095 9318
>>>>
>>>> [1] https://lkml.org/lkml/2019/12/20/113
>>>
>>> Thanks for the patch. Two questions after a quick glance:
>>>
>>> 1) In PCI we choose to support MSI-X instead of MSI for having extra flexibility like alias, independent data and address (e.g for affinity) . Any reason for not start from MSI-X? E.g having MSI-X table and PBA (both of which looks pretty independent).
>> Hi Jason,
>> Thanks for reviewing patches on Christmas Day:)
>> The PCI MSI-x has several advantages over PCI MSI, mainly
>> 1) support 2048 vectors, much more than 32 vectors supported by MSI.
>> 2) dedicated address/data for each vector,
>> 3) per vector mask/pending bits.
>> The proposed MMIO MSI extension supports both 1) and 2),
>
>
> Aha right, I mis-read the patch. But more questions comes:
>
> 1) The association between vq and MSI-X vector is fixed. This means it can't work for a device that have more than 2047 queues. We probably need something similar to virtio-pci to allow a dynamic association.
We have considered both a PCI MSI-X-like dynamic association design and a fixed mapping design.
The fixed mapping design simplifies both the interrupt configuration process and VMM implementations.
And the virtio mmio transport layer is mainly used by light VMMs to support small-scale virtual machines, so 2048 vectors should be enough for these use cases.
So the fixed mapping design has been used.

> 2) The mask and unmask control is missed
>
>
>> but the extension doesn’t support 3) because
>> we noticed that the Linux virtio subsystem doesn’t really make use of interrupt masking/unmasking.
>
>
> Not directly used but masking/unmasking is widely used in irq subsystem which allows lots of optimizations.
>
>
>>
>> On the other hand, we want to simplify VMM implementations as simple as possible, and mimicking the PCI MSI-x
>> will cause some complexity to VMM implementations.
>
>
> I agree to simplify VMM implementation, but it looks to me introducing masking/pending won't cost too much code in the VMM implementation. Just new type of command for VIRTIO_MMIO_MSI_COMMAND.
We want to make VMM implementations as simple as possible:)
And based on the following observations, we have dropped support for mask/unmask:
1) MSI is edge-triggered, which means it won’t be shared with other interrupt sources, so masking/unmasking won’t be used by the normal interrupt management logic.
2) The Linux virtio mmio transport layer doesn’t support suspend/resume yet, so there’s no need to quiesce the device by masking interrupts.
3) Legacy PCI 2.2 devices don’t support irq masking/unmasking, so irq masking/unmasking may be treated as optional operations.
So we skipped support for irq masking/unmasking. We will recheck whether irq masking/unmasking is mandatory for MMIO devices.
On the other hand, we may enhance the spec to define command codes for masking/unmasking, and a VMM may optionally support them.

Thanks,
Gerry

>
> Thanks
>
>
>>
>>> 2) It's better to split notify_multiplexer out of MSI support to ease the reviewers (apply to spec patch as well)
>> Great suggestion, we will try to split the patch.
>>
>> Thanks,
>> Gerry
>>
>>> Thanks

2019-12-27 09:40:11

by Liu, Jing2

[permalink] [raw]
Subject: Re: [virtio-dev] Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

Hi Jason,

Thanks for reviewing the patches!

 [...]
>> -
>> +#include <linux/msi.h>
>> +#include <asm/irqdomain.h>
>>     /* The alignment to use between consumer and producer parts of
>> vring.
>>    * Currently hardcoded to the page size. */
>> @@ -90,6 +90,12 @@ struct virtio_mmio_device {
>>       /* a list of queues so we can dispatch IRQs */
>>       spinlock_t lock;
>>       struct list_head virtqueues;
>> +
>> +    int doorbell_base;
>> +    int doorbell_scale;
>
>
> It's better to use the terminology defined by spec, e.g
> notify_base/notify_multiplexer etc.
>
> And we usually use unsigned type for offset.

We will fix this in the next version.


>
>
>> +    bool msi_enabled;
>> +    /* Name strings for interrupts. */
>> +    char (*vm_vq_names)[256];
>>   };
>>     struct virtio_mmio_vq_info {
>> @@ -101,6 +107,8 @@ struct virtio_mmio_vq_info {
>>   };
>>     +static void vm_free_msi_irqs(struct virtio_device *vdev);
>> +static int vm_request_msi_vectors(struct virtio_device *vdev, int
>> nirqs);
>>     /* Configuration interface */
>>   @@ -273,12 +281,28 @@ static bool vm_notify(struct virtqueue *vq)
>>   {
>>       struct virtio_mmio_device *vm_dev =
>> to_virtio_mmio_device(vq->vdev);
>>   +    if (vm_dev->version == 3) {
>> +        int offset = vm_dev->doorbell_base +
>> +                 vm_dev->doorbell_scale * vq->index;
>> +        writel(vq->index, vm_dev->base + offset);
>
>
> In virtio-pci we store the doorbell address in vq->priv to avoid doing
> multiplication in fast path.

Good suggestion. We will fix this in the next version.

[...]

>> +
>> +static int vm_request_msi_vectors(struct virtio_device *vdev, int
>> nirqs)
>> +{
>> +    struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
>> +    int irq, err;
>> +    u16 csr = readw(vm_dev->base + VIRTIO_MMIO_MSI_STATUS);
>> +
>> +    if (vm_dev->msi_enabled || (csr & VIRTIO_MMIO_MSI_ENABLE_MASK)
>> == 0)
>> +        return -EINVAL;
>> +
>> +    vm_dev->vm_vq_names = kmalloc_array(nirqs,
>> sizeof(*vm_dev->vm_vq_names),
>> +            GFP_KERNEL);
>> +    if (!vm_dev->vm_vq_names)
>> +        return -ENOMEM;
>> +
>> +    if (!vdev->dev.msi_domain)
>> +        vdev->dev.msi_domain = platform_msi_get_def_irq_domain();
>> +    err = platform_msi_domain_alloc_irqs(&vdev->dev, nirqs,
>> +            mmio_write_msi_msg);
>
>
> Can platform_msi_domain_alloc_irqs check msi_domain vs NULL?
>
Actually here we first need to consider the case where @dev doesn't have an
@msi_domain, according to the report from lkp.

As for platform_msi_domain_alloc_irqs(), it can do that check internally.


> [...]
>>         rc = register_virtio_device(&vm_dev->vdev);
>>       if (rc)
>> @@ -619,8 +819,6 @@ static int virtio_mmio_remove(struct
>> platform_device *pdev)
>>       return 0;
>>   }
>>   -
>> -
>
>
> Unnecessary changes.

Got it. Will remove it later.


> [...]
>>   +/* MSI related registers */
>> +
>> +/* MSI status register */
>> +#define VIRTIO_MMIO_MSI_STATUS        0x0c0
>> +/* MSI command register */
>> +#define VIRTIO_MMIO_MSI_COMMAND        0x0c2
>> +/* MSI low 32 bit address, 64 bits in two halves */
>> +#define VIRTIO_MMIO_MSI_ADDRESS_LOW    0x0c4
>> +/* MSI high 32 bit address, 64 bits in two halves */
>> +#define VIRTIO_MMIO_MSI_ADDRESS_HIGH    0x0c8
>> +/* MSI data */
>> +#define VIRTIO_MMIO_MSI_DATA        0x0cc
>
>
> I wonder what's the advantage of using registers instead of memory
> mapped tables as PCI did. Is this because MMIO doesn't have capability
> list which makes it hard to be extended if we have variable size of
> tables?
>
Yes, mmio doesn't have a capability list, which limits extension.

It would need some registers to specify the MSI table/mask table/pending table
offsets if we followed what PCI did.

As commented previously, mask/pending can easily be extended via an MSI command.
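
For example, something like this could be added later without touching the
register layout (the command codes below are hypothetical, not in the
current proposal):

#define VIRTIO_MMIO_MSI_CMD_MASK	0x2	/* hypothetical */
#define VIRTIO_MMIO_MSI_CMD_UNMASK	0x3	/* hypothetical */

	/* mask vector 'nr' through the existing command register */
	writew(VIRTIO_MMIO_MSI_CMD(VIRTIO_MMIO_MSI_CMD_MASK, nr),
	       vm_dev->base + VIRTIO_MMIO_MSI_COMMAND);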

>
>> +
>> +/* RO: MSI feature enabled mask */
>> +#define VIRTIO_MMIO_MSI_ENABLE_MASK    0x8000
>> +/* RO: Maximum queue size available */
>> +#define VIRTIO_MMIO_MSI_STATUS_QMASK    0x07ff
>> +/* Reserved */
>> +#define VIRTIO_MMIO_MSI_STATUS_RESERVED    0x7800
>> +
>> +#define VIRTIO_MMIO_MSI_CMD_UPDATE    0x1
>
>
> I believe we need a command to read the number of vectors supported by
> the device, or 2048 is assumed to be a fixed size here?

To avoid bringing in much complexity, we proposed one vector per queue and a
fixed relationship between events and vectors.

So the number of vectors supported by the device is equal to the total
number of vqs plus one for config.

We will try to explicitly highlight this point in the spec in a later version.


Thanks!

Jing

>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

2020-01-02 06:30:23

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3


On 2019/12/26 下午9:16, Liu, Jiang wrote:
>
>> On Dec 26, 2019, at 4:09 PM, Jason Wang <[email protected]> wrote:
>>
>>
>> On 2019/12/25 下午11:20, Liu, Jiang wrote:
>>>> On Dec 25, 2019, at 6:20 PM, Jason Wang <[email protected]> wrote:
>>>>
>>>>
>>>> On 2019/12/25 上午10:50, Zha Bin wrote:
>>>>> From: Liu Jiang <[email protected]>
>>>>>
>>>>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
>>>>> virtio over mmio devices as a lightweight machine model for modern
>>>>> cloud. The standard virtio over MMIO transport layer only supports one
>>>>> legacy interrupt, which is much heavier than virtio over PCI transport
>>>>> layer using MSI. Legacy interrupt has long work path and causes specific
>>>>> VMExits in following cases, which would considerably slow down the
>>>>> performance:
>>>>>
>>>>> 1) read interrupt status register
>>>>> 2) update interrupt status register
>>>>> 3) write IOAPIC EOI register
>>>>>
>>>>> We proposed to update virtio over MMIO to version 3[1] to add the
>>>>> following new features and enhance the performance.
>>>>>
>>>>> 1) Support Message Signaled Interrupt(MSI), which increases the
>>>>> interrupt performance for virtio multi-queue devices
>>>>> 2) Support per-queue doorbell, so the guest kernel may directly write
>>>>> to the doorbells provided by virtio devices.
>>>>>
>>>>> The following is the network tcp_rr performance testing report, tested
>>>>> with virtio-pci device, vanilla virtio-mmio device and patched
>>>>> virtio-mmio device (run test 3 times for each case):
>>>>>
>>>>> netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024
>>>>>
>>>>> Virtio-PCI Virtio-MMIO Virtio-MMIO(MSI)
>>>>> trans/s 9536 6939 9500
>>>>> trans/s 9734 7029 9749
>>>>> trans/s 9894 7095 9318
>>>>>
>>>>> [1] https://lkml.org/lkml/2019/12/20/113
>>>> Thanks for the patch. Two questions after a quick glance:
>>>>
>>>> 1) In PCI we choose to support MSI-X instead of MSI for having extra flexibility like alias, independent data and address (e.g for affinity) . Any reason for not start from MSI-X? E.g having MSI-X table and PBA (both of which looks pretty independent).
>>> Hi Jason,
>>> Thanks for reviewing patches on Christmas Day:)
>>> The PCI MSI-x has several advantages over PCI MSI, mainly
>>> 1) support 2048 vectors, much more than 32 vectors supported by MSI.
>>> 2) dedicated address/data for each vector,
>>> 3) per vector mask/pending bits.
>>> The proposed MMIO MSI extension supports both 1) and 2),
>>
>> Aha right, I mis-read the patch. But more questions comes:
>>
>> 1) The association between vq and MSI-X vector is fixed. This means it can't work for a device that have more than 2047 queues. We probably need something similar to virtio-pci to allow a dynamic association.
> We have considered both the PCI MSI-x like dynamic association design and fix mapping design.
> The fix mapping design simplifies both the interrupt configuration process and VMM implementations.


Well, for the VMM it's just an indirection, and the guest can still choose to
use a fixed mapping; it just needs to program it once during probe.
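
As a rough sketch of that indirection, modelled on virtio-pci's
queue_msix_vector (VIRTIO_MMIO_QUEUE_MSI_VECTOR is a hypothetical register,
not in the posted proposal):

	/* program the association once during probe */
	writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
	writew(vector, vm_dev->base + VIRTIO_MMIO_QUEUE_MSI_VECTOR);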


> And the virtio mmio transport layer is mainly used by light VMMs to support small scale virtual machines,


Let's not limit the interface to a specific use case :).
Eliminating PCIe would be appealing for other scenarios too.


> 2048 vectors should be enough for these usage cases.
> So the fix mapping design has been used.
>
>> 2) The mask and unmask control is missed
>>
>>
>>> but the extension doesn’t support 3) because
>>> we noticed that the Linux virtio subsystem doesn’t really make use of interrupt masking/unmasking.
>>
>> Not directly used but masking/unmasking is widely used in irq subsystem which allows lots of optimizations.
>>
>>
>>> On the other hand, we want to simplify VMM implementations as simple as possible, and mimicking the PCI MSI-x
>>> will cause some complexity to VMM implementations.
>>
>> I agree to simplify VMM implementation, but it looks to me introducing masking/pending won't cost too much code in the VMM implementation. Just new type of command for VIRTIO_MMIO_MSI_COMMAND.
> We want to make VMM implementations as simple as possible:)
> And based on following observations, we have disabled support of mask/unmask,
> 1) MSI is edge triggering, which means it won’t be shared with other interrupt sources,


Is this true? I think the spec does not forbid such usages, e.g. using
the same MSI address/command for different queues or devices?


> so masking/unmasking won’t be used for normal interrupt management logic.
> 2) Linux virtio mmio transport layer doesn’t support suspend/resume yet, so there’s no need to quiesce the device by masking interrupts.


Yes, but it's a limitation only for the virtio mmio transport. We can add it.


> 3) The legacy PCI 2.2 devices doesn’t support irq masking/unmasking, so irq masking/unmasking may be optional operations.


Yes, but as you said, it helps for performance and some other cases. I
still prefer to implement that, considering it is not complex. If we do MSI
without masking/unmasking, I suspect we will end up implementing MSI-X
someday, and then maintaining MSI will become a burden... (again taking
virtio-pci as an example, it chose to implement MSI-X, not MSI).


> So we skipped support of irq masking/unmasking. We will recheck whether irq masking/unmasking is mandatory for MMIO devices.
> On the other hand, we may enhance the spec to define command codes for masking/unmasking, and VMM may optionally support masking/unmasking.


Yes, thanks.


>
> Thanks,
> Gerry
>
>> Thanks
>>
>>
>>>> 2) It's better to split notify_multiplexer out of MSI support to ease the reviewers (apply to spec patch as well)
>>> Great suggestion, we will try to split the patch.
>>>
>>> Thanks,
>>> Gerry
>>>
>>>> Thanks

2020-01-02 06:34:46

by Jason Wang

[permalink] [raw]
Subject: Re: [virtio-dev] Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3


On 2019/12/27 下午5:37, Liu, Jing2 wrote:
> Hi Jason,
>
> Thanks for reviewing the patches!
>
>  [...]
>>> -
>>> +#include <linux/msi.h>
>>> +#include <asm/irqdomain.h>
>>>     /* The alignment to use between consumer and producer parts of
>>> vring.
>>>    * Currently hardcoded to the page size. */
>>> @@ -90,6 +90,12 @@ struct virtio_mmio_device {
>>>       /* a list of queues so we can dispatch IRQs */
>>>       spinlock_t lock;
>>>       struct list_head virtqueues;
>>> +
>>> +    int doorbell_base;
>>> +    int doorbell_scale;
>>
>>
>> It's better to use the terminology defined by spec, e.g
>> notify_base/notify_multiplexer etc.
>>
>> And we usually use unsigned type for offset.
>
> We will fix this in next version.
>
>
>>
>>
>>> +    bool msi_enabled;
>>> +    /* Name strings for interrupts. */
>>> +    char (*vm_vq_names)[256];
>>>   };
>>>     struct virtio_mmio_vq_info {
>>> @@ -101,6 +107,8 @@ struct virtio_mmio_vq_info {
>>>   };
>>>     +static void vm_free_msi_irqs(struct virtio_device *vdev);
>>> +static int vm_request_msi_vectors(struct virtio_device *vdev, int
>>> nirqs);
>>>     /* Configuration interface */
>>>   @@ -273,12 +281,28 @@ static bool vm_notify(struct virtqueue *vq)
>>>   {
>>>       struct virtio_mmio_device *vm_dev =
>>> to_virtio_mmio_device(vq->vdev);
>>>   +    if (vm_dev->version == 3) {
>>> +        int offset = vm_dev->doorbell_base +
>>> +                 vm_dev->doorbell_scale * vq->index;
>>> +        writel(vq->index, vm_dev->base + offset);
>>
>>
>> In virtio-pci we store the doorbell address in vq->priv to avoid
>> doing multiplication in fast path.
>
> Good suggestion. We will fix this in next version.
>
> [...]
>
>>> +
>>> +static int vm_request_msi_vectors(struct virtio_device *vdev, int
>>> nirqs)
>>> +{
>>> +    struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
>>> +    int irq, err;
>>> +    u16 csr = readw(vm_dev->base + VIRTIO_MMIO_MSI_STATUS);
>>> +
>>> +    if (vm_dev->msi_enabled || (csr & VIRTIO_MMIO_MSI_ENABLE_MASK)
>>> == 0)
>>> +        return -EINVAL;
>>> +
>>> +    vm_dev->vm_vq_names = kmalloc_array(nirqs,
>>> sizeof(*vm_dev->vm_vq_names),
>>> +            GFP_KERNEL);
>>> +    if (!vm_dev->vm_vq_names)
>>> +        return -ENOMEM;
>>> +
>>> +    if (!vdev->dev.msi_domain)
>>> +        vdev->dev.msi_domain = platform_msi_get_def_irq_domain();
>>> +    err = platform_msi_domain_alloc_irqs(&vdev->dev, nirqs,
>>> +            mmio_write_msi_msg);
>>
>>
>> Can platform_msi_domain_alloc_irqs check msi_domain vs NULL?
>>
> Actually here, we need to firstly consider the cases that @dev doesn't
> have @msi_domain,
>
> according to the report by lkp.
>
> For the platform_msi_domain_alloc_irqs, it can help check inside.
>
>
>> [...]
>>>         rc = register_virtio_device(&vm_dev->vdev);
>>>       if (rc)
>>> @@ -619,8 +819,6 @@ static int virtio_mmio_remove(struct
>>> platform_device *pdev)
>>>       return 0;
>>>   }
>>>   -
>>> -
>>
>>
>> Unnecessary changes.
>
> Got it. Will remove it later.
>
>
>> [...]
>>>   +/* MSI related registers */
>>> +
>>> +/* MSI status register */
>>> +#define VIRTIO_MMIO_MSI_STATUS        0x0c0
>>> +/* MSI command register */
>>> +#define VIRTIO_MMIO_MSI_COMMAND        0x0c2
>>> +/* MSI low 32 bit address, 64 bits in two halves */
>>> +#define VIRTIO_MMIO_MSI_ADDRESS_LOW    0x0c4
>>> +/* MSI high 32 bit address, 64 bits in two halves */
>>> +#define VIRTIO_MMIO_MSI_ADDRESS_HIGH    0x0c8
>>> +/* MSI data */
>>> +#define VIRTIO_MMIO_MSI_DATA        0x0cc
>>
>>
>> I wonder what's the advantage of using registers instead of memory
>> mapped tables as PCI did. Is this because MMIO doesn't have
>> capability list which makes it hard to be extended if we have
>> variable size of tables?
>>
> Yes, mmio doesn't have capability which limits the extension.


We may want to revisit and add something like this for future virtio
MMIO versions.


>
> It need some registers to specify the msi table/mask table/pending
> table offsets if using what pci did.
>
> As comments previously, mask/pending can be easily extended by MSI
> command.
>
>>
>>> +
>>> +/* RO: MSI feature enabled mask */
>>> +#define VIRTIO_MMIO_MSI_ENABLE_MASK    0x8000
>>> +/* RO: Maximum queue size available */
>>> +#define VIRTIO_MMIO_MSI_STATUS_QMASK    0x07ff
>>> +/* Reserved */
>>> +#define VIRTIO_MMIO_MSI_STATUS_RESERVED    0x7800
>>> +
>>> +#define VIRTIO_MMIO_MSI_CMD_UPDATE    0x1
>>
>>
>> I believe we need a command to read the number of vectors supported
>> by the device, or 2048 is assumed to be a fixed size here?
>
> For not bringing much complexity, we proposed vector per queue and
> fixed relationship between events and vectors.


It's about the number of MSIs, not the mapping between queues and MSIs.
And it looks to me it won't bring obvious complexity; we just need a
register to read the #MSIs. A device implementation may stick to a fixed size.
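
If the QMASK field of the status register is meant to carry that count, the
driver side would just be (assuming that semantics, which the spec should
spell out):

	u16 status = readw(vm_dev->base + VIRTIO_MMIO_MSI_STATUS);
	unsigned int max_vectors = status & VIRTIO_MMIO_MSI_STATUS_QMASK;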

Having a few pages for a device that only has one queue is kind of a waste.

Thanks


>
>
> So the number of vectors supported by device is equal to the total
> number of vqs and config.
>
> We will try to explicitly highlight this point in spec for later version.
>
>
> Thanks!
>
> Jing
>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>

2020-01-02 06:56:56

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] support virtio mmio specification Version 3


On 2019/12/25 上午10:50, Zha Bin wrote:
> With the virtio spec proposal[1], other VMMs (e.g. Qemu) can also make use
> of the new features to get a enhancing performance.
>
> [1]https://lkml.org/lkml/2019/12/20/113
>
> Liu Jiang (2):
> x86/msi: Enhance x86 to support platform_msi
> virtio-mmio: add features for virtio-mmio specification version 3


Btw, for next version I suggest to copy both kvm-devel list and
qemu-devel list.

Thanks


2020-01-02 09:14:30

by Liu, Jing2

[permalink] [raw]
Subject: Re: [virtio-dev] Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

[...]

>>>
>>>> +
>>>> +/* RO: MSI feature enabled mask */
>>>> +#define VIRTIO_MMIO_MSI_ENABLE_MASK    0x8000
>>>> +/* RO: Maximum queue size available */
>>>> +#define VIRTIO_MMIO_MSI_STATUS_QMASK    0x07ff
>>>> +/* Reserved */
>>>> +#define VIRTIO_MMIO_MSI_STATUS_RESERVED    0x7800
>>>> +
>>>> +#define VIRTIO_MMIO_MSI_CMD_UPDATE    0x1
>>>
>>>
>>> I believe we need a command to read the number of vectors supported
>>> by the device, or 2048 is assumed to be a fixed size here?
>>
>> For not bringing much complexity, we proposed vector per queue and
>> fixed relationship between events and vectors.
>
>
> It's a about the number of MSIs not the mapping between queues to
> MSIs.And it looks to me it won't bring obvious complexity, just need a
> register to read the #MSIs. Device implementation may stick to a fixed
> size.

Based on that assumption, the device supports #MSIs = #queues + #config.
Then the driver need not read the register.

We're trying to reach that kind of agreement at the spec level.

>
> Having few pages for a device that only have one queue is kind of a
> waste.

Could I ask what "few pages" means here? BTW, we didn't define
MSI-X-like tables for virtio-mmio.

Thanks,

Jing

>
> Thanks
>
>
>>
>>
>> So the number of vectors supported by device is equal to the total
>> number of vqs and config.
>>
>> We will try to explicitly highlight this point in spec for later
>> version.
>>
>>
>> Thanks!
>>
>> Jing
>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>
>

2020-01-03 03:26:07

by Jason Wang

[permalink] [raw]
Subject: Re: [virtio-dev] Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3


On 2020/1/2 下午5:13, Liu, Jing2 wrote:
> [...]
>
>>>>
>>>>> +
>>>>> +/* RO: MSI feature enabled mask */
>>>>> +#define VIRTIO_MMIO_MSI_ENABLE_MASK    0x8000
>>>>> +/* RO: Maximum queue size available */
>>>>> +#define VIRTIO_MMIO_MSI_STATUS_QMASK    0x07ff
>>>>> +/* Reserved */
>>>>> +#define VIRTIO_MMIO_MSI_STATUS_RESERVED    0x7800
>>>>> +
>>>>> +#define VIRTIO_MMIO_MSI_CMD_UPDATE    0x1
>>>>
>>>>
>>>> I believe we need a command to read the number of vectors supported
>>>> by the device, or 2048 is assumed to be a fixed size here?
>>>
>>> For not bringing much complexity, we proposed vector per queue and
>>> fixed relationship between events and vectors.
>>
>>
>> It's a about the number of MSIs not the mapping between queues to
>> MSIs.And it looks to me it won't bring obvious complexity, just need
>> a register to read the #MSIs. Device implementation may stick to a
>> fixed size.
>
> Based on that assumption, the device supports #MSIs = #queues +
> #config. Then driver need not read the register.
>
> We're trying to make such kind of agreement on spec level.


Ok, I get you now.

But still, having a fixed number of MSIs is less flexible. E.g.:

- for x86, the processor can only deal with about 250 interrupt vectors.
- a driver may choose to share MSI vectors [1] (which is not merged but we
will for sure need it)

[1] https://lkml.org/lkml/2014/12/25/169


>
>>
>> Having few pages for a device that only have one queue is kind of a
>> waste.
>
> Could I ask what's the meaning of few pages here? BTW, we didn't
> define MSIx-like tables for virtio-mmio.


I thought you were using a fixed size (2048) for each device. But it looks like not :)

Thanks


>
> Thanks,
>
> Jing
>
>>
>> Thanks
>>
>>
>>>
>>>
>>> So the number of vectors supported by device is equal to the total
>>> number of vqs and config.
>>>
>>> We will try to explicitly highlight this point in spec for later
>>> version.
>>>
>>>
>>> Thanks!
>>>
>>> Jing
>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

2020-01-03 06:17:34

by Liu, Jiang

[permalink] [raw]
Subject: Re: [virtio-dev] Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3



> On Jan 3, 2020, at 11:24 AM, Jason Wang <[email protected]> wrote:
>
>
> On 2020/1/2 下午5:13, Liu, Jing2 wrote:
>> [...]
>>
>>>>>
>>>>>> +
>>>>>> +/* RO: MSI feature enabled mask */
>>>>>> +#define VIRTIO_MMIO_MSI_ENABLE_MASK 0x8000
>>>>>> +/* RO: Maximum queue size available */
>>>>>> +#define VIRTIO_MMIO_MSI_STATUS_QMASK 0x07ff
>>>>>> +/* Reserved */
>>>>>> +#define VIRTIO_MMIO_MSI_STATUS_RESERVED 0x7800
>>>>>> +
>>>>>> +#define VIRTIO_MMIO_MSI_CMD_UPDATE 0x1
>>>>>
>>>>>
>>>>> I believe we need a command to read the number of vectors supported by the device, or 2048 is assumed to be a fixed size here?
>>>>
>>>> For not bringing much complexity, we proposed vector per queue and fixed relationship between events and vectors.
>>>
>>>
>>> It's a about the number of MSIs not the mapping between queues to MSIs.And it looks to me it won't bring obvious complexity, just need a register to read the #MSIs. Device implementation may stick to a fixed size.
>>
>> Based on that assumption, the device supports #MSIs = #queues + #config. Then driver need not read the register.
>>
>> We're trying to make such kind of agreement on spec level.
>
>
> Ok, I get you now.
>
> But still, having fixed number of MSIs is less flexible. E.g:
>
> - for x86, processor can only deal with about 250 interrupt vectors.
> - driver may choose to share MSI vectors [1] (which is not merged but we will for sure need it)
Thanks for the info:)
X86 systems roughly have NCPU * 200 vectors available for device interrupts.
The patch referenced in [1] tries to map multiple event sources onto one interrupt vector, to avoid running out of x86 CPU vectors.
Many virtio mmio devices have several or tens of event sources, and it’s rare to have hundreds of event sources.
So could we treat the dynamic mapping between event sources and interrupt vectors as an advanced optional feature?

>
> [1] https://lkml.org/lkml/2014/12/25/169
>
>
>>
>>>
>>> Having few pages for a device that only have one queue is kind of a waste.
>>
>> Could I ask what's the meaning of few pages here? BTW, we didn't define MSIx-like tables for virtio-mmio.
>
>
> I thought you're using a fixed size (2048) for each device. But looks not :)
>
> Thanks
>
>
>>
>> Thanks,
>>
>> Jing
>>
>>>
>>> Thanks
>>>
>>>
>>>>
>>>>
>>>> So the number of vectors supported by device is equal to the total number of vqs and config.
>>>>
>>>> We will try to explicitly highlight this point in spec for later version.
>>>>
>>>>
>>>> Thanks!
>>>>
>>>> Jing
>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
>>>>>
>>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]

2020-01-03 06:52:17

by Liu, Jiang

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

On Jan 2, 2020, at 2:28 PM, Jason Wang <[email protected]> wrote:
>
>
> On 2019/12/26 下午9:16, Liu, Jiang wrote:
>>
>>> On Dec 26, 2019, at 4:09 PM, Jason Wang <[email protected]> wrote:
>>>
>>>
>>> On 2019/12/25 下午11:20, Liu, Jiang wrote:
>>>>> On Dec 25, 2019, at 6:20 PM, Jason Wang <[email protected]> wrote:
>>>>>
>>>>>
>>>>> On 2019/12/25 上午10:50, Zha Bin wrote:
>>>>>> From: Liu Jiang <[email protected]>
>>>>>>
>>>>>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
>>>>>> virtio over mmio devices as a lightweight machine model for modern
>>>>>> cloud. The standard virtio over MMIO transport layer only supports one
>>>>>> legacy interrupt, which is much heavier than virtio over PCI transport
>>>>>> layer using MSI. Legacy interrupt has long work path and causes specific
>>>>>> VMExits in following cases, which would considerably slow down the
>>>>>> performance:
>>>>>>
>>>>>> 1) read interrupt status register
>>>>>> 2) update interrupt status register
>>>>>> 3) write IOAPIC EOI register
>>>>>>
>>>>>> We proposed to update virtio over MMIO to version 3[1] to add the
>>>>>> following new features and enhance the performance.
>>>>>>
>>>>>> 1) Support Message Signaled Interrupt(MSI), which increases the
>>>>>> interrupt performance for virtio multi-queue devices
>>>>>> 2) Support per-queue doorbell, so the guest kernel may directly write
>>>>>> to the doorbells provided by virtio devices.
>>>>>>
>>>>>> The following is the network tcp_rr performance testing report, tested
>>>>>> with virtio-pci device, vanilla virtio-mmio device and patched
>>>>>> virtio-mmio device (run test 3 times for each case):
>>>>>>
>>>>>> netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024
>>>>>>
>>>>>> Virtio-PCI Virtio-MMIO Virtio-MMIO(MSI)
>>>>>> trans/s 9536 6939 9500
>>>>>> trans/s 9734 7029 9749
>>>>>> trans/s 9894 7095 9318
>>>>>>
>>>>>> [1] https://lkml.org/lkml/2019/12/20/113
>>>>> Thanks for the patch. Two questions after a quick glance:
>>>>>
>>>>> 1) In PCI we choose to support MSI-X instead of MSI for having extra flexibility like alias, independent data and address (e.g for affinity) . Any reason for not start from MSI-X? E.g having MSI-X table and PBA (both of which looks pretty independent).
>>>> Hi Jason,
>>>> Thanks for reviewing patches on Christmas Day:)
>>>> The PCI MSI-x has several advantages over PCI MSI, mainly
>>>> 1) support 2048 vectors, much more than 32 vectors supported by MSI.
>>>> 2) dedicated address/data for each vector,
>>>> 3) per vector mask/pending bits.
>>>> The proposed MMIO MSI extension supports both 1) and 2),
>>>
>>> Aha right, I mis-read the patch. But more questions comes:
>>>
>>> 1) The association between vq and MSI-X vector is fixed. This means it can't work for a device that have more than 2047 queues. We probably need something similar to virtio-pci to allow a dynamic association.
>> We have considered both the PCI MSI-x like dynamic association design and fix mapping design.
>> The fix mapping design simplifies both the interrupt configuration process and VMM implementations.
>
>
> Well, for VMM just an indirection and for guest, it can choose to use fixed mapping, just need to program once during probe.
>
>
>> And the virtio mmio transport layer is mainly used by light VMMs to support small scale virtual machines,
>
>
> Let's not limit the interface to be used by a specific case :). Eliminating PCIE would be appealing for other scenarios.
>
>
>> 2048 vectors should be enough for these usage cases.
>> So the fix mapping design has been used.
>>
>>> 2) The mask and unmask control is missed
>>>
>>>
>>>> but the extension doesn’t support 3) because
>>>> we noticed that the Linux virtio subsystem doesn’t really make use of interrupt masking/unmasking.
>>>
>>> Not directly used but masking/unmasking is widely used in irq subsystem which allows lots of optimizations.
>>>
>>>
>>>> On the other hand, we want to simplify VMM implementations as simple as possible, and mimicking the PCI MSI-x
>>>> will cause some complexity to VMM implementations.
>>>
>>> I agree to simplify VMM implementation, but it looks to me introducing masking/pending won't cost too much code in the VMM implementation. Just new type of command for VIRTIO_MMIO_MSI_COMMAND.
>> We want to make VMM implementations as simple as possible:)
>> And based on following observations, we have disabled support of mask/unmask,
>> 1) MSI is edge triggering, which means it won’t be shared with other interrupt sources,
>
>
> Is this true? I think the spec does not forbid such usages, e.g using the same MSI address/command for different queues or devices?
Yes, the spec doesn’t forbid this.
We could share the same MSI-X vector for multiple queues, rx/tx etc.
But we can’t share a Linux MSI interrupt between different devices/MSI-X vectors; this is an implementation constraint of the Linux interrupt subsystem.
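
If several queues did end up on one vector, the driver side would need a
dispatch handler instead of per-vq vring_interrupt(), e.g. (illustrative
sketch reusing the existing virtqueue list; the function name is made up):

static irqreturn_t vm_shared_vq_interrupt(int irq, void *opaque)
{
	struct virtio_mmio_device *vm_dev = opaque;
	struct virtio_mmio_vq_info *info;
	irqreturn_t ret = IRQ_NONE;
	unsigned long flags;

	spin_lock_irqsave(&vm_dev->lock, flags);
	list_for_each_entry(info, &vm_dev->virtqueues, node)
		ret |= vring_interrupt(irq, info->vq);
	spin_unlock_irqrestore(&vm_dev->lock, flags);

	return ret;
}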

>
>
>> so masking/unmasking won’t be used for normal interrupt management logic.
>> 2) Linux virtio mmio transport layer doesn’t support suspend/resume yet, so there’s no need to quiesce the device by masking interrupts.
>
>
> Yes, but it's a limitation only for virtio mmio transport. We can add it.
>
>
>> 3) The legacy PCI 2.2 devices doesn’t support irq masking/unmasking, so irq masking/unmasking may be optional operations.
>
>
> Yes, but as you said, it helps for performance and some other cases. I still prefer to implement that consider it is not complex. If we do MSI without masking/unmasking, I suspect we will implement MSI-X finally somedays then maintaining MSI will become a burden... (still takes virtio-pci as an example, it choose to implement MSI-X not MSI).
>
>
>> So we skipped support of irq masking/unmasking. We will recheck whether irq masking/unmasking is mandatory for MMIO devices.
>> On the other hand, we may enhance the spec to define command codes for masking/unmasking, and VMM may optionally support masking/unmasking.
>
>
> Yes, thanks.
>
>
>>
>> Thanks,
>> Gerry
>>
>>> Thanks
>>>
>>>
>>>>> 2) It's better to split notify_multiplexer out of MSI support to ease the reviewers (apply to spec patch as well)
>>>> Great suggestion, we will try to split the patch.
>>>>
>>>> Thanks,
>>>> Gerry
>>>>
>>>>> Thanks

2020-01-03 09:15:14

by Jason Wang

[permalink] [raw]
Subject: Re: [virtio-dev] Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3


On 2020/1/3 下午2:14, Liu, Jiang wrote:
>> Ok, I get you now.
>>
>> But still, having fixed number of MSIs is less flexible. E.g:
>>
>> - for x86, processor can only deal with about 250 interrupt vectors.
>> - driver may choose to share MSI vectors [1] (which is not merged but we will for sure need it)
> Thanks for the info:)
> X86 systems roughly have NCPU * 200 vectors available for device interrupts.
> The proposed patch tries to map multiple event sources to an interrupt vector, to avoid running out of x86 CPU vectors.
> Many virtio mmio devices may have several or tens of event sources, and it’s rare to have hundreds of event sources.
> So could we treat the dynamic mapping between event sources and interrupt vectors as an advanced optional feature?
>

Maybe, but I still prefer to implement it if it is not too complex.
Let's see Michael's opinion on this.

Thanks

2020-01-05 10:45:11

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

On Thu, Dec 26, 2019 at 09:16:19PM +0800, Liu, Jiang wrote:
> > 2) The mask and unmask control is missed
> >
> >
> >> but the extension doesn’t support 3) because
> >> we noticed that the Linux virtio subsystem doesn’t really make use of interrupt masking/unmasking.

Linux uses masking/unmasking in order to migrate interrupts between
CPUs.

--
MST

2020-01-05 11:06:28

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

On Wed, Dec 25, 2019 at 10:50:23AM +0800, Zha Bin wrote:
> From: Liu Jiang <[email protected]>
>
> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
> virtio over mmio devices as a lightweight machine model for modern
> cloud. The standard virtio over MMIO transport layer only supports one
> legacy interrupt, which is much heavier than virtio over PCI transport
> layer using MSI. Legacy interrupt has long work path and causes specific
> VMExits in following cases, which would considerably slow down the
> performance:
>
> 1) read interrupt status register
> 2) update interrupt status register
> 3) write IOAPIC EOI register
>
> We proposed to update virtio over MMIO to version 3[1] to add the
> following new features and enhance the performance.
>
> 1) Support Message Signaled Interrupt(MSI), which increases the
> interrupt performance for virtio multi-queue devices
> 2) Support per-queue doorbell, so the guest kernel may directly write
> to the doorbells provided by virtio devices.

Do we need to come up with new "doorbell" terminology?
The virtio spec calls these available event notifications;
let's stick to that.


>
> The following is the network tcp_rr performance testing report, tested
> with virtio-pci device, vanilla virtio-mmio device and patched
> virtio-mmio device (run test 3 times for each case):
>
> netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024
>
> Virtio-PCI Virtio-MMIO Virtio-MMIO(MSI)
> trans/s 9536 6939 9500
> trans/s 9734 7029 9749
> trans/s 9894 7095 9318
>
> [1] https://lkml.org/lkml/2019/12/20/113
>
> Signed-off-by: Liu Jiang <[email protected]>
> Signed-off-by: Zha Bin <[email protected]>
> Signed-off-by: Chao Peng <[email protected]>
> Signed-off-by: Jing Liu <[email protected]>


Do we need a new version though? What is wrong with
a feature bit? This way we can create compatible devices
and drivers.
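
A hedged illustration of the feature-bit alternative (the bit number and
name below are made up, not from any spec):

#define VIRTIO_F_MMIO_MSI	40	/* hypothetical transport feature bit */

	if (virtio_has_feature(vdev, VIRTIO_F_MMIO_MSI))
		return vm_find_vqs_msi(vdev, nvqs, vqs, callbacks,
				       names, ctx, desc);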

> ---
> drivers/virtio/virtio_mmio.c | 226 ++++++++++++++++++++++++++++++++++++---
> include/uapi/linux/virtio_mmio.h | 28 +++++
> 2 files changed, 240 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index e09edb5..065c083 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -68,8 +68,8 @@
> #include <linux/virtio_config.h>
> #include <uapi/linux/virtio_mmio.h>
> #include <linux/virtio_ring.h>
> -
> -
> +#include <linux/msi.h>
> +#include <asm/irqdomain.h>
>
> /* The alignment to use between consumer and producer parts of vring.
> * Currently hardcoded to the page size. */
> @@ -90,6 +90,12 @@ struct virtio_mmio_device {
> /* a list of queues so we can dispatch IRQs */
> spinlock_t lock;
> struct list_head virtqueues;
> +
> + int doorbell_base;
> + int doorbell_scale;
> + bool msi_enabled;
> + /* Name strings for interrupts. */
> + char (*vm_vq_names)[256];

Add a comment explaining 256 pls.

> };
>
> struct virtio_mmio_vq_info {
> @@ -101,6 +107,8 @@ struct virtio_mmio_vq_info {
> };
>
>
> +static void vm_free_msi_irqs(struct virtio_device *vdev);
> +static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs);
>
> /* Configuration interface */
>
> @@ -273,12 +281,28 @@ static bool vm_notify(struct virtqueue *vq)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
>
> + if (vm_dev->version == 3) {
> + int offset = vm_dev->doorbell_base +
> + vm_dev->doorbell_scale * vq->index;
> + writel(vq->index, vm_dev->base + offset);
> + } else
> /* We write the queue's selector into the notification register to
> * signal the other end */
> - writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> + writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> +
> return true;
> }

You might want to support VIRTIO_F_NOTIFICATION_DATA too.


> +/* Handle a configuration change */
> +static irqreturn_t vm_config_changed(int irq, void *opaque)
> +{
> + struct virtio_mmio_device *vm_dev = opaque;
> +
> + virtio_config_changed(&vm_dev->vdev);
> +
> + return IRQ_HANDLED;
> +}
> +
> /* Notify all virtqueues on an interrupt. */
> static irqreturn_t vm_interrupt(int irq, void *opaque)
> {
> @@ -307,7 +331,12 @@ static irqreturn_t vm_interrupt(int irq, void *opaque)
> return ret;
> }
>
> +static int vm_msi_irq_vector(struct device *dev, unsigned int nr)
> +{
> + struct msi_desc *entry = first_msi_entry(dev);
>
> + return entry->irq + nr;
> +}
>
> static void vm_del_vq(struct virtqueue *vq)
> {
> @@ -316,6 +345,12 @@ static void vm_del_vq(struct virtqueue *vq)
> unsigned long flags;
> unsigned int index = vq->index;
>
> + if (vm_dev->version == 3 && vm_dev->msi_enabled) {
> + int irq = vm_msi_irq_vector(&vq->vdev->dev, index + 1);
> +
> + free_irq(irq, vq);
> + }
> +
> spin_lock_irqsave(&vm_dev->lock, flags);
> list_del(&info->node);
> spin_unlock_irqrestore(&vm_dev->lock, flags);


I think tying MSI to versions is a mistake.



> @@ -334,15 +369,41 @@ static void vm_del_vq(struct virtqueue *vq)
> kfree(info);
> }
>
> -static void vm_del_vqs(struct virtio_device *vdev)
> +static void vm_free_irqs(struct virtio_device *vdev)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> + if (vm_dev->version == 3)
> + vm_free_msi_irqs(vdev);
> + else
> + free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
> +}
> +
> +static void vm_del_vqs(struct virtio_device *vdev)
> +{
> struct virtqueue *vq, *n;
>
> list_for_each_entry_safe(vq, n, &vdev->vqs, list)
> vm_del_vq(vq);
>
> - free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
> + vm_free_irqs(vdev);
> +}
> +
> +static int vm_request_single_irq(struct virtio_device *vdev)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + int irq = platform_get_irq(vm_dev->pdev, 0);
> + int err;
> +
> + if (irq < 0) {
> + dev_err(&vdev->dev, "Cannot get IRQ resource\n");
> + return irq;
> + }
> +
> + err = request_irq(irq, vm_interrupt, IRQF_SHARED,
> + dev_name(&vdev->dev), vm_dev);
> +
> + return err;
> }
>
> static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
> @@ -455,6 +516,72 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
> return ERR_PTR(err);
> }
>
> +static inline void mmio_msi_set_enable(struct virtio_mmio_device *vm_dev,
> + int enable)
> +{
> + u16 status;
> +
> + status = readw(vm_dev->base + VIRTIO_MMIO_MSI_STATUS);
> + if (enable)
> + status |= VIRTIO_MMIO_MSI_ENABLE_MASK;
> + else
> + status &= ~VIRTIO_MMIO_MSI_ENABLE_MASK;
> + writew(status, vm_dev->base + VIRTIO_MMIO_MSI_STATUS);
> +}
> +
> +static int vm_find_vqs_msi(struct virtio_device *vdev, unsigned int nvqs,
> + struct virtqueue *vqs[], vq_callback_t *callbacks[],
> + const char * const names[], const bool *ctx,
> + struct irq_affinity *desc)
> +{
> + int i, err, irq;
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> + /* Allocate nvqs irqs for queues and one irq for configuration */
> + err = vm_request_msi_vectors(vdev, nvqs + 1);
> + if (err != 0)
> + return err;

Not all devices need high speed. Some might want to share
irqs between VQs, or even with config change callback.
Balloon is a case in point.
A hint about max # of MSI necessary would be a good
idea for this case.


Sharing MSI doesn't necessarily require dedicated registers like PCI has,
you can just program same vector in multiple VQs.
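E.g. something along these lines on the driver side (completely untested
sketch, just to illustrate - it mirrors what vm_interrupt already does for
the legacy INTx path, so no extra device registers are required):

/* Rough sketch: one MSI vector shared by all VQs; the config change
 * callback could be folded in here as well.  Nothing below is part of the
 * proposed spec, it only shows that sharing needs no dedicated registers.
 */
static irqreturn_t vm_shared_vq_interrupt(int irq, void *opaque)
{
	struct virtio_mmio_device *vm_dev = opaque;
	struct virtio_mmio_vq_info *info;
	irqreturn_t ret = IRQ_NONE;
	unsigned long flags;

	spin_lock_irqsave(&vm_dev->lock, flags);
	list_for_each_entry(info, &vm_dev->virtqueues, node)
		ret |= vring_interrupt(irq, info->vq);
	spin_unlock_irqrestore(&vm_dev->lock, flags);

	return ret;
}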

> +
> + for (i = 0; i < nvqs; i++) {
> + int index = i + 1;
> +
> + if (!names[i]) {
> + vqs[i] = NULL;
> + continue;
> + }
> + vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i],
> + ctx ? ctx[i] : false);
> + if (IS_ERR(vqs[i])) {
> + err = PTR_ERR(vqs[i]);
> + goto error_find;
> + }
> +
> + if (!callbacks[i])
> + continue;
> +
> + irq = vm_msi_irq_vector(&vqs[i]->vdev->dev, index);
> + /* allocate per-vq irq if available and necessary */
> + snprintf(vm_dev->vm_vq_names[index],
> + sizeof(*vm_dev->vm_vq_names),
> + "%s-%s",
> + dev_name(&vm_dev->vdev.dev), names[i]);
> + err = request_irq(irq, vring_interrupt, 0,
> + vm_dev->vm_vq_names[index], vqs[i]);
> +
> + if (err)
> + goto error_find;
> + }
> +
> + mmio_msi_set_enable(vm_dev, 1);
> + vm_dev->msi_enabled = true;
> +
> + return 0;
> +
> +error_find:
> + vm_del_vqs(vdev);
> + return err;
> +}
> +
> static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> struct virtqueue *vqs[],
> vq_callback_t *callbacks[],
> @@ -463,16 +590,16 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> struct irq_affinity *desc)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> - int irq = platform_get_irq(vm_dev->pdev, 0);
> int i, err, queue_idx = 0;
>
> - if (irq < 0) {
> - dev_err(&vdev->dev, "Cannot get IRQ resource\n");
> - return irq;
> + if (vm_dev->version == 3) {
> + err = vm_find_vqs_msi(vdev, nvqs, vqs, callbacks,
> + names, ctx, desc);
> + if (!err)
> + return 0;
> }
>
> - err = request_irq(irq, vm_interrupt, IRQF_SHARED,
> - dev_name(&vdev->dev), vm_dev);
> + err = vm_request_single_irq(vdev);
> if (err)
> return err;
>
> @@ -493,6 +620,73 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> return 0;
> }
>
> +static void mmio_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
> +{
> + struct device *dev = desc->dev;
> + struct virtio_device *vdev = dev_to_virtio(dev);
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + void __iomem *pos = vm_dev->base;
> + uint16_t cmd = VIRTIO_MMIO_MSI_CMD(VIRTIO_MMIO_MSI_CMD_UPDATE,
> + desc->platform.msi_index);
> +
> + writel(msg->address_lo, pos + VIRTIO_MMIO_MSI_ADDRESS_LOW);
> + writel(msg->address_hi, pos + VIRTIO_MMIO_MSI_ADDRESS_HIGH);
> + writel(msg->data, pos + VIRTIO_MMIO_MSI_DATA);
> + writew(cmd, pos + VIRTIO_MMIO_MSI_COMMAND);
> +}

All this can happen when IRQ affinity changes while device
is sending interrupts. An interrupt sent between the writel
operations will then be directed incorrectly.



> +
> +static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + int irq, err;
> + u16 csr = readw(vm_dev->base + VIRTIO_MMIO_MSI_STATUS);
> +
> + if (vm_dev->msi_enabled || (csr & VIRTIO_MMIO_MSI_ENABLE_MASK) == 0)
> + return -EINVAL;
> +
> + vm_dev->vm_vq_names = kmalloc_array(nirqs, sizeof(*vm_dev->vm_vq_names),
> + GFP_KERNEL);
> + if (!vm_dev->vm_vq_names)
> + return -ENOMEM;
> +
> + if (!vdev->dev.msi_domain)
> + vdev->dev.msi_domain = platform_msi_get_def_irq_domain();
> + err = platform_msi_domain_alloc_irqs(&vdev->dev, nirqs,
> + mmio_write_msi_msg);
> + if (err)
> + goto error_free_mem;
> +
> + /* The first MSI vector is used for configuration change events. */
> + snprintf(vm_dev->vm_vq_names[0], sizeof(*vm_dev->vm_vq_names),
> + "%s-config", dev_name(&vdev->dev));
> + irq = vm_msi_irq_vector(&vdev->dev, 0);
> + err = request_irq(irq, vm_config_changed, 0, vm_dev->vm_vq_names[0],
> + vm_dev);
> + if (err)
> + goto error_free_mem;
> +
> + return 0;
> +
> +error_free_mem:
> + kfree(vm_dev->vm_vq_names);
> + vm_dev->vm_vq_names = NULL;
> + return err;
> +}
> +
> +static void vm_free_msi_irqs(struct virtio_device *vdev)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> +
> + if (vm_dev->msi_enabled) {
> + mmio_msi_set_enable(vm_dev, 0);
> + free_irq(vm_msi_irq_vector(&vdev->dev, 0), vm_dev);
> + platform_msi_domain_free_irqs(&vdev->dev);
> + kfree(vm_dev->vm_vq_names);
> + vm_dev->vm_vq_names = NULL;
> + vm_dev->msi_enabled = false;
> + }
> +}
> +
> static const char *vm_bus_name(struct virtio_device *vdev)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> @@ -567,7 +761,7 @@ static int virtio_mmio_probe(struct platform_device *pdev)
>
> /* Check device version */
> vm_dev->version = readl(vm_dev->base + VIRTIO_MMIO_VERSION);
> - if (vm_dev->version < 1 || vm_dev->version > 2) {
> + if (vm_dev->version < 1 || vm_dev->version > 3) {
> dev_err(&pdev->dev, "Version %ld not supported!\n",
> vm_dev->version);
> return -ENXIO;
> @@ -603,6 +797,12 @@ static int virtio_mmio_probe(struct platform_device *pdev)
> dev_warn(&pdev->dev, "Failed to enable 64-bit or 32-bit DMA. Trying to continue, but this might not work.\n");
>
> platform_set_drvdata(pdev, vm_dev);
> + if (vm_dev->version == 3) {
> + u32 db = readl(vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> +
> + vm_dev->doorbell_base = db & 0xffff;
> + vm_dev->doorbell_scale = (db >> 16) & 0xffff;
> + }
>
> rc = register_virtio_device(&vm_dev->vdev);
> if (rc)
> @@ -619,8 +819,6 @@ static int virtio_mmio_remove(struct platform_device *pdev)
> return 0;
> }
>
> -
> -
> /* Devices list parameter */
>
> #if defined(CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES)
> diff --git a/include/uapi/linux/virtio_mmio.h b/include/uapi/linux/virtio_mmio.h
> index c4b0968..148895e 100644
> --- a/include/uapi/linux/virtio_mmio.h
> +++ b/include/uapi/linux/virtio_mmio.h
> @@ -122,6 +122,34 @@
> #define VIRTIO_MMIO_QUEUE_USED_LOW 0x0a0
> #define VIRTIO_MMIO_QUEUE_USED_HIGH 0x0a4
>
> +/* MSI related registers */
> +
> +/* MSI status register */
> +#define VIRTIO_MMIO_MSI_STATUS 0x0c0
> +/* MSI command register */
> +#define VIRTIO_MMIO_MSI_COMMAND 0x0c2
> +/* MSI low 32 bit address, 64 bits in two halves */
> +#define VIRTIO_MMIO_MSI_ADDRESS_LOW 0x0c4
> +/* MSI high 32 bit address, 64 bits in two halves */
> +#define VIRTIO_MMIO_MSI_ADDRESS_HIGH 0x0c8
> +/* MSI data */
> +#define VIRTIO_MMIO_MSI_DATA 0x0cc
> +
> +/* RO: MSI feature enabled mask */
> +#define VIRTIO_MMIO_MSI_ENABLE_MASK 0x8000

I don't understand the comment. Is this a way for
a version 3 device to say "I want/do not want MSI"?
Why not just use a feature bit? We are not short on these.

> +/* RO: Maximum queue size available */
> +#define VIRTIO_MMIO_MSI_STATUS_QMASK 0x07ff


I don't know what above is. Seems unused.

> +/* Reserved */
> +#define VIRTIO_MMIO_MSI_STATUS_RESERVED 0x7800
> +
> +#define VIRTIO_MMIO_MSI_CMD_UPDATE 0x1
> +#define VIRTIO_MMIO_MSI_CMD_OP_MASK 0xf000
> +#define VIRTIO_MMIO_MSI_CMD_ARG_MASK 0x0fff
> +#define VIRTIO_MMIO_MSI_CMD(op, arg) ((op) << 12 | (arg))
> +#define VIRITO_MMIO_MSI_CMD_ARG(cmd) ((cmd) & VIRTIO_MMIO_MSI_CMD_ARG_MASK)
> +#define VIRTIO_MMIO_MSI_CMD_OP(cmd) \
> + (((cmd) & VIRTIO_MMIO_MSI_CMD_OP_MASK) >> 12)
> +
> /* Configuration atomicity value */
> #define VIRTIO_MMIO_CONFIG_GENERATION 0x0fc
>
> --
> 1.8.3.1

2020-01-05 11:27:05

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [virtio-dev] Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

On Fri, Jan 03, 2020 at 05:12:38PM +0800, Jason Wang wrote:
>
> On 2020/1/3 2:14 PM, Liu, Jiang wrote:
> > > Ok, I get you now.
> > >
> > > But still, having fixed number of MSIs is less flexible. E.g:
> > >
> > > - for x86, processor can only deal with about 250 interrupt vectors.
> > > - driver may choose to share MSI vectors [1] (which is not merged but we will for sure need it)
> > Thanks for the info:)
> > X86 systems roughly have NCPU * 200 vectors available for device interrupts.
> > The proposed patch tries to map multiple event sources to an interrupt vector, to avoid running out of x86 CPU vectors.
> > Many virtio mmio devices may have several or tens of event sources, and it’s rare to have hundreds of event sources.
> > So could we treat the dynamic mapping between event sources and interrupt vectors as an advanced optional feature?
> >
>
> Maybe, but I still prefer to implement it if it is not too complex. Let's
> see Michael's opinion on this.
>
> Thanks

I think a way for the device to limit # of vectors in use by driver is
useful. But sharing of vectors doesn't really need any special
registers, just program the same vector for multiple Qs/interrupts.

--
MST

2020-01-06 02:52:23

by Jason Wang

[permalink] [raw]
Subject: Re: [virtio-dev] Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3


On 2020/1/5 7:25 PM, Michael S. Tsirkin wrote:
> On Fri, Jan 03, 2020 at 05:12:38PM +0800, Jason Wang wrote:
>> On 2020/1/3 2:14 PM, Liu, Jiang wrote:
>>>> Ok, I get you now.
>>>>
>>>> But still, having fixed number of MSIs is less flexible. E.g:
>>>>
>>>> - for x86, processor can only deal with about 250 interrupt vectors.
>>>> - driver may choose to share MSI vectors [1] (which is not merged but we will for sure need it)
>>> Thanks for the info:)
>>> X86 systems roughly have NCPU * 200 vectors available for device interrupts.
>>> The proposed patch tries to map multiple event sources to an interrupt vector, to avoid running out of x86 CPU vectors.
>>> Many virtio mmio devices may have several or tens of event sources, and it’s rare to have hundreds of event sources.
>>> So could we treat the dynamic mapping between event sources and interrupt vectors as an advanced optional feature?
>>>
>> Maybe, but I still prefer to implement it if it is not too complex. Let's
>> see Michael's opinion on this.
>>
>> Thanks
> I think a way for the device to limit # of vectors in use by driver is
> useful. But sharing of vectors doesn't really need any special
> registers, just program the same vector for multiple Qs/interrupts.


Right, but since the number of vectors is limited, we still need dynamic
mapping like what is done in PCI.

Thanks


>

2020-01-06 07:25:27

by Liu, Jing2

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3


On 1/5/2020 6:42 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 26, 2019 at 09:16:19PM +0800, Liu, Jiang wrote:
>>> 2) The mask and unmask control is missed
>>>
>>>
>>>> but the extension doesn’t support 3) because
>>>> we noticed that the Linux virtio subsystem doesn’t really make use of interrupt masking/unmasking.
> Linux uses masking/unmasking in order to migrate interrupts between
> CPUs.

Hi Michael,

Thanks for reviewing the patches!

When trying to study the mask/unmask use cases during IRQ migration, it
seems to be used in cases such as:

1) migrating irq(s) away from an offline cpu

2) irq affinity is changing while an interrupt comes in, so
IRQD_SETAFFINITY_PENDING is set and the pending move is finished during ack
(e.g. on x86) by masking and unmasking the irq.

Is this right? If so, we should have mask/unmask for each vector.

Thanks,

Jing

2020-01-09 06:16:46

by Liu, Jing2

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3


On 1/5/2020 7:04 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 25, 2019 at 10:50:23AM +0800, Zha Bin wrote:
>> From: Liu Jiang<[email protected]>
>>
>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
>> virtio over mmio devices as a lightweight machine model for modern
>> cloud. The standard virtio over MMIO transport layer only supports one
>> legacy interrupt, which is much heavier than virtio over PCI transport
>> layer using MSI. Legacy interrupt has long work path and causes specific
>> VMExits in following cases, which would considerably slow down the
>> performance:
>>
>> 1) read interrupt status register
>> 2) update interrupt status register
>> 3) write IOAPIC EOI register
>>
>> We proposed to update virtio over MMIO to version 3[1] to add the
>> following new features and enhance the performance.
>>
>> 1) Support Message Signaled Interrupt(MSI), which increases the
>> interrupt performance for virtio multi-queue devices
>> 2) Support per-queue doorbell, so the guest kernel may directly write
>> to the doorbells provided by virtio devices.
> Do we need to come up with new "doorbell" terminology?
> virtio spec calls these available event notifications,
> let's stick to this.

Yes, let's keep the virtio wording, which simply calls these notifications.

>> The following is the network tcp_rr performance testing report, tested
>> with virtio-pci device, vanilla virtio-mmio device and patched
>> virtio-mmio device (run test 3 times for each case):
>>
>> netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024
>>
>> Virtio-PCI Virtio-MMIO Virtio-MMIO(MSI)
>> trans/s 9536 6939 9500
>> trans/s 9734 7029 9749
>> trans/s 9894 7095 9318
>>
>> [1]https://lkml.org/lkml/2019/12/20/113
>>
>> Signed-off-by: Liu Jiang<[email protected]>
>> Signed-off-by: Zha Bin<[email protected]>
>> Signed-off-by: Chao Peng<[email protected]>
>> Signed-off-by: Jing Liu<[email protected]>
> Do we need a new version though? What is wrong with
> a feature bit? This way we can create compatible devices
> and drivers.

We considered using one feature bit out of 24~37 to specify MSI capability,
but such a feature bit would only have meaning for the mmio transport layer
rather than representing common feature negotiation of the virtio device.
So we're not sure if this is a good choice.

>> [...]
>>
>> +static void mmio_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
>> +{
>> + struct device *dev = desc->dev;
>> + struct virtio_device *vdev = dev_to_virtio(dev);
>> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
>> + void __iomem *pos = vm_dev->base;
>> + uint16_t cmd = VIRTIO_MMIO_MSI_CMD(VIRTIO_MMIO_MSI_CMD_UPDATE,
>> + desc->platform.msi_index);
>> +
>> + writel(msg->address_lo, pos + VIRTIO_MMIO_MSI_ADDRESS_LOW);
>> + writel(msg->address_hi, pos + VIRTIO_MMIO_MSI_ADDRESS_HIGH);
>> + writel(msg->data, pos + VIRTIO_MMIO_MSI_DATA);
>> + writew(cmd, pos + VIRTIO_MMIO_MSI_COMMAND);
>> +}
> All this can happen when IRQ affinity changes while device
> is sending interrupts. An interrupt sent between the writel
> operations will then be directed incorrectly.

When investigating the kernel MSI behavior, I found that in most cases there
is no action during IRQ affinity changes to keep interrupts from arriving.

For example, migrate_one_irq() masks the irq before irq_do_set_affinity().
But other paths, like the user setting an irq affinity via /proc/, only hold
desc->lock instead of disabling/masking the irq. In such a case, how can it
guard against an interrupt sent between the writel operations?


>> [...]
>> +
>> +/* RO: MSI feature enabled mask */
>> +#define VIRTIO_MMIO_MSI_ENABLE_MASK 0x8000
> I don't understand the comment. Is this a way for
> a version 3 device to say "I want/do not want MSI"?
> Why not just use a feature bit? We are not short on these.

This just reports the current MSI enabled/disabled status after all MSI
configuration has been set up; it is not for advertising MSI capability.

In other words, because of the concern about a feature bit, we chose to bump
the virtio mmio version so that devices with v3 have MSI capability and the
notification feature.


Thanks,

Jing

2020-01-09 13:40:18

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

On Thu, Jan 09, 2020 at 02:15:51PM +0800, Liu, Jing2 wrote:
>
> On 1/5/2020 7:04 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 25, 2019 at 10:50:23AM +0800, Zha Bin wrote:
> > > From: Liu Jiang<[email protected]>
> > >
> > > Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
> > > virtio over mmio devices as a lightweight machine model for modern
> > > cloud. The standard virtio over MMIO transport layer only supports one
> > > legacy interrupt, which is much heavier than virtio over PCI transport
> > > layer using MSI. Legacy interrupt has long work path and causes specific
> > > VMExits in following cases, which would considerably slow down the
> > > performance:
> > >
> > > 1) read interrupt status register
> > > 2) update interrupt status register
> > > 3) write IOAPIC EOI register
> > >
> > > We proposed to update virtio over MMIO to version 3[1] to add the
> > > following new features and enhance the performance.
> > >
> > > 1) Support Message Signaled Interrupt(MSI), which increases the
> > > interrupt performance for virtio multi-queue devices
> > > 2) Support per-queue doorbell, so the guest kernel may directly write
> > > to the doorbells provided by virtio devices.
> > Do we need to come up with new "doorbell" terminology?
> > virtio spec calls these available event notifications,
> > let's stick to this.
>
> Yes, let's keep the virtio wording, which simply calls these notifications.
>
> > > The following is the network tcp_rr performance testing report, tested
> > > with virtio-pci device, vanilla virtio-mmio device and patched
> > > virtio-mmio device (run test 3 times for each case):
> > >
> > > netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024
> > >
> > > Virtio-PCI Virtio-MMIO Virtio-MMIO(MSI)
> > > trans/s 9536 6939 9500
> > > trans/s 9734 7029 9749
> > > trans/s 9894 7095 9318
> > >
> > > [1]https://lkml.org/lkml/2019/12/20/113
> > >
> > > Signed-off-by: Liu Jiang<[email protected]>
> > > Signed-off-by: Zha Bin<[email protected]>
> > > Signed-off-by: Chao Peng<[email protected]>
> > > Signed-off-by: Jing Liu<[email protected]>
> > Do we need a new version though? What is wrong with
> > a feature bit? This way we can create compatible devices
> > and drivers.
>
> We considered using one feature bit out of 24~37 to specify MSI capability,
> but such a feature bit would only have meaning for the mmio transport layer
> rather than representing common feature negotiation of the virtio device.
> So we're not sure if this is a good choice.

We are not short on bits, just don't use bits below 32
since these are for legacy devices.


> > > [...]
> > > +static void mmio_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
> > > +{
> > > + struct device *dev = desc->dev;
> > > + struct virtio_device *vdev = dev_to_virtio(dev);
> > > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> > > + void __iomem *pos = vm_dev->base;
> > > + uint16_t cmd = VIRTIO_MMIO_MSI_CMD(VIRTIO_MMIO_MSI_CMD_UPDATE,
> > > + desc->platform.msi_index);
> > > +
> > > + writel(msg->address_lo, pos + VIRTIO_MMIO_MSI_ADDRESS_LOW);
> > > + writel(msg->address_hi, pos + VIRTIO_MMIO_MSI_ADDRESS_HIGH);
> > > + writel(msg->data, pos + VIRTIO_MMIO_MSI_DATA);
> > > + writew(cmd, pos + VIRTIO_MMIO_MSI_COMMAND);
> > > +}
> > All this can happen when IRQ affinity changes while device
> > is sending interrupts. An interrupt sent between the writel
> > operations will then be directed incorrectly.
>
> When investigating the kernel MSI behavior, I found that in most cases there
> is no action during IRQ affinity changes to keep interrupts from arriving.
>
> For example, migrate_one_irq() masks the irq before irq_do_set_affinity().
> But other paths, like the user setting an irq affinity via /proc/, only hold
> desc->lock instead of disabling/masking the irq. In such a case, how can it
> guard against an interrupt sent between the writel operations?

Could be a bug too. E.g. PCI spec explicitly says it's illegal to
change non-masked interrupts exactly for this reason.



>
> > > [...]
> > > +
> > > +/* RO: MSI feature enabled mask */
> > > +#define VIRTIO_MMIO_MSI_ENABLE_MASK 0x8000
> > I don't understand the comment. Is this a way for
> > a version 3 device to say "I want/do not want MSI"?
> > Why not just use a feature bit? We are not short on these.
>
> This just reports the current MSI enabled/disabled status after all MSI
> configuration has been set up; it is not for advertising MSI capability.
>
> In other words, because of the concern about a feature bit, we chose to bump
> the virtio mmio version so that devices with v3 have MSI capability and the
> notification feature.
>
>
> Thanks,
>
> Jing

MSI looks like an optimization. I don't see how that
justifies incrementing a major version and breaking
compat with all existing guests.

--
MST

2020-01-09 19:40:08

by Liu, Jiang

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

On Jan 5, 2020, at 6:42 PM, Michael S. Tsirkin <[email protected]> wrote:
>
> On Thu, Dec 26, 2019 at 09:16:19PM +0800, Liu, Jiang wrote:
>>> 2) The mask and unmask control is missed
>>>
>>>
>>>> but the extension doesn’t support 3) because
>>>> we noticed that the Linux virtio subsystem doesn’t really make use of interrupt masking/unmasking.
>
> Linux uses masking/unmasking in order to migrate interrupts between
> CPUs.
This is a limitation of the PCI MSI/MSI-X spec.
To update the MSI/MSI-X vector configuration, we need to write the msg_high/msg_low/msg_data registers.
Because writing three 32-bit registers is not an atomic operation on the PCI bus, it may cause incorrect interrupt delivery if an interrupt fires after only one or two of the registers have been written.
When interrupt remapping is enabled on Intel x86 platforms, we don’t need to mask/unmask PCI MSI/MSI-X interrupts when setting affinity.
For the MMIO MSI extension, we have a special design to close this race window. The flow to update the MMIO MSI vector configuration is:
1) write msg_high
2) write msg_low
3) write msg_data
4) write the command register to update the vector configuration.
During steps 1-3, the hardware/device backend only caches the written values; the vector configuration is updated in step 4, so it is an atomic operation.
So mask/unmask becomes optional for MMIO MSI interrupts.
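
To illustrate, the device/VMM side could look roughly like this (sketch
only; the struct, lock comment and MMIO_MSI_MAX_VECTORS placeholder are
made up and not part of the proposal, only the register/command macros come
from the proposed virtio_mmio.h):

/* Hypothetical VMM-side handling: writes to ADDRESS_LOW/HIGH/DATA only
 * fill a staging area; the live vector entry is replaced in one step when
 * the UPDATE command is written, so an interrupt can never be delivered
 * with a half-updated address/data pair.
 */
#include <stdint.h>
#include <linux/virtio_mmio.h>		/* proposed MSI register layout */

#define MMIO_MSI_MAX_VECTORS	2048	/* placeholder */

struct mmio_msi_state {
	uint32_t staged_addr_lo;
	uint32_t staged_addr_hi;
	uint32_t staged_data;
	struct { uint64_t addr; uint32_t data; } vec[MMIO_MSI_MAX_VECTORS];
};

static void mmio_msi_cmd_write(struct mmio_msi_state *msi, uint16_t cmd)
{
	uint16_t op  = VIRTIO_MMIO_MSI_CMD_OP(cmd);
	uint16_t arg = cmd & VIRTIO_MMIO_MSI_CMD_ARG_MASK;

	if (op != VIRTIO_MMIO_MSI_CMD_UPDATE || arg >= MMIO_MSI_MAX_VECTORS)
		return;

	/* done under the lock that also serializes interrupt injection */
	msi->vec[arg].addr = ((uint64_t)msi->staged_addr_hi << 32) |
			     msi->staged_addr_lo;
	msi->vec[arg].data = msi->staged_data;
}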

>
> --
> MST

2020-01-09 20:30:33

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

On Fri, Jan 10, 2020 at 12:06:06AM +0800, Liu, Jiang wrote:
> On Jan 5, 2020, at 6:42 PM, Michael S. Tsirkin <[email protected]> wrote:
> >
> > On Thu, Dec 26, 2019 at 09:16:19PM +0800, Liu, Jiang wrote:
> >>> 2) The mask and unmask control is missed
> >>>
> >>>
> >>>> but the extension doesn’t support 3) because
> >>>> we noticed that the Linux virtio subsystem doesn’t really make use of interrupt masking/unmasking.
> >
> > Linux uses masking/unmasking in order to migrate interrupts between
> > CPUs.
> This is a limitation of the PCI MSI/MSI-X spec.
> To update the MSI/MSI-X vector configuration, we need to write the msg_high/msg_low/msg_data registers.
> Because writing three 32-bit registers is not an atomic operation on the PCI bus, it may cause incorrect interrupt delivery if an interrupt fires after only one or two of the registers have been written.
> When interrupt remapping is enabled on Intel x86 platforms, we don’t need to mask/unmask PCI MSI/MSI-X interrupts when setting affinity.
> For the MMIO MSI extension, we have a special design to close this race window. The flow to update the MMIO MSI vector configuration is:
> 1) write msg_high
> 2) write msg_low
> 3) write msg_data
> 4) write the command register to update the vector configuration.
> During steps 1-3, the hardware/device backend only caches the written values; the vector configuration is updated in step 4, so it is an atomic operation.
> So mask/unmask becomes optional for MMIO MSI interrupts.

Oh I see. That needs some documentation I guess.

One question though: how do you block VQ from generating interrupts?

There's value in doing that for performance since then device can avoid
re-checking interrupt enable/event idx values in memory.



> >
> > --
> > MST

2020-01-15 07:09:40

by Liu, Jing2

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3


On 1/5/2020 7:04 PM, Michael S. Tsirkin wrote:
> [...]
>> +static int vm_find_vqs_msi(struct virtio_device *vdev, unsigned int nvqs,
>> + struct virtqueue *vqs[], vq_callback_t *callbacks[],
>> + const char * const names[], const bool *ctx,
>> + struct irq_affinity *desc)
>> +{
>> + int i, err, irq;
>> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
>> +
>> + /* Allocate nvqs irqs for queues and one irq for configuration */
>> + err = vm_request_msi_vectors(vdev, nvqs + 1);
>> + if (err != 0)
>> + return err;
> Not all devices need high speed. Some might want to share
> irqs between VQs, or even with config change callback.
> Balloon is a case in point.
> A hint about max # of MSI necessary would be a good
> idea for this case.

This seems to be a hint telling the driver how many MSI vectors the device
supports and whether it wants MSI sharing.

Devices with at most tens of queues, or devices that need high speed, can
choose a vector per queue and simply use a fixed mapping.

Others can ask for an advanced mode, which means MSI sharing and dynamic
mapping.

What about letting the device decide which mode it would use, as follows
(a rough driver-side sketch is attached below):

MaxVecNum 32bit - The maximum MSI vector number that the device supports.

MsiState 32bit

- bit[x]=0 implies a vector per queue/config and fixed mapping. In this case,
MaxVecNum >= num_queues + 1

- bit[x]=1 implies the hint of MSI sharing and dynamic mapping. In this
case, MaxVecNum < num_queues + 1
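
Roughly, the driver could consume this as follows (sketch only; the register
offsets, the SHARING bit position and the MSI_MODE_* names are placeholders,
not proposed values):

/* Placeholder offsets/bits, only to show the intended decision flow. */
#define VIRTIO_MMIO_MSI_MAX_VEC_NUM	0x0d0		/* hypothetical */
#define VIRTIO_MMIO_MSI_STATE		0x0d4		/* hypothetical */
#define VIRTIO_MMIO_MSI_STATE_SHARING	(1u << 0)	/* hypothetical bit[x] */

enum vm_msi_mode { MSI_MODE_FIXED, MSI_MODE_SHARED };

static enum vm_msi_mode vm_msi_pick_mode(struct virtio_mmio_device *vm_dev,
					 unsigned int nvqs)
{
	u32 max_vec = readl(vm_dev->base + VIRTIO_MMIO_MSI_MAX_VEC_NUM);
	u32 state = readl(vm_dev->base + VIRTIO_MMIO_MSI_STATE);

	/* Enough vectors and no sharing hint: one vector per VQ + config. */
	if (!(state & VIRTIO_MMIO_MSI_STATE_SHARING) && max_vec >= nvqs + 1)
		return MSI_MODE_FIXED;

	/* Otherwise share vectors and map event sources dynamically. */
	return MSI_MODE_SHARED;
}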


Thanks,

Jing

>
> Sharing MSI doesn't necessarily require dedicated registers like PCI has,
> you can just program same vector in multiple VQs.

2020-01-17 14:00:55

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] x86/msi: Enhance x86 to support platform_msi

Zha Bin <[email protected]> writes:

> From: Liu Jiang <[email protected]>
>
> The x86 platform currently only supports specific MSI/MSI-x for PCI
> devices. To enable MSI to the platform devices such as virtio-mmio
> device, this patch enhances x86 with platform MSI by leveraging the
> already built-in platform-msi driver (drivers/base/platform-msi.c)
> and makes it as a configurable option.

Why are you trying to make this an architecture feature instead of
having a generic implementation which can work on all architectures
which support virtio-mmio?

> Signed-off-by: Liu Jiang <[email protected]>
> Signed-off-by: Zha Bin <[email protected]>
> Signed-off-by: Chao Peng <[email protected]>
> Signed-off-by: Jing Liu <[email protected]>

This Signed-off-by chain is invalid.

Thanks,

tglx

2020-01-17 14:08:12

by Liu, Jiang

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] x86/msi: Enhance x86 to support platform_msi



> On Jan 17, 2020, at 9:58 PM, Thomas Gleixner <[email protected]> wrote:
>
> Zha Bin <[email protected]> writes:
>
>> From: Liu Jiang <[email protected]>
>>
>> The x86 platform currently only supports specific MSI/MSI-x for PCI
>> devices. To enable MSI to the platform devices such as virtio-mmio
>> device, this patch enhances x86 with platform MSI by leveraging the
>> already built-in platform-msi driver (drivers/base/platform-msi.c)
>> and makes it as a configurable option.
>
> Why are you trying to make this an architecture feature instead of
> having a generic implementation which can work on all architectures
> which support virtio-mmio?
Thanks, Thomas!
We are reworking it as a generic irqdomain implementation for all virtio-mmio devices.

>
>> Signed-off-by: Liu Jiang <[email protected]>
>> Signed-off-by: Zha Bin <[email protected]>
>> Signed-off-by: Chao Peng <[email protected]>
>> Signed-off-by: Jing Liu <[email protected]>
>
> This Signed-off-by chain is invalid.
>
> Thanks,
>
> tglx

2020-01-19 02:28:55

by Liu, Jing2

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] x86/msi: Enhance x86 to support platform_msi


On 1/17/2020 9:58 PM, Thomas Gleixner wrote:
>> Signed-off-by: Liu Jiang <[email protected]>
>> Signed-off-by: Zha Bin <[email protected]>
>> Signed-off-by: Chao Peng <[email protected]>
>> Signed-off-by: Jing Liu <[email protected]>
> This Signed-off-by chain is invalid.
>
> Thanks,
>
> tglx

Could I ask how we can handle such a co-author situation?

Thanks!

Jing

2020-01-19 15:00:21

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] x86/msi: Enhance x86 to support platform_msi

"Liu, Jing2" <[email protected]> writes:
> On 1/17/2020 9:58 PM, Thomas Gleixner wrote:
>>> Signed-off-by: Liu Jiang <[email protected]>
>>> Signed-off-by: Zha Bin <[email protected]>
>>> Signed-off-by: Chao Peng <[email protected]>
>>> Signed-off-by: Jing Liu <[email protected]>
>> This Signed-off-by chain is invalid.
>>
>
> Could I ask how can we handle such co-author situation?

Search for Co-developed-by in Documentation.

Thanks,

tglx

2020-01-21 05:04:29

by Liu, Jing2

[permalink] [raw]
Subject: Re: [virtio-dev] Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3


On 1/5/2020 7:04 PM, Michael S. Tsirkin wrote:
>
>>
>> struct virtio_mmio_vq_info {
>> @@ -101,6 +107,8 @@ struct virtio_mmio_vq_info {
>> };
>>
>>
>> +static void vm_free_msi_irqs(struct virtio_device *vdev);
>> +static int vm_request_msi_vectors(struct virtio_device *vdev, int nirqs);
>>
>> /* Configuration interface */
>>
>> @@ -273,12 +281,28 @@ static bool vm_notify(struct virtqueue *vq)
>> {
>> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
>>
>> + if (vm_dev->version == 3) {
>> + int offset = vm_dev->doorbell_base +
>> + vm_dev->doorbell_scale * vq->index;
>> + writel(vq->index, vm_dev->base + offset);
>> + } else
>> /* We write the queue's selector into the notification register to
>> * signal the other end */
>> - writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
>> + writel(vq->index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
>> +
>> return true;
>> }
> You might want to support VIRTIO_F_NOTIFICATION_DATA too.
>
Yes, the feature is new in virtio 1.1 and the kernel has not defined or
implemented it yet.

To implement it for both PCI and MMIO, maybe it would be better to do that
in a separate patch set later?
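
For reference, a rough idea of what the MMIO side could look like for a
split ring (untested sketch; vring_next_avail_idx() is a hypothetical helper
that the ring code would need to provide, it does not exist today):

/* Sketch only: VIRTIO_F_NOTIFICATION_DATA puts the VQ index in the low 16
 * bits and the next available index in the high 16 bits of the notify
 * write (split ring layout).
 */
static bool vm_notify_with_data(struct virtqueue *vq)
{
	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
	u32 data = vq->index | ((u32)vring_next_avail_idx(vq) << 16);

	writel(data, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);

	return true;
}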

BTW, we are working on v2 and will send it out later.

Thanks,

Jing