2015-12-03 18:36:49

by Yunhong Jiang

Subject: [PATCH 0/5] Threaded MSI interrupt for VFIO PCI device

When assigning a VFIO device to a KVM guest with low latency requirements, it
is better to handle the interrupt in hard interrupt context, to avoid the
context switches to/from the IRQ thread.

Based on the discussion at https://lkml.org/lkml/2015/10/26/764, the VFIO MSI
interrupt is changed to use request_threaded_irq(). The primary interrupt
handler tries to set the guest interrupt atomically. If that fails, the
threaded interrupt handler is invoked.

The irq_bypass manager is extended for this purpose. The KVM eventfd code
provides an irq_bypass consumer that handles the interrupt in hard interrupt
context; the producer then invokes the consumer's handler.

Yunhong Jiang (5):
Extract the irqfd_wakeup_pollin/irqfd_wakeup_pollup
Support runtime irq_bypass consumer
Support threaded interrupt handling on VFIO
Add the irq handling consumer
Expose x86 kvm_arch_set_irq_inatomic()

arch/x86/kvm/Kconfig | 1 +
drivers/vfio/pci/vfio_pci_intrs.c | 39 ++++++++++--
include/linux/irqbypass.h | 8 +++
include/linux/kvm_host.h | 19 +++++-
include/linux/kvm_irqfd.h | 1 +
virt/kvm/Kconfig | 3 +
virt/kvm/eventfd.c | 131 ++++++++++++++++++++++++++------------
virt/lib/irqbypass.c | 82 ++++++++++++++++++------
8 files changed, 214 insertions(+), 70 deletions(-)

--
1.8.3.1


2015-12-03 18:38:57

by Yunhong Jiang

Subject: [PATCH 1/5] KVM: Extract the irqfd_wakeup_pollin/irqfd_wakeup_pollup

Separate irqfd_wakeup_pollin()/irqfd_wakeup_pollup() out of
irqfd_wakeup(), so that the logic can be reused for MSI fastpath injection.

Signed-off-by: Yunhong Jiang <[email protected]>
---
virt/kvm/eventfd.c | 86 ++++++++++++++++++++++++++++++++----------------------
1 file changed, 51 insertions(+), 35 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 46dbc0a7dfc1..c31d43b762db 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -180,6 +180,53 @@ int __attribute__((weak)) kvm_arch_set_irq_inatomic(
return -EWOULDBLOCK;
}

+static int
+irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
+{
+ struct kvm *kvm = irqfd->kvm;
+ struct kvm_kernel_irq_routing_entry irq;
+ unsigned seq;
+ int idx, ret;
+
+ idx = srcu_read_lock(&kvm->irq_srcu);
+ do {
+ seq = read_seqcount_begin(&irqfd->irq_entry_sc);
+ irq = irqfd->irq_entry;
+ } while (read_seqcount_retry(&irqfd->irq_entry_sc, seq));
+ /* An event has been signaled, inject an interrupt */
+ ret = kvm_arch_set_irq_inatomic(&irq, kvm,
+ KVM_USERSPACE_IRQ_SOURCE_ID, 1,
+ false);
+ srcu_read_unlock(&kvm->irq_srcu, idx);
+
+ return ret;
+}
+
+static int
+irqfd_wakeup_pollup(struct kvm_kernel_irqfd *irqfd)
+{
+ struct kvm *kvm = irqfd->kvm;
+ unsigned long flags;
+
+ spin_lock_irqsave(&kvm->irqfds.lock, flags);
+
+ /*
+ * We must check if someone deactivated the irqfd before
+ * we could acquire the irqfds.lock since the item is
+ * deactivated from the KVM side before it is unhooked from
+ * the wait-queue. If it is already deactivated, we can
+ * simply return knowing the other side will cleanup for us.
+ * We cannot race against the irqfd going away since the
+ * other side is required to acquire wqh->lock, which we hold
+ */
+ if (irqfd_is_active(irqfd))
+ irqfd_deactivate(irqfd);
+
+ spin_unlock_irqrestore(&kvm->irqfds.lock, flags);
+
+ return 0;
+}
+
/*
* Called with wqh->lock held and interrupts disabled
*/
@@ -189,45 +236,14 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
struct kvm_kernel_irqfd *irqfd =
container_of(wait, struct kvm_kernel_irqfd, wait);
unsigned long flags = (unsigned long)key;
- struct kvm_kernel_irq_routing_entry irq;
- struct kvm *kvm = irqfd->kvm;
- unsigned seq;
- int idx;

- if (flags & POLLIN) {
- idx = srcu_read_lock(&kvm->irq_srcu);
- do {
- seq = read_seqcount_begin(&irqfd->irq_entry_sc);
- irq = irqfd->irq_entry;
- } while (read_seqcount_retry(&irqfd->irq_entry_sc, seq));
- /* An event has been signaled, inject an interrupt */
- if (kvm_arch_set_irq_inatomic(&irq, kvm,
- KVM_USERSPACE_IRQ_SOURCE_ID, 1,
- false) == -EWOULDBLOCK)
+ if (flags & POLLIN)
+ if (irqfd_wakeup_pollin(irqfd) == -EWOULDBLOCK)
schedule_work(&irqfd->inject);
- srcu_read_unlock(&kvm->irq_srcu, idx);
- }

- if (flags & POLLHUP) {
+ if (flags & POLLHUP)
/* The eventfd is closing, detach from KVM */
- unsigned long flags;
-
- spin_lock_irqsave(&kvm->irqfds.lock, flags);
-
- /*
- * We must check if someone deactivated the irqfd before
- * we could acquire the irqfds.lock since the item is
- * deactivated from the KVM side before it is unhooked from
- * the wait-queue. If it is already deactivated, we can
- * simply return knowing the other side will cleanup for us.
- * We cannot race against the irqfd going away since the
- * other side is required to acquire wqh->lock, which we hold
- */
- if (irqfd_is_active(irqfd))
- irqfd_deactivate(irqfd);
-
- spin_unlock_irqrestore(&kvm->irqfds.lock, flags);
- }
+ irqfd_wakeup_pollup(irqfd);

return 0;
}
--
1.8.3.1

2015-12-03 18:36:48

by Yunhong Jiang

Subject: [PATCH 2/5] VIRT: Support runtime irq_bypass consumer

Extend the irq_bypass manager to support runtime consumers. A runtime
irq_bypass consumer can handle an interrupt directly when it is
triggered. A runtime consumer has its handle_irq() callback set and
passes an irq_context for the interrupt handling.

A producer keeps a list of its runtime consumers, so that it can invoke
each consumer's handle_irq() when the interrupt fires.

Currently the irq_bypass manager has several code paths that assume there
is only one consumer/producer pair for each token. For example, when
registering a producer, it exits the loop after finding one matching
consumer. This is updated to support both static consumers (like the
posted-interrupt consumer) and runtime consumers.

Signed-off-by: Yunhong Jiang <[email protected]>
---
include/linux/irqbypass.h | 8 +++++
virt/lib/irqbypass.c | 82 +++++++++++++++++++++++++++++++++++------------
2 files changed, 69 insertions(+), 21 deletions(-)

diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
index 1551b5b2f4c2..d5bec0c7be3a 100644
--- a/include/linux/irqbypass.h
+++ b/include/linux/irqbypass.h
@@ -12,6 +12,7 @@
#define IRQBYPASS_H

#include <linux/list.h>
+#include <linux/srcu.h>

struct irq_bypass_consumer;

@@ -47,6 +48,9 @@ struct irq_bypass_consumer;
*/
struct irq_bypass_producer {
struct list_head node;
+ /* Update side is synchronized by the lock on irqbypass.c */
+ struct srcu_struct srcu;
+ struct list_head consumers;
void *token;
int irq;
int (*add_consumer)(struct irq_bypass_producer *,
@@ -61,6 +65,7 @@ struct irq_bypass_producer {
* struct irq_bypass_consumer - IRQ bypass consumer definition
* @node: IRQ bypass manager private list management
* @token: opaque token to match between producer and consumer
+ * @sibling: consumers with same token list management
* @add_producer: Connect the IRQ consumer to an IRQ producer
* @del_producer: Disconnect the IRQ consumer from an IRQ producer
* @stop: Perform any quiesce operations necessary prior to add/del (optional)
@@ -73,6 +78,7 @@ struct irq_bypass_producer {
*/
struct irq_bypass_consumer {
struct list_head node;
+ struct list_head sibling;
void *token;
int (*add_producer)(struct irq_bypass_consumer *,
struct irq_bypass_producer *);
@@ -80,6 +86,8 @@ struct irq_bypass_consumer {
struct irq_bypass_producer *);
void (*stop)(struct irq_bypass_consumer *);
void (*start)(struct irq_bypass_consumer *);
+ int (*handle_irq)(void *arg);
+ void *irq_context;
};

int irq_bypass_register_producer(struct irq_bypass_producer *);
diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
index 09a03b5a21ff..43ef9e2c77dc 100644
--- a/virt/lib/irqbypass.c
+++ b/virt/lib/irqbypass.c
@@ -21,6 +21,7 @@
#include <linux/list.h>
#include <linux/module.h>
#include <linux/mutex.h>
+#include <linux/rculist.h>

MODULE_LICENSE("GPL v2");
MODULE_DESCRIPTION("IRQ bypass manager utility module");
@@ -49,11 +50,8 @@ static int __connect(struct irq_bypass_producer *prod,
prod->del_consumer(prod, cons);
}

- if (cons->start)
- cons->start(cons);
- if (prod->start)
- prod->start(prod);
-
+ if (!ret && cons->handle_irq)
+ list_add_rcu(&cons->sibling, &prod->consumers);
return ret;
}

@@ -71,6 +69,11 @@ static void __disconnect(struct irq_bypass_producer *prod,
if (prod->del_consumer)
prod->del_consumer(prod, cons);

+ if (cons->handle_irq) {
+ list_del_rcu(&cons->sibling);
+ synchronize_srcu(&prod->srcu);
+ }
+
if (cons->start)
cons->start(cons);
if (prod->start)
@@ -87,7 +90,8 @@ static void __disconnect(struct irq_bypass_producer *prod,
int irq_bypass_register_producer(struct irq_bypass_producer *producer)
{
struct irq_bypass_producer *tmp;
- struct irq_bypass_consumer *consumer;
+ struct list_head *node, *next, siblings = LIST_HEAD_INIT(siblings);
+ int ret;

might_sleep();

@@ -96,6 +100,9 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer)

mutex_lock(&lock);

+ INIT_LIST_HEAD(&producer->consumers);
+ init_srcu_struct(&producer->srcu);
+
list_for_each_entry(tmp, &producers, node) {
if (tmp->token == producer->token) {
mutex_unlock(&lock);
@@ -104,23 +111,48 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer)
}
}

- list_for_each_entry(consumer, &consumers, node) {
+ list_for_each_safe(node, next, &consumers) {
+ struct irq_bypass_consumer *consumer = container_of(
+ node, struct irq_bypass_consumer, node);
+
if (consumer->token == producer->token) {
- int ret = __connect(producer, consumer);
- if (ret) {
- mutex_unlock(&lock);
- module_put(THIS_MODULE);
- return ret;
- }
- break;
+ ret = __connect(producer, consumer);
+ if (ret)
+ goto error;
+ /* Keep the connected consumers temporarily */
+ list_del(&consumer->node);
+ list_add_rcu(&consumer->node, &siblings);
}
}

+ list_for_each_safe(node, next, &siblings) {
+ struct irq_bypass_consumer *consumer = container_of(
+ node, struct irq_bypass_consumer, node);
+
+ list_del(&consumer->node);
+ list_add(&consumer->node, &consumers);
+ if (consumer->start)
+ consumer->start(consumer);
+ }
+
+ if (producer->start)
+ producer->start(producer);
list_add(&producer->node, &producers);

mutex_unlock(&lock);
-
return 0;
+
+error:
+ list_for_each_safe(node, next, &siblings) {
+ struct irq_bypass_consumer *consumer = container_of(
+ node, struct irq_bypass_consumer, node);
+
+ list_del(&consumer->node);
+ list_add(&consumer->node, &consumers);
+ }
+ mutex_unlock(&lock);
+ module_put(THIS_MODULE);
+ return ret;
}
EXPORT_SYMBOL_GPL(irq_bypass_register_producer);

@@ -133,7 +165,7 @@ EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
*/
void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
{
- struct irq_bypass_producer *tmp;
+ struct irq_bypass_producer *tmp, *n;
struct irq_bypass_consumer *consumer;

might_sleep();
@@ -143,7 +175,7 @@ void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)

mutex_lock(&lock);

- list_for_each_entry(tmp, &producers, node) {
+ list_for_each_entry_safe(tmp, n, &producers, node) {
if (tmp->token != producer->token)
continue;

@@ -159,6 +191,7 @@ void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
break;
}

+ cleanup_srcu_struct(&producer->srcu);
mutex_unlock(&lock);

module_put(THIS_MODULE);
@@ -180,6 +213,9 @@ int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
if (!consumer->add_producer || !consumer->del_producer)
return -EINVAL;

+ if (consumer->handle_irq && !consumer->irq_context)
+ return -EINVAL;
+
might_sleep();

if (!try_module_get(THIS_MODULE))
@@ -188,7 +224,7 @@ int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
mutex_lock(&lock);

list_for_each_entry(tmp, &consumers, node) {
- if (tmp->token == consumer->token) {
+ if (tmp == consumer) {
mutex_unlock(&lock);
module_put(THIS_MODULE);
return -EBUSY;
@@ -203,6 +239,10 @@ int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
module_put(THIS_MODULE);
return ret;
}
+ if (consumer->start)
+ consumer->start(consumer);
+ if (producer->start)
+ producer->start(producer);
break;
}
}
@@ -224,7 +264,7 @@ EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
*/
void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
{
- struct irq_bypass_consumer *tmp;
+ struct irq_bypass_consumer *tmp, *n;
struct irq_bypass_producer *producer;

might_sleep();
@@ -234,8 +274,8 @@ void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)

mutex_lock(&lock);

- list_for_each_entry(tmp, &consumers, node) {
- if (tmp->token != consumer->token)
+ list_for_each_entry_safe(tmp, n, &consumers, node) {
+ if (tmp != consumer)
continue;

list_for_each_entry(producer, &producers, node) {
--
1.8.3.1

2015-12-03 18:38:40

by Yunhong Jiang

Subject: [PATCH 3/5] VFIO: Support threaded interrupt handling on VFIO

For a VFIO device with the MSI interrupt type, it is possible to handle
the interrupt in hard interrupt context without invoking the interrupt
thread. Handling the interrupt in hard interrupt context reduces the
interrupt latency.

Signed-off-by: Yunhong Jiang <[email protected]>
---
drivers/vfio/pci/vfio_pci_intrs.c | 39 ++++++++++++++++++++++++++++++++++-----
1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 3b3ba15558b7..108d335c5656 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -236,12 +236,35 @@ static void vfio_intx_disable(struct vfio_pci_device *vdev)
kfree(vdev->ctx);
}

+static irqreturn_t vfio_msihandler(int irq, void *arg)
+{
+ struct vfio_pci_irq_ctx *ctx = arg;
+ struct irq_bypass_producer *producer = &ctx->producer;
+ struct irq_bypass_consumer *consumer;
+ int ret = IRQ_HANDLED, idx;
+
+ idx = srcu_read_lock(&producer->srcu);
+
+ list_for_each_entry_rcu(consumer, &producer->consumers, sibling) {
+ /*
+ * Invoke the thread handler if any consumer would block, but
+ * finish all consumers.
+ */
+ if (consumer->handle_irq(consumer->irq_context) == -EWOULDBLOCK)
+ ret = IRQ_WAKE_THREAD;
+ continue;
+ }
+
+ srcu_read_unlock(&producer->srcu, idx);
+ return ret;
+}
+
/*
* MSI/MSI-X
*/
-static irqreturn_t vfio_msihandler(int irq, void *arg)
+static irqreturn_t vfio_msihandler_threaded(int irq, void *arg)
{
- struct eventfd_ctx *trigger = arg;
+ struct eventfd_ctx *trigger = ((struct vfio_pci_irq_ctx *)arg)->trigger;

eventfd_signal(trigger, 1);
return IRQ_HANDLED;
@@ -318,7 +341,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
return -EINVAL;

if (vdev->ctx[vector].trigger) {
- free_irq(irq, vdev->ctx[vector].trigger);
+ free_irq(irq, &vdev->ctx[vector]);
irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
kfree(vdev->ctx[vector].name);
eventfd_ctx_put(vdev->ctx[vector].trigger);
@@ -353,8 +376,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
pci_write_msi_msg(irq, &msg);
}

- ret = request_irq(irq, vfio_msihandler, 0,
- vdev->ctx[vector].name, trigger);
+ /*
+ * Currently the primary handler for the threaded irq may still be
+ * invoked in a thread; IRQF_ONESHOT is a hack for that case.
+ */
+ ret = request_threaded_irq(irq, vfio_msihandler,
+ vfio_msihandler_threaded,
+ IRQF_ONESHOT, vdev->ctx[vector].name,
+ &vdev->ctx[vector]);
if (ret) {
kfree(vdev->ctx[vector].name);
eventfd_ctx_put(trigger);
--
1.8.3.1

2015-12-03 18:38:00

by Yunhong Jiang

Subject: [PATCH 4/5] KVM: Add the irq handling consumer

Add an irq_bypass consumer to the KVM eventfd, so that when an MSI interrupt
triggered from VFIO arrives, it can be handled on the fast path.

Signed-off-by: Yunhong Jiang <[email protected]>
---
include/linux/kvm_irqfd.h | 1 +
virt/kvm/eventfd.c | 42 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index 0c1de05098c8..5573d53ccebb 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -65,6 +65,7 @@ struct kvm_kernel_irqfd {
poll_table pt;
struct work_struct shutdown;
struct irq_bypass_consumer consumer;
+ struct irq_bypass_consumer fastpath;
struct irq_bypass_producer *producer;
};

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c31d43b762db..b20a2d1bbf73 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -144,6 +144,8 @@ irqfd_shutdown(struct work_struct *work)
#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
irq_bypass_unregister_consumer(&irqfd->consumer);
#endif
+ if (irqfd->fastpath.token)
+ irq_bypass_unregister_consumer(&irqfd->fastpath);
eventfd_ctx_put(irqfd->eventfd);
kfree(irqfd);
}
@@ -203,6 +205,14 @@ irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
}

static int
+kvm_fastpath_irq(void *arg)
+{
+ struct kvm_kernel_irqfd *irqfd = arg;
+
+ return irqfd_wakeup_pollin(irqfd);
+}
+
+static int
irqfd_wakeup_pollup(struct kvm_kernel_irqfd *irqfd)
{
struct kvm *kvm = irqfd->kvm;
@@ -296,6 +306,34 @@ int __attribute__((weak)) kvm_arch_update_irqfd_routing(
}
#endif

+static int kvm_fastpath_stub(struct irq_bypass_consumer *stub,
+ struct irq_bypass_producer *stub1)
+{
+ return 0;
+}
+
+static void kvm_fastpath_stub1(struct irq_bypass_consumer *stub,
+ struct irq_bypass_producer *stub1)
+{
+}
+
+static int setup_fastpath_consumer(struct kvm_kernel_irqfd *irqfd)
+{
+ int ret;
+
+ irqfd->fastpath.token = (void *)irqfd->eventfd;
+ irqfd->fastpath.add_producer = kvm_fastpath_stub;
+ irqfd->fastpath.del_producer = kvm_fastpath_stub1;
+ irqfd->fastpath.handle_irq = kvm_fastpath_irq;
+ irqfd->fastpath.irq_context = irqfd;
+ ret = irq_bypass_register_consumer(&irqfd->fastpath);
+
+ if (ret)
+ /* A special tag to indicate consumer not working */
+ irqfd->fastpath.token = (void *)0;
+ return ret;
+}
+
static int
kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
{
@@ -435,6 +473,10 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
irqfd->consumer.token, ret);
#endif

+ ret = setup_fastpath_consumer(irqfd);
+ if (ret)
+ pr_info("irq bypass fastpath consumer (token %p) registration fails: %d\n",
+ irqfd->eventfd, ret);
+
return 0;

fail:
--
1.8.3.1

2015-12-03 18:37:36

by Yunhong Jiang

Subject: [PATCH 5/5] KVM: Expose x86 kvm_arch_set_irq_inatomic()

x86 supports setting an IRQ in atomic context; expose this capability to the
VFIO driver.

Signed-off-by: Yunhong Jiang <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
include/linux/kvm_host.h | 19 ++++++++++++++++---
virt/kvm/Kconfig | 3 +++
virt/kvm/eventfd.c | 9 ---------
4 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 639a6e34500c..642e8b905c96 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -30,6 +30,7 @@ config KVM
select HAVE_KVM_IRQFD
select IRQ_BYPASS_MANAGER
select HAVE_KVM_IRQ_BYPASS
+ select KVM_SET_IRQ_INATOMIC
select HAVE_KVM_IRQ_ROUTING
select HAVE_KVM_EVENTFD
select KVM_APIC_ARCHITECTURE
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 590c46e672df..a6e237275928 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -852,9 +852,6 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level,
bool line_status);
int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
int irq_source_id, int level, bool line_status);
-int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
- struct kvm *kvm, int irq_source_id,
- int level, bool line_status);
bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin);
void kvm_notify_acked_gsi(struct kvm *kvm, int gsi);
void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
@@ -1207,4 +1204,20 @@ int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
uint32_t guest_irq, bool set);
#endif /* CONFIG_HAVE_KVM_IRQ_BYPASS */

+#ifndef CONFIG_KVM_SET_IRQ_INATOMIC
+int __attribute__((weak)) kvm_arch_set_irq_inatomic(
+ struct kvm_kernel_irq_routing_entry *irq,
+ struct kvm *kvm, int irq_source_id,
+ int level,
+ bool line_status)
+{
+ return -EWOULDBLOCK;
+}
+#else
+extern int kvm_arch_set_irq_inatomic(
+ struct kvm_kernel_irq_routing_entry *e,
+ struct kvm *kvm, int irq_source_id, int level,
+ bool line_status);
+#endif
+
#endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 7a79b6853583..7c99dd4724a4 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -50,3 +50,6 @@ config KVM_COMPAT

config HAVE_KVM_IRQ_BYPASS
bool
+
+config KVM_SET_IRQ_INATOMIC
+ bool
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index b20a2d1bbf73..405c26742380 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -173,15 +173,6 @@ irqfd_deactivate(struct kvm_kernel_irqfd *irqfd)
queue_work(irqfd_cleanup_wq, &irqfd->shutdown);
}

-int __attribute__((weak)) kvm_arch_set_irq_inatomic(
- struct kvm_kernel_irq_routing_entry *irq,
- struct kvm *kvm, int irq_source_id,
- int level,
- bool line_status)
-{
- return -EWOULDBLOCK;
-}
-
static int
irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
{
--
1.8.3.1

2015-12-03 18:55:55

by Alex Williamson

Subject: Re: [PATCH 0/5] Threaded MSI interrupt for VFIO PCI device

On Thu, 2015-12-03 at 10:22 -0800, Yunhong Jiang wrote:
> When assigning a VFIO device to a KVM guest with low latency requirement, it
> is better to handle the interrupt in the hard interrupt context, to reduce
> the context switch to/from the IRQ thread.
>
> Based on discussion on https://lkml.org/lkml/2015/10/26/764, the VFIO msi
> interrupt is changed to use request_threaded_irq(). The primary interrupt
> handler tries to set the guest interrupt atomically. If it fails to achieve
> it, a threaded interrupt handler will be invoked.
>
> The irq_bypass manager is extended for this purpose. The KVM eventfd will
> provide a irqbypass consumer to handle the interrupt at hard interrupt
> context. The producer will invoke the consumer's handler then.

Do you have any performance data? Thanks,

Alex

2015-12-03 22:45:22

by Yunhong Jiang

Subject: Re: [PATCH 0/5] Threaded MSI interrupt for VFIO PCI device

On Thu, Dec 03, 2015 at 11:55:53AM -0700, Alex Williamson wrote:
> On Thu, 2015-12-03 at 10:22 -0800, Yunhong Jiang wrote:
> > When assigning a VFIO device to a KVM guest with low latency requirement, it
> > is better to handle the interrupt in the hard interrupt context, to reduce
> > the context switch to/from the IRQ thread.
> >
> > Based on discussion on https://lkml.org/lkml/2015/10/26/764, the VFIO msi
> > interrupt is changed to use request_threaded_irq(). The primary interrupt
> > handler tries to set the guest interrupt atomically. If it fails to achieve
> > it, a threaded interrupt handler will be invoked.
> >
> > The irq_bypass manager is extended for this purpose. The KVM eventfd will
> > provide a irqbypass consumer to handle the interrupt at hard interrupt
> > context. The producer will invoke the consumer's handler then.
>
> Do you have any performance data? Thanks,

Sorry, I should include the data on the commit messages.

The test:
Launch a VM with an FPGA device, which triggers an interrupt every 1 ms. The
VM is launched on an isolated CPU with NOHZ_FULL enabled.
Two sets of data are collected. On the guest, a special program checks the
latency from the time the interrupt is triggered to the time the interrupt
is received by the guest ISR. On the host, I use perf to collect the data.

The performance data with the patch:

Host perf data:
[root@otcnfv02 test-tools]# perf kvm stat record -C 34 sleep 10
[root@otcnfv02 test-tools]# perf kvm stat report
Analyze events for all VMs, all VCPUs:
VM-EXIT             Samples  Samples%   Time%  Min Time  Max Time        Avg time
EXTERNAL_INTERRUPT     9997    98.55%  99.31%    1.73us   17.07us  2.09us ( +- 0.21% )
PAUSE_INSTRUCTION       127     1.25%   0.51%    0.69us    1.20us  0.84us ( +- 1.34% )
MSR_WRITE                20     0.20%   0.18%    1.62us    3.21us  1.93us ( +- 3.95% )

Guest data:
[nfv@rt-test-1 ~]$ ./run-int-test.sh
Latency is Min: 3.74us Max: 20.08us Avg: 4.49us
No of interrupts = 74995

The performance data without the patch:
Host perf data:
[root@otcnfv02 test-tools]# perf kvm stat record -C 34 sleep 10
[root@otcnfv02 test-tools]# perf kvm stat report
Analyze events for all VMs, all VCPUs:
VM-EXIT             Samples  Samples%   Time%  Min Time  Max Time        Avg time
PAUSE_INSTRUCTION    141136    87.74%  50.39%    0.69us    8.51us  0.77us ( +- 0.07% )
EXTERNAL_INTERRUPT    19701    12.25%  49.59%    2.31us   15.93us  5.46us ( +- 0.12% )
MSR_WRITE                24     0.01%   0.02%    1.51us    2.22us  1.91us ( +- 1.91% )

Notice:
The EXTERNAL_INTERRUPT VM-exit count is different w/ the patch (9997) and
w/o the patch (19701). It is because with the threaded IRQ, NOHZ_FULL is not
working since there are two threads on the pCPU, thus we get both the FPGA
device interrupt and the tick timer interrupt. After calculation, the
average time for the FPGA device interrupt is 4.72 us.

Guest data:
[nfv@rt-test-1 ~]$ ./run-int-test.sh
Latency is Min: 6.70us Max: 50.38us Avg: 7.44us
No of interrupts = 42639

Thanks
--jyh

>
> Alex
>

2015-12-04 00:35:25

by kernel test robot

Subject: Re: [PATCH 4/5] KVM: Add the irq handling consumer

[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
Hi Yunhong,

[auto build test ERROR on kvm/linux-next]
[also build test ERROR on v4.4-rc3 next-20151202]

url: https://github.com/0day-ci/linux/commits/Yunhong-Jiang/KVM-Extract-the-irqfd_wakeup_pollin-irqfd_wakeup_pollup/20151204-024034
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: arm64-allmodconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm64

All errors (new ones prefixed by >>):

arch/arm64/kvm/built-in.o: In function `kvm_irqfd_assign':
>> :(.text+0x10a70): undefined reference to `irq_bypass_register_consumer'
arch/arm64/kvm/built-in.o: In function `irqfd_shutdown':
>> :(.text+0x10cd4): undefined reference to `irq_bypass_unregister_consumer'

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2015-12-16 17:56:52

by Paolo Bonzini

Subject: Re: [PATCH 0/5] Threaded MSI interrupt for VFIO PCI device

Alex,

can you take a look at the extension to the irq bypass interface in
patch 2? I'm not sure I understand what is the case where you have
multiple consumers for the same token.

Paolo

On 03/12/2015 19:22, Yunhong Jiang wrote:
> When assigning a VFIO device to a KVM guest with low latency requirement, it
> is better to handle the interrupt in the hard interrupt context, to reduce
> the context switch to/from the IRQ thread.
>
> Based on discussion on https://lkml.org/lkml/2015/10/26/764, the VFIO msi
> interrupt is changed to use request_threaded_irq(). The primary interrupt
> handler tries to set the guest interrupt atomically. If it fails to achieve
> it, a threaded interrupt handler will be invoked.
>
> The irq_bypass manager is extended for this purpose. The KVM eventfd will
> provide a irqbypass consumer to handle the interrupt at hard interrupt
> context. The producer will invoke the consumer's handler then.
>
> Yunhong Jiang (5):
> Extract the irqfd_wakeup_pollin/irqfd_wakeup_pollup
> Support runtime irq_bypass consumer
> Support threaded interrupt handling on VFIO
> Add the irq handling consumer
> Expose x86 kvm_arch_set_irq_inatomic()
>
> arch/x86/kvm/Kconfig | 1 +
> drivers/vfio/pci/vfio_pci_intrs.c | 39 ++++++++++--
> include/linux/irqbypass.h | 8 +++
> include/linux/kvm_host.h | 19 +++++-
> include/linux/kvm_irqfd.h | 1 +
> virt/kvm/Kconfig | 3 +
> virt/kvm/eventfd.c | 131 ++++++++++++++++++++++++++------------
> virt/lib/irqbypass.c | 82 ++++++++++++++++++------
> 8 files changed, 214 insertions(+), 70 deletions(-)
>

2015-12-16 19:15:25

by Alex Williamson

Subject: Re: [PATCH 0/5] Threaded MSI interrupt for VFIO PCI device

On Wed, 2015-12-16 at 18:56 +0100, Paolo Bonzini wrote:
> Alex,
>
> can you take a look at the extension to the irq bypass interface in
> patch 2?  I'm not sure I understand what is the case where you have
> multiple consumers for the same token.

The consumers would be, for instance, Intel PI + the threaded handler
added in this series.  These run independently, the PI bypass simply
makes the interrupt disappear from the host when it catches it, but if
the vCPU isn't running in the right place at the time of the interrupt,
it gets delivered to the host, in which case the secondary consumer
implementing handle_irq() provides a lower latency injection than the
eventfd path.  If PI isn't supported, only this latter consumer is
registered.

On the surface it seems like a reasonable solution, though having
multiple consumers implementing handle_irq() seems problematic.  Do we
get multiple injections if we call them all?  Should we have some way
to prioritize one handler versus another?  Perhaps KVM should have a
single unified consumer that can provide that sort of logic, though we
still need the srcu code added here to protect against registration and
handle_irq() races. Thanks,

Alex

> On 03/12/2015 19:22, Yunhong Jiang wrote:
> > When assigning a VFIO device to a KVM guest with low latency
> > requirement, it  
> > is better to handle the interrupt in the hard interrupt context, to
> > reduce
> > the context switch to/from the IRQ thread.
> >
> > Based on discussion on https://lkml.org/lkml/2015/10/26/764, the
> > VFIO msi
> > interrupt is changed to use request_threaded_irq(). The primary
> > interrupt
> > handler tries to set the guest interrupt atomically. If it fails to
> > achieve
> > it, a threaded interrupt handler will be invoked.
> >
> > The irq_bypass manager is extended for this purpose. The KVM
> > eventfd will
> > provide a irqbypass consumer to handle the interrupt at hard
> > interrupt
> > context. The producer will invoke the consumer's handler then.
> >
> > Yunhong Jiang (5):
> >   Extract the irqfd_wakeup_pollin/irqfd_wakeup_pollup
> >   Support runtime irq_bypass consumer
> >   Support threaded interrupt handling on VFIO
> >   Add the irq handling consumer
> >   Expose x86 kvm_arch_set_irq_inatomic()
> >
> >  arch/x86/kvm/Kconfig              |   1 +
> >  drivers/vfio/pci/vfio_pci_intrs.c |  39 ++++++++++--
> >  include/linux/irqbypass.h         |   8 +++
> >  include/linux/kvm_host.h          |  19 +++++-
> >  include/linux/kvm_irqfd.h         |   1 +
> >  virt/kvm/Kconfig                  |   3 +
> >  virt/kvm/eventfd.c                | 131
> > ++++++++++++++++++++++++++------------
> >  virt/lib/irqbypass.c              |  82 ++++++++++++++++++------
> >  8 files changed, 214 insertions(+), 70 deletions(-)
> >

2015-12-16 19:48:58

by Alex Williamson

Subject: Re: [PATCH 2/5] VIRT: Support runtime irq_bypass consumer

On Thu, 2015-12-03 at 10:22 -0800, Yunhong Jiang wrote:
> Extend the irq_bypass manager to support runtime consumers. A runtime
> irq_bypass consumer can handle interrupt when an interrupt triggered. A
> runtime consumer has it's handle_irq() function set and passing a
> irq_context for the irq handling.
>
> A producer keep a link for the runtime consumers, so that it can invoke
> each consumer's handle_irq() when irq invoked.
>
> Currently the irq_bypass manager has several code path assuming there is
> only one consumer/producer pair for each token. For example, when
> register the producer, it exits the loop after finding one match
> consumer.  This is updated to support both static consumer (like for
> Posted Interrupt consumer) and runtime consumer.
>
> Signed-off-by: Yunhong Jiang <[email protected]>
> ---
>  include/linux/irqbypass.h |  8 +++++
>  virt/lib/irqbypass.c      | 82 +++++++++++++++++++++++++++++++++++------------
>  2 files changed, 69 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
> index 1551b5b2f4c2..d5bec0c7be3a 100644
> --- a/include/linux/irqbypass.h
> +++ b/include/linux/irqbypass.h
> @@ -12,6 +12,7 @@
>  #define IRQBYPASS_H
>  
>  #include <linux/list.h>
> +#include <linux/srcu.h>
>  
>  struct irq_bypass_consumer;
>  
> @@ -47,6 +48,9 @@ struct irq_bypass_consumer;
>   */
>  struct irq_bypass_producer {
>   struct list_head node;
> + /* Update side is synchronized by the lock on irqbypass.c */
> + struct srcu_struct srcu;
> + struct list_head consumers;
>   void *token;
>   int irq;
>   int (*add_consumer)(struct irq_bypass_producer *,

Documentation?

> @@ -61,6 +65,7 @@ struct irq_bypass_producer {
>   * struct irq_bypass_consumer - IRQ bypass consumer definition
>   * @node: IRQ bypass manager private list management
>   * @token: opaque token to match between producer and consumer
> + * @sibling: consumers with same token list management
>   * @add_producer: Connect the IRQ consumer to an IRQ producer
>   * @del_producer: Disconnect the IRQ consumer from an IRQ producer
>   * @stop: Perform any quiesce operations necessary prior to add/del (optional)

What about @handle_irq and @irq_context?

> @@ -73,6 +78,7 @@ struct irq_bypass_producer {
>   */
>  struct irq_bypass_consumer {
>   struct list_head node;
> + struct list_head sibling;
>   void *token;
>   int (*add_producer)(struct irq_bypass_consumer *,
>       struct irq_bypass_producer *);
> @@ -80,6 +86,8 @@ struct irq_bypass_consumer {
>        struct irq_bypass_producer *);
>   void (*stop)(struct irq_bypass_consumer *);
>   void (*start)(struct irq_bypass_consumer *);
> + int (*handle_irq)(void *arg);

If we called this with a pointer to the consumer, like the other
functions, the consumer could embed arg (irq_context) into their own
structure, or in this case, do a container_of and avoid storing the
irqfd pointer entirely.

> + void *irq_context;
>  };
>  
>  int irq_bypass_register_producer(struct irq_bypass_producer *);

2015-12-16 19:49:11

by Alex Williamson

Subject: Re: [PATCH 3/5] VFIO: Support threaded interrupt handling on VFIO

On Thu, 2015-12-03 at 10:22 -0800, Yunhong Jiang wrote:
> For a VFIO device with MSI interrupt type, it's possible to handle the
> interrupt in hard interrupt context without invoking the interrupt
> thread. Handling the interrupt in hard interrupt context reduces the
> interrupt latency.
>
> Signed-off-by: Yunhong Jiang <[email protected]>
> ---
>  drivers/vfio/pci/vfio_pci_intrs.c | 39 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 3b3ba15558b7..108d335c5656 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -236,12 +236,35 @@ static void vfio_intx_disable(struct vfio_pci_device *vdev)
>   kfree(vdev->ctx);
>  }
>  
> +static irqreturn_t vfio_msihandler(int irq, void *arg)
> +{
> + struct vfio_pci_irq_ctx *ctx = arg;
> + struct irq_bypass_producer *producer = &ctx->producer;
> + struct irq_bypass_consumer *consumer;
> + int ret = IRQ_HANDLED, idx;
> +
> + idx = srcu_read_lock(&producer->srcu);
> +
> + list_for_each_entry_rcu(consumer, &producer->consumers, sibling) {
> + /*
> +  * Invoke the thread handler if any consumer would block, but
> +  * finish all consumers.
> +  */
> + if (consumer->handle_irq(consumer->irq_context) == -EWOULDBLOCK)
> + ret = IRQ_WAKE_THREAD;
> + continue;
> + }
> +
> + srcu_read_unlock(&producer->srcu, idx);


There should be an irq bypass manager interface to abstract this.

2015-12-16 21:55:17

by Paolo Bonzini

Subject: Re: [PATCH 0/5] Threaded MSI interrupt for VFIO PCI device



On 16/12/2015 20:15, Alex Williamson wrote:
> The consumers would be, for instance, Intel PI + the threaded handler
> added in this series. These run independently, the PI bypass simply
> makes the interrupt disappear from the host when it catches it, but if
> the vCPU isn't running in the right place at the time of the interrupt,
> it gets delivered to the host, in which case the secondary consumer
> implementing handle_irq() provides a lower latency injection than the
> eventfd path. If PI isn't supported, only this latter consumer is
> registered.

I would implement the two in a single consumer, knowing that only one of
the two parts would effectively run. But because of the possibility of
multiple consumers implementing handle_irq(), I am not sure if this is
feasible.

> On the surface it seems like a reasonable solution, though having
> multiple consumers implementing handle_irq() seems problematic. Do we
> get multiple injections if we call them all?

Indeed.

> Should we have some way
> to prioritize one handler versus another? Perhaps KVM should have a
> single unified consumer that can provide that sort of logic, though we
> still need the srcu code added here to protect against registration and
> irq_handler() races. Thanks,

I'm happy to see that we have the same doubts. :)

Paolo