2009-10-21 14:34:50

by Gregory Haskins

[permalink] [raw]
Subject: [KVM PATCH 0/2] irqfd enhancements

(Applies to kvm.git/master:11b06403)

The following patches are cleanups/enhancements for IRQFD now that
we have lockless interrupt injection. For more details, please see
the patch headers.

These patches pass checkpatch, and are fully tested. Please consider
for merging. They are an enhancement only, so there is no urgency
to push to mainline until a suitable merge window presents itself.

Kind Regards,
-Greg

---

Gregory Haskins (2):
KVM: Remove unecessary irqfd-cleanup-wq
KVM: Directly inject interrupts via irqfd


virt/kvm/eventfd.c | 45 ++++++---------------------------------------
1 files changed, 6 insertions(+), 39 deletions(-)

--
Signature


2009-10-21 14:35:06

by Gregory Haskins

[permalink] [raw]
Subject: [KVM PATCH 1/2] KVM: Directly inject interrupts via irqfd

IRQFD currently uses a deferred workqueue item to execute the injection
operation. It was originally designed this way because kvm_set_irq()
required the caller to hold the irq_lock mutex, and the eventfd callback
is invoked from within a non-preemptible critical section.

With the advent of lockless injection support in kvm_set_irq, the deferment
mechanism is no longer technically needed. Since context switching to the
workqueue is a source of interrupt latency, lets switch to a direct
method.

Signed-off-by: Gregory Haskins <[email protected]>
---

virt/kvm/eventfd.c | 15 +++------------
1 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 30f70fd..1a529d4 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -49,16 +49,14 @@ struct _irqfd {
poll_table pt;
wait_queue_head_t *wqh;
wait_queue_t wait;
- struct work_struct inject;
struct work_struct shutdown;
};

static struct workqueue_struct *irqfd_cleanup_wq;

static void
-irqfd_inject(struct work_struct *work)
+irqfd_inject(struct _irqfd *irqfd)
{
- struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
struct kvm *kvm = irqfd->kvm;

kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
@@ -80,12 +78,6 @@ irqfd_shutdown(struct work_struct *work)
remove_wait_queue(irqfd->wqh, &irqfd->wait);

/*
- * We know no new events will be scheduled at this point, so block
- * until all previously outstanding events have completed
- */
- flush_work(&irqfd->inject);
-
- /*
* It is now safe to release the object's resources
*/
eventfd_ctx_put(irqfd->eventfd);
@@ -126,7 +118,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)

if (flags & POLLIN)
/* An event has been signaled, inject an interrupt */
- schedule_work(&irqfd->inject);
+ irqfd_inject(irqfd);

if (flags & POLLHUP) {
/* The eventfd is closing, detach from KVM */
@@ -179,7 +171,6 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
irqfd->kvm = kvm;
irqfd->gsi = gsi;
INIT_LIST_HEAD(&irqfd->list);
- INIT_WORK(&irqfd->inject, irqfd_inject);
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);

file = eventfd_fget(fd);
@@ -214,7 +205,7 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
* before we registered, and trigger it as if we didn't miss it.
*/
if (events & POLLIN)
- schedule_work(&irqfd->inject);
+ irqfd_inject(irqfd);

/*
* do not drop the file until the irqfd is fully initialized, otherwise

2009-10-21 14:38:33

by Gregory Haskins

[permalink] [raw]
Subject: [KVM PATCH 2/2] KVM: Remove unecessary irqfd-cleanup-wq

We originally created the irqfd-cleanup-wq so that we could safely
implement a shutdown that blocked on outstanding injection-requests
that may have been in flight. We couldn't reuse something like kevent
to shutdown since the injection path was also using kevent and that may
have caused a deadlock if we tried.

Since the injection path is now no longer utilizing a work-item, it is
no longer necessary to maintain a separate cleanup WQ. The standard
kevent queues should be sufficient, and thus we can eliminate an extra
kthread from the system.

Signed-off-by: Gregory Haskins <[email protected]>
---

virt/kvm/eventfd.c | 30 +++---------------------------
1 files changed, 3 insertions(+), 27 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 1a529d4..fb698f4 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -52,8 +52,6 @@ struct _irqfd {
struct work_struct shutdown;
};

-static struct workqueue_struct *irqfd_cleanup_wq;
-
static void
irqfd_inject(struct _irqfd *irqfd)
{
@@ -104,7 +102,7 @@ irqfd_deactivate(struct _irqfd *irqfd)

list_del_init(&irqfd->list);

- queue_work(irqfd_cleanup_wq, &irqfd->shutdown);
+ schedule_work(&irqfd->shutdown);
}

/*
@@ -262,7 +260,7 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
* so that we guarantee there will not be any more interrupts on this
* gsi once this deassign function returns.
*/
- flush_workqueue(irqfd_cleanup_wq);
+ flush_scheduled_work();

return 0;
}
@@ -296,33 +294,11 @@ kvm_irqfd_release(struct kvm *kvm)
* Block until we know all outstanding shutdown jobs have completed
* since we do not take a kvm* reference.
*/
- flush_workqueue(irqfd_cleanup_wq);
+ flush_scheduled_work();

}

/*
- * create a host-wide workqueue for issuing deferred shutdown requests
- * aggregated from all vm* instances. We need our own isolated single-thread
- * queue to prevent deadlock against flushing the normal work-queue.
- */
-static int __init irqfd_module_init(void)
-{
- irqfd_cleanup_wq = create_singlethread_workqueue("kvm-irqfd-cleanup");
- if (!irqfd_cleanup_wq)
- return -ENOMEM;
-
- return 0;
-}
-
-static void __exit irqfd_module_exit(void)
-{
- destroy_workqueue(irqfd_cleanup_wq);
-}
-
-module_init(irqfd_module_init);
-module_exit(irqfd_module_exit);
-
-/*
* --------------------------------------------------------------------
* ioeventfd: translate a PIO/MMIO memory write to an eventfd signal.
*

2009-10-21 15:26:39

by Gleb Natapov

[permalink] [raw]
Subject: Re: [KVM PATCH 1/2] KVM: Directly inject interrupts via irqfd

On Wed, Oct 21, 2009 at 10:34:53AM -0400, Gregory Haskins wrote:
> IRQFD currently uses a deferred workqueue item to execute the injection
> operation. It was originally designed this way because kvm_set_irq()
> required the caller to hold the irq_lock mutex, and the eventfd callback
> is invoked from within a non-preemptible critical section.
>
> With the advent of lockless injection support in kvm_set_irq, the deferment
> mechanism is no longer technically needed. Since context switching to the
> workqueue is a source of interrupt latency, lets switch to a direct
> method.
>
kvm_set_irq is fully lockless only in MSI case. IOAPIC/PIC has mutexes.

> Signed-off-by: Gregory Haskins <[email protected]>
> ---
>
> virt/kvm/eventfd.c | 15 +++------------
> 1 files changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 30f70fd..1a529d4 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -49,16 +49,14 @@ struct _irqfd {
> poll_table pt;
> wait_queue_head_t *wqh;
> wait_queue_t wait;
> - struct work_struct inject;
> struct work_struct shutdown;
> };
>
> static struct workqueue_struct *irqfd_cleanup_wq;
>
> static void
> -irqfd_inject(struct work_struct *work)
> +irqfd_inject(struct _irqfd *irqfd)
> {
> - struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> struct kvm *kvm = irqfd->kvm;
>
> kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> @@ -80,12 +78,6 @@ irqfd_shutdown(struct work_struct *work)
> remove_wait_queue(irqfd->wqh, &irqfd->wait);
>
> /*
> - * We know no new events will be scheduled at this point, so block
> - * until all previously outstanding events have completed
> - */
> - flush_work(&irqfd->inject);
> -
> - /*
> * It is now safe to release the object's resources
> */
> eventfd_ctx_put(irqfd->eventfd);
> @@ -126,7 +118,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>
> if (flags & POLLIN)
> /* An event has been signaled, inject an interrupt */
> - schedule_work(&irqfd->inject);
> + irqfd_inject(irqfd);
>
> if (flags & POLLHUP) {
> /* The eventfd is closing, detach from KVM */
> @@ -179,7 +171,6 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> irqfd->kvm = kvm;
> irqfd->gsi = gsi;
> INIT_LIST_HEAD(&irqfd->list);
> - INIT_WORK(&irqfd->inject, irqfd_inject);
> INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
>
> file = eventfd_fget(fd);
> @@ -214,7 +205,7 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> * before we registered, and trigger it as if we didn't miss it.
> */
> if (events & POLLIN)
> - schedule_work(&irqfd->inject);
> + irqfd_inject(irqfd);
>
> /*
> * do not drop the file until the irqfd is fully initialized, otherwise
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Gleb.

2009-10-21 15:34:52

by Gregory Haskins

[permalink] [raw]
Subject: Re: [KVM PATCH 1/2] KVM: Directly inject interrupts via irqfd

Gleb Natapov wrote:
> On Wed, Oct 21, 2009 at 10:34:53AM -0400, Gregory Haskins wrote:
>> IRQFD currently uses a deferred workqueue item to execute the injection
>> operation. It was originally designed this way because kvm_set_irq()
>> required the caller to hold the irq_lock mutex, and the eventfd callback
>> is invoked from within a non-preemptible critical section.
>>
>> With the advent of lockless injection support in kvm_set_irq, the deferment
>> mechanism is no longer technically needed. Since context switching to the
>> workqueue is a source of interrupt latency, lets switch to a direct
>> method.
>>
> kvm_set_irq is fully lockless only in MSI case. IOAPIC/PIC has mutexes.

Right, but irqfd by design only works with MSI (or MSI like edge
triggers) anyway. Legacy-type injections follow a different path.

In any case, I didn't change the locking (you did ;). You recently
patched the irqfd code to remove the irq_lock, but we still had the
deferment mechanism in place to avoid the mutex_lock from within the
POLLIN callback. Since the mutex_lock is now no longer acquired in this
path, the deferment technique is not needed either. Its only adding
overhead for no purpose. So I am simply cleaning that up to improve
interrupt performance.

HTH,
-Greg



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-10-21 15:36:52

by Gleb Natapov

[permalink] [raw]
Subject: Re: [KVM PATCH 1/2] KVM: Directly inject interrupts via irqfd

On Wed, Oct 21, 2009 at 11:34:51AM -0400, Gregory Haskins wrote:
> Gleb Natapov wrote:
> > On Wed, Oct 21, 2009 at 10:34:53AM -0400, Gregory Haskins wrote:
> >> IRQFD currently uses a deferred workqueue item to execute the injection
> >> operation. It was originally designed this way because kvm_set_irq()
> >> required the caller to hold the irq_lock mutex, and the eventfd callback
> >> is invoked from within a non-preemptible critical section.
> >>
> >> With the advent of lockless injection support in kvm_set_irq, the deferment
> >> mechanism is no longer technically needed. Since context switching to the
> >> workqueue is a source of interrupt latency, lets switch to a direct
> >> method.
> >>
> > kvm_set_irq is fully lockless only in MSI case. IOAPIC/PIC has mutexes.
>
> Right, but irqfd by design only works with MSI (or MSI like edge
> triggers) anyway. Legacy-type injections follow a different path.
>
Ah, If this the case and it will stay that way then the change looks OK
to me.

> In any case, I didn't change the locking (you did ;). You recently
> patched the irqfd code to remove the irq_lock, but we still had the
> deferment mechanism in place to avoid the mutex_lock from within the
> POLLIN callback. Since the mutex_lock is now no longer acquired in this
> path, the deferment technique is not needed either. Its only adding
> overhead for no purpose. So I am simply cleaning that up to improve
> interrupt performance.
>
> HTH,
> -Greg
>
>



--
Gleb.

2009-10-21 15:42:24

by Gregory Haskins

[permalink] [raw]
Subject: Re: [KVM PATCH 1/2] KVM: Directly inject interrupts via irqfd

Gleb Natapov wrote:
> On Wed, Oct 21, 2009 at 11:34:51AM -0400, Gregory Haskins wrote:
>> Gleb Natapov wrote:
>>> On Wed, Oct 21, 2009 at 10:34:53AM -0400, Gregory Haskins wrote:
>>>> IRQFD currently uses a deferred workqueue item to execute the injection
>>>> operation. It was originally designed this way because kvm_set_irq()
>>>> required the caller to hold the irq_lock mutex, and the eventfd callback
>>>> is invoked from within a non-preemptible critical section.
>>>>
>>>> With the advent of lockless injection support in kvm_set_irq, the deferment
>>>> mechanism is no longer technically needed. Since context switching to the
>>>> workqueue is a source of interrupt latency, lets switch to a direct
>>>> method.
>>>>
>>> kvm_set_irq is fully lockless only in MSI case. IOAPIC/PIC has mutexes.
>> Right, but irqfd by design only works with MSI (or MSI like edge
>> triggers) anyway. Legacy-type injections follow a different path.
>>
> Ah, If this the case and it will stay that way then the change looks OK
> to me.


I believe Avi, Michael, et. al. were in agreement with me on that design
choice. I believe the reason is that there is no good way to do EOI/ACK
feedback within the constraints of an eventfd pipe which would be
required for the legacy pin-type interrupts. Therefore, we won't even
bother trying. High-performance subsystems will use irqfd/msi, and
legacy emulation can use the existing injection code (which includes the
necessary feedback for ack/eoi).

To that point, perhaps it should be better documented. If we need a v2,
I will add a comment.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-10-22 15:07:58

by Avi Kivity

[permalink] [raw]
Subject: Re: [KVM PATCH 1/2] KVM: Directly inject interrupts via irqfd

On 10/21/2009 05:42 PM, Gregory Haskins wrote:
> I believe Avi, Michael, et. al. were in agreement with me on that design
> choice. I believe the reason is that there is no good way to do EOI/ACK
> feedback within the constraints of an eventfd pipe which would be
> required for the legacy pin-type interrupts. Therefore, we won't even
> bother trying. High-performance subsystems will use irqfd/msi, and
> legacy emulation can use the existing injection code (which includes the
> necessary feedback for ack/eoi).
>
>

Right. But we don't actually prevent anyone using non-msi with irqfd,
which can trigger the bad lock usage from irq context, with a nice boom
afterwards. So we need to either prevent it during registration, or to
gracefully handle it afterwards.

--
error compiling committee.c: too many arguments to function

2009-10-22 15:14:48

by Gregory Haskins

[permalink] [raw]
Subject: Re: [KVM PATCH 1/2] KVM: Directly inject interrupts via irqfd

Avi Kivity wrote:
> On 10/21/2009 05:42 PM, Gregory Haskins wrote:
>> I believe Avi, Michael, et. al. were in agreement with me on that design
>> choice. I believe the reason is that there is no good way to do EOI/ACK
>> feedback within the constraints of an eventfd pipe which would be
>> required for the legacy pin-type interrupts. Therefore, we won't even
>> bother trying. High-performance subsystems will use irqfd/msi, and
>> legacy emulation can use the existing injection code (which includes the
>> necessary feedback for ack/eoi).
>>
>>
>
> Right. But we don't actually prevent anyone using non-msi with irqfd,
> which can trigger the bad lock usage from irq context, with a nice boom
> afterwards. So we need to either prevent it during registration, or to
> gracefully handle it afterwards.
>

Yeah, I was thinking about that after I initially responded to Gleb.

I am thinking something along these lines:

Provide a function that lets you query a GSI for whether it supports
LOCKLESS or not. Then we can either do one of two things:

1) Check for the LOCKLESS attribute at irqfd registration, fail if not
present

2) Cache the LOCKLESS attribute in the irqfd structure, and either go
direct or defer to a workqueue depending on the flag.

Thoughts?
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-10-22 15:20:08

by Avi Kivity

[permalink] [raw]
Subject: Re: [KVM PATCH 1/2] KVM: Directly inject interrupts via irqfd

On 10/22/2009 05:14 PM, Gregory Haskins wrote:
> Yeah, I was thinking about that after I initially responded to Gleb.
>
> I am thinking something along these lines:
>
> Provide a function that lets you query a GSI for whether it supports
> LOCKLESS or not. Then we can either do one of two things:
>
> 1) Check for the LOCKLESS attribute at irqfd registration, fail if not
> present
>

This is the most practical path and leads to the smallest code. However
it has the deficiency of exposing internal implementation details to
userspace. In theory userspace could use msi and edge-triggered
pic/ioapic interrupts equally well, it shouldn't have to know that we
didn't bother to lockfree ioapic/pic.

> 2) Cache the LOCKLESS attribute in the irqfd structure, and either go
> direct or defer to a workqueue depending on the flag.
>

While this leads to larger code, it is more consistent.

--
error compiling committee.c: too many arguments to function

2009-10-22 15:33:15

by Gregory Haskins

[permalink] [raw]
Subject: Re: [KVM PATCH 1/2] KVM: Directly inject interrupts via irqfd

Avi Kivity wrote:
> On 10/22/2009 05:14 PM, Gregory Haskins wrote:
>> Yeah, I was thinking about that after I initially responded to Gleb.
>>
>> I am thinking something along these lines:
>>
>> Provide a function that lets you query a GSI for whether it supports
>> LOCKLESS or not. Then we can either do one of two things:
>>
>> 1) Check for the LOCKLESS attribute at irqfd registration, fail if not
>> present
>>
>
> This is the most practical path and leads to the smallest code. However
> it has the deficiency of exposing internal implementation details to
> userspace. In theory userspace could use msi and edge-triggered
> pic/ioapic interrupts equally well, it shouldn't have to know that we
> didn't bother to lockfree ioapic/pic.
>
>> 2) Cache the LOCKLESS attribute in the irqfd structure, and either go
>> direct or defer to a workqueue depending on the flag.
>>
>
> While this leads to larger code, it is more consistent.
>

Yeah, I think you are right. Consider these two patches retracted, and
I will rewrite it with this concept in place.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature