2009-10-23 02:38:57

by Gregory Haskins

[permalink] [raw]
Subject: [KVM PATCH v2 0/2] irqfd enhancements

(Applies to kvm.git/master:11b06403)

The following patches are cleanups/enhancements for IRQFD now that
we have lockless interrupt injection. For more details, please see
the patch headers.

These patches pass checkpatch, and are fully tested. Please consider
for merging. They are an enhancement only, so there is no urgency
to push to mainline until a suitable merge window presents itself.

Kind Regards,
-Greg

[ Change log:

v2:
*) dropped original cleanup which relied on the user registering
MSI based GSIs or we may crash at runtime. Instead, we now
check at registration whether the GSI supports lockless
operation and dynamically adapt to either the original
deferred path for lock-based injections, or direct for lockless.

v1:
*) original release
]

---

Gregory Haskins (2):
KVM: Directly inject interrupts if they support lockless operation
KVM: export lockless GSI attribute


include/linux/kvm_host.h | 2 ++
virt/kvm/eventfd.c | 31 +++++++++++++++++++++++++++----
virt/kvm/irq_comm.c | 19 +++++++++++++++++++
3 files changed, 48 insertions(+), 4 deletions(-)

--
Signature


2009-10-23 02:39:03

by Gregory Haskins

[permalink] [raw]
Subject: [KVM PATCH v2 1/2] KVM: export lockless GSI attribute

Certain GSI's support lockless injecton, but we have no way to detect
which ones at the GSI level. Knowledge of this attribute will be
useful later in the series so that we can optimize irqfd injection
paths for cases where we know the code will not sleep. Therefore,
we provide an API to query a specific GSI.

Signed-off-by: Gregory Haskins <[email protected]>
---

include/linux/kvm_host.h | 2 ++
virt/kvm/irq_comm.c | 19 +++++++++++++++++++
2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bd5a616..93393a4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -119,6 +119,7 @@ struct kvm_memory_slot {
struct kvm_kernel_irq_routing_entry {
u32 gsi;
u32 type;
+ bool lockless;
int (*set)(struct kvm_kernel_irq_routing_entry *e,
struct kvm *kvm, int irq_source_id, int level);
union {
@@ -417,6 +418,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
unsigned long *deliver_bitmask);
#endif
int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
+int kvm_irq_check_lockless(struct kvm *kvm, u32 irq);
void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
void kvm_register_irq_ack_notifier(struct kvm *kvm,
struct kvm_irq_ack_notifier *kian);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 00c68d2..04f0134 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -174,6 +174,23 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
return ret;
}

+int kvm_irq_check_lockless(struct kvm *kvm, u32 irq)
+{
+ struct kvm_kernel_irq_routing_entry *e;
+ struct kvm_irq_routing_table *irq_rt;
+ struct hlist_node *n;
+ int ret = -ENOENT;
+
+ rcu_read_lock();
+ irq_rt = rcu_dereference(kvm->irq_routing);
+ if (irq < irq_rt->nr_rt_entries)
+ hlist_for_each_entry(e, n, &irq_rt->map[irq], link)
+ ret = e->lockless ? 1 : 0;
+ rcu_read_unlock();
+
+ return ret;
+}
+
void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
{
struct kvm_irq_ack_notifier *kian;
@@ -314,6 +331,7 @@ static int setup_routing_entry(struct kvm_irq_routing_table *rt,

e->gsi = ue->gsi;
e->type = ue->type;
+ e->lockless = false;
switch (ue->type) {
case KVM_IRQ_ROUTING_IRQCHIP:
delta = 0;
@@ -342,6 +360,7 @@ static int setup_routing_entry(struct kvm_irq_routing_table *rt,
e->msi.address_lo = ue->u.msi.address_lo;
e->msi.address_hi = ue->u.msi.address_hi;
e->msi.data = ue->u.msi.data;
+ e->lockless = true;
break;
default:
goto out;

2009-10-23 02:39:21

by Gregory Haskins

[permalink] [raw]
Subject: [KVM PATCH v2 2/2] KVM: Directly inject interrupts if they support lockless operation

IRQFD currently uses a deferred workqueue item to execute the injection
operation. It was originally designed this way because kvm_set_irq()
required the caller to hold the irq_lock mutex, and the eventfd callback
is invoked from within a non-preemptible critical section.

With the advent of lockless injection support for certain GSIs, the
deferment mechanism is no longer technically needed in all cases.
Since context switching to the workqueue is a source of interrupt
latency, lets switch to a direct method whenever possible. Fortunately
for us, the most common use of irqfd (MSI-based GSIs) readily support
lockless injection.

Signed-off-by: Gregory Haskins <[email protected]>
---

virt/kvm/eventfd.c | 31 +++++++++++++++++++++++++++----
1 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 30f70fd..e6cc958 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -51,20 +51,34 @@ struct _irqfd {
wait_queue_t wait;
struct work_struct inject;
struct work_struct shutdown;
+ void (*execute)(struct _irqfd *);
};

static struct workqueue_struct *irqfd_cleanup_wq;

static void
-irqfd_inject(struct work_struct *work)
+irqfd_inject(struct _irqfd *irqfd)
{
- struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
struct kvm *kvm = irqfd->kvm;

kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
}

+static void
+irqfd_deferred_inject(struct work_struct *work)
+{
+ struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
+
+ irqfd_inject(irqfd);
+}
+
+static void
+irqfd_schedule(struct _irqfd *irqfd)
+{
+ schedule_work(&irqfd->inject);
+}
+
/*
* Race-free decouple logic (ordering is critical)
*/
@@ -126,7 +140,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)

if (flags & POLLIN)
/* An event has been signaled, inject an interrupt */
- schedule_work(&irqfd->inject);
+ irqfd->execute(irqfd);

if (flags & POLLHUP) {
/* The eventfd is closing, detach from KVM */
@@ -179,7 +193,7 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
irqfd->kvm = kvm;
irqfd->gsi = gsi;
INIT_LIST_HEAD(&irqfd->list);
- INIT_WORK(&irqfd->inject, irqfd_inject);
+ INIT_WORK(&irqfd->inject, irqfd_deferred_inject);
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);

file = eventfd_fget(fd);
@@ -209,6 +223,15 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
list_add_tail(&irqfd->list, &kvm->irqfds.items);
spin_unlock_irq(&kvm->irqfds.lock);

+ ret = kvm_irq_check_lockless(kvm, gsi);
+ if (ret < 0)
+ goto fail;
+
+ if (ret)
+ irqfd->execute = &irqfd_inject;
+ else
+ irqfd->execute = &irqfd_schedule;
+
/*
* Check if there was an event already pending on the eventfd
* before we registered, and trigger it as if we didn't miss it.

2009-10-23 02:43:29

by Gregory Haskins

[permalink] [raw]
Subject: Re: [KVM PATCH v2 1/2] KVM: export lockless GSI attribute

Gregory Haskins wrote:
> Certain GSI's support lockless injecton, but we have no way to detect
> which ones at the GSI level. Knowledge of this attribute will be
> useful later in the series so that we can optimize irqfd injection
> paths for cases where we know the code will not sleep. Therefore,
> we provide an API to query a specific GSI.
>
> Signed-off-by: Gregory Haskins <[email protected]>
> ---
>
> include/linux/kvm_host.h | 2 ++
> virt/kvm/irq_comm.c | 19 +++++++++++++++++++
> 2 files changed, 21 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index bd5a616..93393a4 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -119,6 +119,7 @@ struct kvm_memory_slot {
> struct kvm_kernel_irq_routing_entry {
> u32 gsi;
> u32 type;
> + bool lockless;
> int (*set)(struct kvm_kernel_irq_routing_entry *e,
> struct kvm *kvm, int irq_source_id, int level);
> union {
> @@ -417,6 +418,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
> unsigned long *deliver_bitmask);
> #endif
> int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
> +int kvm_irq_check_lockless(struct kvm *kvm, u32 irq);
> void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
> void kvm_register_irq_ack_notifier(struct kvm *kvm,
> struct kvm_irq_ack_notifier *kian);
> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> index 00c68d2..04f0134 100644
> --- a/virt/kvm/irq_comm.c
> +++ b/virt/kvm/irq_comm.c
> @@ -174,6 +174,23 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
> return ret;
> }
>
> +int kvm_irq_check_lockless(struct kvm *kvm, u32 irq)
> +{
> + struct kvm_kernel_irq_routing_entry *e;
> + struct kvm_irq_routing_table *irq_rt;
> + struct hlist_node *n;
> + int ret = -ENOENT;
> +
> + rcu_read_lock();
> + irq_rt = rcu_dereference(kvm->irq_routing);
> + if (irq < irq_rt->nr_rt_entries)
> + hlist_for_each_entry(e, n, &irq_rt->map[irq], link)
> + ret = e->lockless ? 1 : 0;

Sigh... just noticed this as it hits the list. I should probably break
out of the loop here.

> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
> void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
> {
> struct kvm_irq_ack_notifier *kian;
> @@ -314,6 +331,7 @@ static int setup_routing_entry(struct kvm_irq_routing_table *rt,
>
> e->gsi = ue->gsi;
> e->type = ue->type;
> + e->lockless = false;
> switch (ue->type) {
> case KVM_IRQ_ROUTING_IRQCHIP:
> delta = 0;
> @@ -342,6 +360,7 @@ static int setup_routing_entry(struct kvm_irq_routing_table *rt,
> e->msi.address_lo = ue->u.msi.address_lo;
> e->msi.address_hi = ue->u.msi.address_hi;
> e->msi.data = ue->u.msi.data;
> + e->lockless = true;
> break;
> default:
> goto out;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-10-25 14:30:23

by Avi Kivity

[permalink] [raw]
Subject: Re: [KVM PATCH v2 1/2] KVM: export lockless GSI attribute

On 10/23/2009 04:38 AM, Gregory Haskins wrote:
> Certain GSI's support lockless injecton, but we have no way to detect
> which ones at the GSI level. Knowledge of this attribute will be
> useful later in the series so that we can optimize irqfd injection
> paths for cases where we know the code will not sleep. Therefore,
> we provide an API to query a specific GSI.
>
>

Instead of a lockless attribute, how about a ->set_atomic() method. For
msi this can be the same as ->set(), for non-msi it can be a function
that schedules the work (which will eventually call ->set()).

The benefit is that we make a decision only once, when preparing the
routing entry, and install that decision in the routing entry instead of
making it again and again later.

> +int kvm_irq_check_lockless(struct kvm *kvm, u32 irq)
>

bool kvm_irq_check_lockless(...)

> +{
> + struct kvm_kernel_irq_routing_entry *e;
> + struct kvm_irq_routing_table *irq_rt;
> + struct hlist_node *n;
> + int ret = -ENOENT;
> +
> + rcu_read_lock();
> + irq_rt = rcu_dereference(kvm->irq_routing);
> + if (irq< irq_rt->nr_rt_entries)
> + hlist_for_each_entry(e, n,&irq_rt->map[irq], link)
> + ret = e->lockless ? 1 : 0;
>

ret = e->lockless;

> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
>

--
error compiling committee.c: too many arguments to function

2009-10-26 13:25:14

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [KVM PATCH v2 1/2] KVM: export lockless GSI attribute

Avi Kivity wrote:
> On 10/23/2009 04:38 AM, Gregory Haskins wrote:
>> Certain GSI's support lockless injecton, but we have no way to detect
>> which ones at the GSI level. Knowledge of this attribute will be
>> useful later in the series so that we can optimize irqfd injection
>> paths for cases where we know the code will not sleep. Therefore,
>> we provide an API to query a specific GSI.
>>
>>
>
> Instead of a lockless attribute, how about a ->set_atomic() method. For
> msi this can be the same as ->set(), for non-msi it can be a function
> that schedules the work (which will eventually call ->set()).
>
> The benefit is that we make a decision only once, when preparing the
> routing entry, and install that decision in the routing entry instead of
> making it again and again later.

Yeah, I like this idea. I think we can also get rid of the custom
workqueue if we do this as well, TBD.

>
>> +int kvm_irq_check_lockless(struct kvm *kvm, u32 irq)
>>
>
> bool kvm_irq_check_lockless(...)

We lose the ability to detect failure (such as ENOENT) if we do this,
but its moot if we move to the ->set_atomic() model, since this
attribute is no longer necessary and this patch can be dropped.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-10-26 15:38:21

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [KVM PATCH v2 1/2] KVM: export lockless GSI attribute

Gregory Haskins wrote:
> Avi Kivity wrote:
>> On 10/23/2009 04:38 AM, Gregory Haskins wrote:
>>> Certain GSI's support lockless injecton, but we have no way to detect
>>> which ones at the GSI level. Knowledge of this attribute will be
>>> useful later in the series so that we can optimize irqfd injection
>>> paths for cases where we know the code will not sleep. Therefore,
>>> we provide an API to query a specific GSI.
>>>
>>>
>> Instead of a lockless attribute, how about a ->set_atomic() method. For
>> msi this can be the same as ->set(), for non-msi it can be a function
>> that schedules the work (which will eventually call ->set()).
>>
>> The benefit is that we make a decision only once, when preparing the
>> routing entry, and install that decision in the routing entry instead of
>> making it again and again later.
>
> Yeah, I like this idea. I think we can also get rid of the custom
> workqueue if we do this as well, TBD.

So I looked into this. It isn't straight forward because you need to
retain some kind of state across the deferment on a per-request basis
(not per-GSI). Today, this state is neatly tracked into the irqfd
object itself (e.g. it knows to toggle the GSI).

So while generalizing this perhaps makes sense at some point, especially
if irqfd-like interfaces get added, it probably doesn't make a ton of
sense to expend energy on it ATM. It is basically a generalization of
the irqfd deferrment code. Lets just wait until we have a user beyond
irqfd for now. Sound acceptable?

In the meantime, I found a bug in the irq_routing code, so I will submit
a v3 with this fix, as well as a few other things I improved in the v2
series.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-10-28 10:04:18

by Avi Kivity

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [KVM PATCH v2 1/2] KVM: export lockless GSI attribute

On 10/26/2009 05:38 PM, Gregory Haskins wrote:
>>> Instead of a lockless attribute, how about a ->set_atomic() method. For
>>> msi this can be the same as ->set(), for non-msi it can be a function
>>> that schedules the work (which will eventually call ->set()).
>>>
>>> The benefit is that we make a decision only once, when preparing the
>>> routing entry, and install that decision in the routing entry instead of
>>> making it again and again later.
>>>
>> Yeah, I like this idea. I think we can also get rid of the custom
>> workqueue if we do this as well, TBD.
>>
> So I looked into this. It isn't straight forward because you need to
> retain some kind of state across the deferment on a per-request basis
> (not per-GSI). Today, this state is neatly tracked into the irqfd
> object itself (e.g. it knows to toggle the GSI).
>

Yes, and it also contains the work_struct.

What if we make the work_struct (and any additional state) part of the
set_atomic() argument list? Does it simplify things?

> So while generalizing this perhaps makes sense at some point, especially
> if irqfd-like interfaces get added, it probably doesn't make a ton of
> sense to expend energy on it ATM. It is basically a generalization of
> the irqfd deferrment code. Lets just wait until we have a user beyond
> irqfd for now. Sound acceptable?
>

I'll look at v3, but would really like to disentangle this.

> In the meantime, I found a bug in the irq_routing code, so I will submit
> a v3 with this fix, as well as a few other things I improved in the v2
> series.
>
>


--
error compiling committee.c: too many arguments to function

2009-10-28 13:20:00

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [KVM PATCH v2 1/2] KVM: export lockless GSI attribute

Avi Kivity wrote:
> On 10/26/2009 05:38 PM, Gregory Haskins wrote:
>>>> Instead of a lockless attribute, how about a ->set_atomic() method.
>>>> For
>>>> msi this can be the same as ->set(), for non-msi it can be a function
>>>> that schedules the work (which will eventually call ->set()).
>>>>
>>>> The benefit is that we make a decision only once, when preparing the
>>>> routing entry, and install that decision in the routing entry
>>>> instead of
>>>> making it again and again later.
>>>>
>>> Yeah, I like this idea. I think we can also get rid of the custom
>>> workqueue if we do this as well, TBD.
>>>
>> So I looked into this. It isn't straight forward because you need to
>> retain some kind of state across the deferment on a per-request basis
>> (not per-GSI). Today, this state is neatly tracked into the irqfd
>> object itself (e.g. it knows to toggle the GSI).
>>
>
> Yes, and it also contains the work_struct.
>
> What if we make the work_struct (and any additional state) part of the
> set_atomic() argument list? Does it simplify things?

Hmmm, that might not, but we could do a kmalloc(GFP_ATOMIC) for such
parameters. Considering this is just a safety net, perhaps this would
work fine.

>
>> So while generalizing this perhaps makes sense at some point, especially
>> if irqfd-like interfaces get added, it probably doesn't make a ton of
>> sense to expend energy on it ATM. It is basically a generalization of
>> the irqfd deferrment code. Lets just wait until we have a user beyond
>> irqfd for now. Sound acceptable?
>>
>
> I'll look at v3, but would really like to disentangle this.

Ok, I will see what I can do. I need at least a v4 to get rid of the
dependency on the now defunct v3:1/3 patch per yesterdays discussion.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-10-28 13:28:05

by Avi Kivity

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [KVM PATCH v2 1/2] KVM: export lockless GSI attribute

On 10/28/2009 03:19 PM, Gregory Haskins wrote:
>> Yes, and it also contains the work_struct.
>>
>> What if we make the work_struct (and any additional state) part of the
>> set_atomic() argument list? Does it simplify things?
>>
> Hmmm, that might not, but we could do a kmalloc(GFP_ATOMIC) for such
> parameters. Considering this is just a safety net, perhaps this would
> work fine.
>

Can't you simply pass the same work_struct from irqfd as we use now?

>>> So while generalizing this perhaps makes sense at some point, especially
>>> if irqfd-like interfaces get added, it probably doesn't make a ton of
>>> sense to expend energy on it ATM. It is basically a generalization of
>>> the irqfd deferrment code. Lets just wait until we have a user beyond
>>> irqfd for now. Sound acceptable?
>>>
>>>
>> I'll look at v3, but would really like to disentangle this.
>>
> Ok, I will see what I can do. I need at least a v4 to get rid of the
> dependency on the now defunct v3:1/3 patch per yesterdays discussion.
>

There's another alternative - make ioapic and pic irq-safe by switching
irq locking to spinlocks and using spin_lock_irqsave().

I've long opposed this since the ioapic loops on all vcpus when
injecting some irqs and this will increase irqoff times with large
guests. But we don't have large guests now, and we need irq-safe
injection in three places now:

- irqfd
- pit - we now signal vcpu0 to handle the injection, but this has its
problems
- device assignment

so it may be better to have irq-safe injection, and deal with the loop
later (would be good to have an idea how exactly).

--
error compiling committee.c: too many arguments to function

2009-10-28 13:30:49

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [KVM PATCH v2 1/2] KVM: export lockless GSI attribute

Avi Kivity wrote:
> On 10/28/2009 03:19 PM, Gregory Haskins wrote:
>>> Yes, and it also contains the work_struct.
>>>
>>> What if we make the work_struct (and any additional state) part of the
>>> set_atomic() argument list? Does it simplify things?
>>>
>> Hmmm, that might not, but we could do a kmalloc(GFP_ATOMIC) for such
>> parameters. Considering this is just a safety net, perhaps this would
>> work fine.
>>
>
> Can't you simply pass the same work_struct from irqfd as we use now?

Well, yes, of course, but I am not sure that buys us much in terms of
generalizing the code. Unless I am misunderstanding, that would still
leave the impetus of the init/sync/cleanup to the irqfd code, at which
point we might as well just leave it entirely in irqfd anyway. Or am I
misunderstanding you?

>
>>>> So while generalizing this perhaps makes sense at some point,
>>>> especially
>>>> if irqfd-like interfaces get added, it probably doesn't make a ton of
>>>> sense to expend energy on it ATM. It is basically a generalization of
>>>> the irqfd deferrment code. Lets just wait until we have a user beyond
>>>> irqfd for now. Sound acceptable?
>>>>
>>>>
>>> I'll look at v3, but would really like to disentangle this.
>>>
>> Ok, I will see what I can do. I need at least a v4 to get rid of the
>> dependency on the now defunct v3:1/3 patch per yesterdays discussion.
>>
>
> There's another alternative - make ioapic and pic irq-safe by switching
> irq locking to spinlocks and using spin_lock_irqsave().
>
> I've long opposed this since the ioapic loops on all vcpus when
> injecting some irqs and this will increase irqoff times with large
> guests. But we don't have large guests now, and we need irq-safe
> injection in three places now:
>
> - irqfd
> - pit - we now signal vcpu0 to handle the injection, but this has its
> problems
> - device assignment
>
> so it may be better to have irq-safe injection, and deal with the loop
> later (would be good to have an idea how exactly).

Ok, perhaps I should just hold off on this series for now, then. I can
submit the original "assume atomic safe" once the path is fully lockless.

-Greg

>



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature