2010-01-13 17:15:31

by Michael S. Tsirkin

Subject: [PATCH 1/2] kvm: fix spurious interrupt with irqfd

kvm didn't clear the irqfd counter on deassign; as a result, we could get a
spurious interrupt when the irqfd is assigned back. This leads to poor
performance and, in theory, a guest crash.

Signed-off-by: Michael S. Tsirkin <[email protected]>
---
virt/kvm/eventfd.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 62e4cd9..a9d3fc6 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -72,12 +72,13 @@ static void
 irqfd_shutdown(struct work_struct *work)
 {
 	struct _irqfd *irqfd = container_of(work, struct _irqfd, shutdown);
+	u64 cnt;
 
 	/*
 	 * Synchronize with the wait-queue and unhook ourselves to prevent
 	 * further events.
 	 */
-	remove_wait_queue(irqfd->wqh, &irqfd->wait);
+	eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
 
 	/*
 	 * We know no new events will be scheduled at this point, so block
--
1.6.6.144.g5c3af
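
To illustrate the failure mode the patch above addresses, here is a minimal userspace sketch. It is an analogy, not the kernel code: it assumes a Linux host with eventfd(2), and it uses read(2) where the patch uses the in-kernel eventfd_ctx_remove_wait_queue(), since a read on a non-semaphore eventfd also returns the counter and resets it to zero. The helper names (signal_event, has_pending_event, drain_pending) are made up for this example.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Userspace analogy only. In the kernel, the counter is cleared by
 * eventfd_ctx_remove_wait_queue(); here read(2) plays that role.
 */

/* Signal the eventfd once, as a backend would. Returns 0 on success. */
static int signal_event(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Returns 1 if a listener attaching right now would see an event at once. */
static int has_pending_event(int efd)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };

	return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN);
}

/*
 * Fetch and clear the counter. Returns the stale count that was cleared,
 * or 0 if nothing was pending (the read fails with EAGAIN on a
 * non-blocking eventfd in that case).
 */
static uint64_t drain_pending(int efd)
{
	uint64_t cnt = 0;

	if (read(efd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
		return 0;
	return cnt;
}
```

Calling signal_event() on an eventfd created with eventfd(0, EFD_NONBLOCK) and then has_pending_event() shows the stale count a newly attached listener would pick up; after drain_pending() the eventfd is quiet again, which is the state the patch aims for after deassign.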


2010-01-19 13:25:36

by Jan Kiszka

Subject: Re: [PATCH 1/2] kvm: fix spurious interrupt with irqfd

Michael S. Tsirkin wrote:
> kvm didn't clear the irqfd counter on deassign; as a result, we could get a
> spurious interrupt when the irqfd is assigned back. This leads to poor
> performance and, in theory, a guest crash.
>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
> virt/kvm/eventfd.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 62e4cd9..a9d3fc6 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -72,12 +72,13 @@ static void
> irqfd_shutdown(struct work_struct *work)
> {
> struct _irqfd *irqfd = container_of(work, struct _irqfd, shutdown);
> + u64 cnt;
>
> /*
> * Synchronize with the wait-queue and unhook ourselves to prevent
> * further events.
> */
> - remove_wait_queue(irqfd->wqh, &irqfd->wait);
> + eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
>
> /*
> * We know no new events will be scheduled at this point, so block

For kvm-kmod, I'm fighting with compat support for
eventfd_ctx_remove_wait_queue. I basically have a solution for kernels
with CONFIG_KPROBES enabled (I need to look up unexported
__wake_up_locked[_key]), but there will also be target kernels that do
not have this. So there are three options for that case:

- Warn the user and fall back to the old racy approach
- (Somehow) disable KVM subsystems that use eventfd
- Refuse to start KVM

As far as I understood, irqfd is interesting for device assignment and
now also for vhost, right? What about ioeventfd? I just wonder how broad
the impact of a broken or non-existent eventfd subsystem for kvm-kmod
is. Any thoughts welcome.

Jan

PS: If anyone forgot why Avi handed over this job, you should now
remember why. :)

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
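
The three fallback options listed in the message above can be sketched as a small decision function. This is a hypothetical userspace model, not kvm-kmod code: the enum names, the function-pointer type standing in for the looked-up __wake_up_locked[_key] symbol, and choose_remove_path() itself are all assumptions made for illustration. A NULL pointer models a kernel where the lookup is impossible (for example, no CONFIG_KPROBES).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical policies, one per option discussed in the thread. */
enum compat_policy {
	POLICY_WARN_AND_FALLBACK,	/* warn, use the old racy approach */
	POLICY_DISABLE_EVENTFD,		/* disable eventfd-based subsystems */
	POLICY_REFUSE_TO_START,		/* do not start KVM at all */
};

/* Hypothetical outcomes of the decision. */
enum compat_result {
	RESULT_ATOMIC_REMOVE,		/* symbol found: race-free removal */
	RESULT_RACY_REMOVE,
	RESULT_EVENTFD_DISABLED,
	RESULT_KVM_REFUSED,
};

/* Stand-in for the type of the unexported wake-up helper. */
typedef void (*wake_up_locked_fn)(void *wqh);

/* Dummy used only to model "the symbol lookup succeeded". */
static void fake_wake_up_locked(void *wqh)
{
	(void)wqh;
}

/* Pick a removal path given the lookup result and the chosen policy. */
static enum compat_result choose_remove_path(wake_up_locked_fn resolved,
					     enum compat_policy policy)
{
	if (resolved != NULL)
		return RESULT_ATOMIC_REMOVE;

	switch (policy) {
	case POLICY_WARN_AND_FALLBACK:
		fprintf(stderr, "eventfd compat: falling back to racy removal\n");
		return RESULT_RACY_REMOVE;
	case POLICY_DISABLE_EVENTFD:
		return RESULT_EVENTFD_DISABLED;
	case POLICY_REFUSE_TO_START:
	default:
		return RESULT_KVM_REFUSED;
	}
}
```

The policy only matters when the lookup fails; with a resolved symbol every policy yields the race-free path.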

2010-01-19 13:51:32

by Michael S. Tsirkin

Subject: Re: [PATCH 1/2] kvm: fix spurious interrupt with irqfd

On Tue, Jan 19, 2010 at 02:25:12PM +0100, Jan Kiszka wrote:
> Michael S. Tsirkin wrote:
> > kvm didn't clear the irqfd counter on deassign; as a result, we could get a
> > spurious interrupt when the irqfd is assigned back. This leads to poor
> > performance and, in theory, a guest crash.
> >
> > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > ---
> > virt/kvm/eventfd.c | 3 ++-
> > 1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > index 62e4cd9..a9d3fc6 100644
> > --- a/virt/kvm/eventfd.c
> > +++ b/virt/kvm/eventfd.c
> > @@ -72,12 +72,13 @@ static void
> > irqfd_shutdown(struct work_struct *work)
> > {
> > struct _irqfd *irqfd = container_of(work, struct _irqfd, shutdown);
> > + u64 cnt;
> >
> > /*
> > * Synchronize with the wait-queue and unhook ourselves to prevent
> > * further events.
> > */
> > - remove_wait_queue(irqfd->wqh, &irqfd->wait);
> > + eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
> >
> > /*
> > * We know no new events will be scheduled at this point, so block
>
> For kvm-kmod, I'm fighting with compat support for
> eventfd_ctx_remove_wait_queue. I basically have a solution for kernels
> with CONFIG_KPROBES enabled (I need to look up unexported
> __wake_up_locked[_key]), but there will also be target kernels that do
> not have this. So there are three options for that case:
>
> - Warn the user and fall back to the old racy approach
> - (Somehow) disable KVM subsystems that use eventfd
> - Refuse to start KVM
>
> As far as I understood, irqfd is interesting for device assignment and
> now also for vhost, right?

At the moment, only vhost.

> What about ioeventfd?

Same thing.

> I just wonder how broad
> the impact of a broken or non-existent eventfd subsystem for kvm-kmod
> is. Any thoughts welcome.

How do you handle kernels that don't export eventfd_ctx_fileget?

> Jan
>
> PS: If anyone forgot why Avi handed over this job, you should now
> remember why. :)

Heh, I did the same kind of thing for infiniband for
several years. It's hard to forget.

> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

2010-01-19 14:03:24

by Avi Kivity

Subject: Re: [PATCH 1/2] kvm: fix spurious interrupt with irqfd

On 01/19/2010 03:25 PM, Jan Kiszka wrote:
>
> For kvm-kmod, I'm fighting with compat support for
> eventfd_ctx_remove_wait_queue. I basically have a solution for kernels
> with CONFIG_KPROBES enabled (I need to look up unexported
> __wake_up_locked[_key]), but there will also be target kernels that do
> not have this. So there are three options for that case:
>
> - Warn the user and fall back to the old racy approach
> - (Somehow) disable KVM subsystems that use eventfd
> - Refuse to start KVM
>
> As far as I understood, irqfd is interesting for device assignment and
> now also for vhost, right? What about ioeventfd? I just wonder how broad
> the impact of a broken or non-existent eventfd subsystem for kvm-kmod
> is. Any thoughts welcome.
>
>

Since vhost is only a performance option (and there isn't a vhost-kmod)
and device assignment wants a new kernel anyway (and only applies to a
small subset of users), I think it's okay to drop it from kvm-kmod. It
should be sufficient to return 0 from KVM_CHECK_EXTENSION and substitute
some stubs for the functions.

ioeventfd/irqfd are useful for inter-guest wakeups, but there isn't any
public code for that that I'm aware of.

> PS: If anyone forgot why Avi handed over this job, you should now
> remember why. :)
>

It's only going to get more difficult, I'm afraid. The list of old
kernels keeps growing and we're going to depend on core kernel
functionality more and more.

Luckily for me you accepted in time...

--
error compiling committee.c: too many arguments to function
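
The suggestion in the message above (report the capability as absent and substitute stubs) could look roughly like the following userspace sketch. Everything here is hypothetical: the CAP_* identifiers are placeholders rather than the real KVM_CAP_* values, the function names are invented, and returning -ENOSYS from the stubs is an assumption about what a stub might reasonably return. The default branch reporting every other capability as present is a simplification for the example.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder capability ids; NOT the real KVM_CAP_* numbers. */
enum {
	CAP_IRQFD = 1,
	CAP_IOEVENTFD = 2,
	CAP_OTHER = 3,
};

/*
 * Model of a capability query: eventfd-based features are advertised
 * only when the compat layer can support them, so userspace never
 * tries to use them on an unsupported host kernel.
 */
static int compat_check_extension(int cap, int eventfd_usable)
{
	switch (cap) {
	case CAP_IRQFD:
	case CAP_IOEVENTFD:
		return eventfd_usable ? 1 : 0;
	default:
		return 1;	/* simplification: other caps unaffected */
	}
}

/* Stub replacing the irqfd assign/deassign entry point. */
static int compat_irqfd_stub(int fd, int gsi)
{
	(void)fd;
	(void)gsi;
	return -ENOSYS;
}

/* Stub replacing the ioeventfd registration entry point. */
static int compat_ioeventfd_stub(int fd, unsigned long addr)
{
	(void)fd;
	(void)addr;
	return -ENOSYS;
}
```

With the capability reported as 0, well-behaved userspace should never reach the stubs; they only guard against callers that skip the check.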

2010-01-19 14:04:11

by Jan Kiszka

Subject: Re: [PATCH 1/2] kvm: fix spurious interrupt with irqfd

Michael S. Tsirkin wrote:
> On Tue, Jan 19, 2010 at 02:25:12PM +0100, Jan Kiszka wrote:
>> Michael S. Tsirkin wrote:
>>> kvm didn't clear the irqfd counter on deassign; as a result, we could get a
>>> spurious interrupt when the irqfd is assigned back. This leads to poor
>>> performance and, in theory, a guest crash.
>>>
>>> Signed-off-by: Michael S. Tsirkin <[email protected]>
>>> ---
>>> virt/kvm/eventfd.c | 3 ++-
>>> 1 files changed, 2 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>>> index 62e4cd9..a9d3fc6 100644
>>> --- a/virt/kvm/eventfd.c
>>> +++ b/virt/kvm/eventfd.c
>>> @@ -72,12 +72,13 @@ static void
>>> irqfd_shutdown(struct work_struct *work)
>>> {
>>> struct _irqfd *irqfd = container_of(work, struct _irqfd, shutdown);
>>> + u64 cnt;
>>>
>>> /*
>>> * Synchronize with the wait-queue and unhook ourselves to prevent
>>> * further events.
>>> */
>>> - remove_wait_queue(irqfd->wqh, &irqfd->wait);
>>> + eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
>>>
>>> /*
>>> * We know no new events will be scheduled at this point, so block
>> For kvm-kmod, I'm fighting with compat support for
>> eventfd_ctx_remove_wait_queue. I basically have a solution for kernels
>> with CONFIG_KPROBES enabled (I need to look up unexported
>> __wake_up_locked[_key]), but there will also be target kernels that do
>> not have this. So there are three options for that case:
>>
>> - Warn the user and fall back to the old racy approach
>> - (Somehow) disable KVM subsystems that use eventfd
>> - Refuse to start KVM
>>
>> As far as I understood, irqfd is interesting for device assignment and
>> now also for vhost, right?
>
> At the moment, only vhost.
>
>> What about ioeventfd?
>
> Same thing.
>

OK...

>> I just wonder how broad
>> the impact of a broken or non-existent eventfd subsystem for kvm-kmod
>> is. Any thoughts welcome.
>
> How do you handle kernels that don't export eventfd_ctx_fileget?

Now that you mention it: not yet properly. So far we pass the file
struct as pseudo eventfd_ctx around on < 2.6.31. But now that I peek
into the struct in kvm_eventfd_ctx_remove_wait_queue, this should
crash. Guess I need to look up that module the same way as I acquire
__wake_up_locked[_key].

>
>> Jan
>>
>> PS: If anyone forgot why Avi handed over this job, you should now
>> remember why. :)
>
> Heh, I did the same kind of thing for infiniband for
> several years. It's hard to forget.
>

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

2010-01-19 14:07:30

by Michael S. Tsirkin

Subject: Re: [PATCH 1/2] kvm: fix spurious interrupt with irqfd

On Tue, Jan 19, 2010 at 03:03:34PM +0100, Jan Kiszka wrote:
> Michael S. Tsirkin wrote:
> > On Tue, Jan 19, 2010 at 02:25:12PM +0100, Jan Kiszka wrote:
> >> Michael S. Tsirkin wrote:
> > >>> kvm didn't clear the irqfd counter on deassign; as a result, we could get a
> > >>> spurious interrupt when the irqfd is assigned back. This leads to poor
> > >>> performance and, in theory, a guest crash.
> >>>
> >>> Signed-off-by: Michael S. Tsirkin <[email protected]>
> >>> ---
> >>> virt/kvm/eventfd.c | 3 ++-
> >>> 1 files changed, 2 insertions(+), 1 deletions(-)
> >>>
> >>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> >>> index 62e4cd9..a9d3fc6 100644
> >>> --- a/virt/kvm/eventfd.c
> >>> +++ b/virt/kvm/eventfd.c
> >>> @@ -72,12 +72,13 @@ static void
> >>> irqfd_shutdown(struct work_struct *work)
> >>> {
> >>> struct _irqfd *irqfd = container_of(work, struct _irqfd, shutdown);
> >>> + u64 cnt;
> >>>
> >>> /*
> >>> * Synchronize with the wait-queue and unhook ourselves to prevent
> >>> * further events.
> >>> */
> >>> - remove_wait_queue(irqfd->wqh, &irqfd->wait);
> >>> + eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
> >>>
> >>> /*
> >>> * We know no new events will be scheduled at this point, so block
> >> For kvm-kmod, I'm fighting with compat support for
> >> eventfd_ctx_remove_wait_queue. I basically have a solution for kernels
> >> with CONFIG_KPROBES enabled (I need to look up unexported
> >> __wake_up_locked[_key]), but there will also be target kernels that do
> >> not have this. So there are three options for that case:
> >>
> >> - Warn the user and fall back to the old racy approach
> >> - (Somehow) disable KVM subsystems that use eventfd
> > >> - Refuse to start KVM
> > >>
> >> As far as I understood, irqfd is interesting for device assignment and
> >> now also for vhost, right?
> >
> > At the moment, only vhost.
> >
> >> What about ioeventfd?
> >
> > Same thing.
> >
>
> OK...
>
> >> I just wonder how broad
> >> the impact of a broken or non-existent eventfd subsystem for kvm-kmod
> >> is. Any thoughts welcome.
> >
> > How do you handle kernels that don't export eventfd_ctx_fileget?
>
> Now that you mention it: not yet properly. So far we pass the file
> struct as pseudo eventfd_ctx around on < 2.6.31. But now that I peek
> into the struct in kvm_eventfd_ctx_remove_wait_queue, this should
> crash. Guess I need to look up that module the same way as I acquire
> __wake_up_locked[_key].

This won't work that well: upstream eventfd
sends us POLLHUP so we can close the structure;
in old kernels it doesn't, so the kernel will crash
when we try to reference the structure later.


> >
> >> Jan
> >>
> >> PS: If anyone forgot why Avi handed over this job, you should now
> >> remember why. :)
> >
> > Heh, I did the same kind of thing for infiniband for
> > several years. It's hard to forget.
> >
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

2010-01-19 14:23:31

by Jan Kiszka

Subject: Re: [PATCH 1/2] kvm: fix spurious interrupt with irqfd

Michael S. Tsirkin wrote:
>>>> I just wonder how broad
>>>> the impact of a broken or non-existent eventfd subsystem for kvm-kmod
>>>> is. Any thoughts welcome.
>>> How do you handle kernels that don't export eventfd_ctx_fileget?
>> Now that you mention it: not yet properly. So far we pass the file
>> struct as pseudo eventfd_ctx around on < 2.6.31. But now that I peek
>> into the struct in kvm_eventfd_ctx_remove_wait_queue, this should
>> crash. Guess I need to look up that module the same way as I acquire
>> __wake_up_locked[_key].
>
> This won't work that well: upstream eventfd
> sends us POLLHUP so we can close the structure;
> in old kernels it doesn't, so the kernel will crash
> when we try to reference the structure later.
>

OK, so any host kernel < 2.6.31 will never work for us. Mmh, then I
could only close the gap 2.6.31..2.6.33. vhost will show up in 33...
Will that version already be worth any eventfd wrapping?

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

2010-01-19 14:32:13

by Michael S. Tsirkin

Subject: Re: [PATCH 1/2] kvm: fix spurious interrupt with irqfd

On Tue, Jan 19, 2010 at 03:23:25PM +0100, Jan Kiszka wrote:
> Michael S. Tsirkin wrote:
> >>>> I just wonder how broad
> >>>> the impact of a broken or non-existent eventfd subsystem for kvm-kmod
> >>>> is. Any thoughts welcome.
> >>> How do you handle kernels that don't export eventfd_ctx_fileget?
> >> Now that you mention it: not yet properly. So far we pass the file
> >> struct as pseudo eventfd_ctx around on < 2.6.31. But now that I peek
> > >> into the struct in kvm_eventfd_ctx_remove_wait_queue, this should
> >> crash. Guess I need to look up that module the same way as I acquire
> >> __wake_up_locked[_key].
> >
> > > This won't work that well: upstream eventfd
> > > sends us POLLHUP so we can close the structure;
> > > in old kernels it doesn't, so the kernel will crash
> > > when we try to reference the structure later.
> >
>
> OK, so any host kernel < 2.6.31 will never work for us. Mmh, then I
> could only close the gap 2.6.31..2.6.33. vhost will show up in 33...
> Will that version already be worth any eventfd wrapping?
>
> Jan

I asked Avi to send a patch upstream into 2.6.32.X fixing spurious
interrupts there. 2.6.31-stable is closed unfortunately, so we won't be
able to support it. Disable eventfd there?

> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux