Message-ID: <1340551118.14120.66.camel@bling.home>
Subject: Re: [PATCH 3/4] kvm: Extend irqfd to support level interrupts
From: Alex Williamson <alex.williamson@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, jan.kiszka@siemens.com,
        mst@redhat.com
Date: Sun, 24 Jun 2012 09:18:38 -0600
In-Reply-To: <4FE6EC20.5030502@redhat.com>
References: <20120622220040.9858.43665.stgit@bling.home>
	 <20120622221559.9858.59593.stgit@bling.home> <4FE6EC20.5030502@redhat.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 7542
Lines: 195

On Sun, 2012-06-24 at 13:29 +0300, Avi Kivity wrote:
> On 06/23/2012 01:16 AM, Alex Williamson wrote:
> > KVM_IRQFD currently only supports edge triggered interrupts,
> > asserting then immediately deasserting an interrupt.  There are a
> > couple ways we can emulate level triggered interrupts using
> > discrete events depending on the usage model we expect from drivers.
> > This patch implements a level emulation model useful for external
> > assigned device drivers, like VFIO.  The irqfd is used to assert
> > the interrupt.  When the guest issues an EOI for the interrupt, the
> > level is automatically deasserted and the irqfd user is notified via
> > an eventfd.  This is therefore the LEVEL_EOI extension to KVM_IRQFD.
> > To do this, we need to allocate a new irq source ID for the interrupt
> > so we don't get interference from userspace.
> > 
> >  
> > +With KVM_CAP_IRQFD_LEVEL_EOI KVM_IRQFD is able to support a level
> > +triggered interrupt model where the irqchip pin (kvm_irqfd.gsi) is
> > +asserted from the kvm_irqfd.fd eventfd and remain asserted until the
> > +guest issues an EOI for the irqchip pin.  The level interrupt is
> > +then de-asserted and the caller is notified via the eventfd specified
> > +by kvm_irqfd.fd2.  Note that users of this interface are responsible
> > +for re-asserting the interrupt if their device still requires service
> > +after receiving the EOI notification.  Additionally, users must not
> > +re-assert an interrupt until after receiving an EOI.  
> 
> What happens if this is violated?

Hmm, perhaps nothing.  The only race I see is re-asserting in the gap
between de-asserting the guest and sending the EOI.  At worst that would
cause a spurious interrupt, so probably no big deal.

> > When available,
> > +this feature is enabled using the KVM_IRQFD_FLAG_LEVEL_EOI flag.
> > +De-assigning an irqfd setup using this flag should include both
> > +KVM_IRQFD_FLAG_DEASSIGN and KVM_IRQFD_FLAG_LEVEL_EOI and will be
> > +matched using kvm_irqfd.fd, kvm_irqfd.gsi, and kvm_irqfd.fd2.
> > +De-assigning automatically de-asserts the interrupt line setup through
> > +this interface.
> >  
> > @@ -203,8 +232,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> >  	struct kvm_irq_routing_table *irq_rt;
> >  	struct _irqfd *irqfd, *tmp;
> >  	struct file *file = NULL;
> > -	struct eventfd_ctx *eventfd = NULL;
> > -	int ret;
> > +	struct eventfd_ctx *eventfd = NULL, *eoi_eventfd = NULL;
> > +	int ret, irq_source_id = -1;
> >  	unsigned int events;
> >  
> >  	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> > @@ -214,7 +243,30 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> >  	irqfd->kvm = kvm;
> >  	irqfd->gsi = args->gsi;
> >  	INIT_LIST_HEAD(&irqfd->list);
> > -	INIT_WORK(&irqfd->inject, irqfd_inject);
> > +
> > +	if (args->flags & KVM_IRQFD_FLAG_LEVEL_EOI) {
> > +		irq_source_id = kvm_request_irq_source_id(kvm);
> > +		if (irq_source_id < 0) {
> > +			ret = irq_source_id;
> > +			goto fail;
> 
> 'file' is NULL at this point, and fput() doesn't test for NULL.

Good catch.  I was looking for an excuse to move the existing code to
eventfd_ctx_fdget() and avoid the 2 step process it uses now.

> > +		}
> > +
> > +		eoi_eventfd = eventfd_ctx_fdget(args->fd2);
> > +		if (IS_ERR(eoi_eventfd)) {
> > +			ret = PTR_ERR(eoi_eventfd);
> > +			goto fail;
> 
> Same.

Yep

> > +		}
> > +
> > +		irqfd->irq_source_id = irq_source_id;
> > +		irqfd->eoi_eventfd = eoi_eventfd;
> > +		irqfd->notifier.gsi = args->gsi;
> > +		irqfd->notifier.irq_acked = irqfd_ack_level;
> > +		kvm_register_irq_ack_notifier(kvm, &irqfd->notifier);
> > +
> > +		INIT_WORK(&irqfd->inject, irqfd_inject_level);
> > +	} else
> > +		INIT_WORK(&irqfd->inject, irqfd_inject_edge);
> > +
> >  	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
> >  
> >  	file = eventfd_fget(args->fd);
> 
> > @@ -242,7 +299,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> >  
> >  	ret = 0;
> >  	list_for_each_entry(tmp, &kvm->irqfds.items, list) {
> > -		if (irqfd->eventfd != tmp->eventfd)
> > +		if (irqfd->eventfd != tmp->eventfd &&
> > +		    irqfd->eventfd != tmp->eoi_eventfd)
> >  			continue;
> 
> So we allow duplicate irqfd with differing eoifd (or edge-triggered and
> level-triggered irqfd on the same context).
> 
> (why the check in the first place? just so we can have a reliable
> deassign or is it avoiding a deeper problem?)

I really wasn't sure to what extent we wanted to prevent duplicates.  My
guess was that we don't want to have an irqfd trigger more than one
thing.  That seems to be what the current code does.  I don't see any
problems with multiple irqfds triggering the same eventfd though.  I
only added a test that a new irqfd can't be triggered by an existing
eoi_eventfd as that could make a nasty loop.

> >  		/* This fd is used for another irq already. */
> >  		ret = -EBUSY;
> > @@ -282,6 +340,14 @@ fail:
> >  	if (!IS_ERR(file))
> >  		fput(file);
> >  
> > +	if (eoi_eventfd && !IS_ERR(eoi_eventfd)) {
> > +		kvm_unregister_irq_ack_notifier(kvm, &irqfd->notifier);
> > +		eventfd_ctx_put(eoi_eventfd);
> > +	}
> > +
> > +	if (irq_source_id >= 0)
> > +		kvm_free_irq_source_id(kvm, irq_source_id);
> > +
> >  	kfree(irqfd);
> >  	return ret;
> >  }
> > @@ -301,16 +367,26 @@ static int
> >  kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> >  {
> >  	struct _irqfd *irqfd, *tmp;
> > -	struct eventfd_ctx *eventfd;
> > +	struct eventfd_ctx *eventfd, *eoi_eventfd = NULL;
> >  
> >  	eventfd = eventfd_ctx_fdget(args->fd);
> >  	if (IS_ERR(eventfd))
> >  		return PTR_ERR(eventfd);
> >  
> > +	if (args->flags & KVM_IRQFD_FLAG_LEVEL_EOI) {
> > +		eoi_eventfd = eventfd_ctx_fdget(args->fd2);
> > +		if (IS_ERR(eoi_eventfd)) {
> > +			eventfd_ctx_put(eventfd);
> > +			return PTR_ERR(eoi_eventfd);
> > +		}
> > +	}
> > +
> >  	spin_lock_irq(&kvm->irqfds.lock);
> >  
> >  	list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
> > -		if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi) {
> > +		if (irqfd->eventfd == eventfd &&
> > +		    irqfd->gsi == args->gsi &&
> > +		    irqfd->eoi_eventfd == eoi_eventfd) {
> >  			/*
> >  			 * This rcu_assign_pointer is needed for when
> >  			 * another thread calls kvm_irq_routing_update before
> > @@ -326,6 +402,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> >  
> >  	spin_unlock_irq(&kvm->irqfds.lock);
> >  	eventfd_ctx_put(eventfd);
> > +	if (eoi_eventfd)
> > +		eventfd_ctx_put(eoi_eventfd);
> >  
> >  	/*
> >  	 * Block until we know all outstanding shutdown jobs have completed
> 
> I see there is no check on undefined ->flags that you had to relax.
> This means that there could be buggy code out there that sets LEVEL_EOI
> and will now get an unexpected error.  I doubt it's the case, but please
> add a patch (in front of the series) that adds the check.

Ok.

> Xen had/has a hack for doing this in a different way, based on ioapic
> polarity.  When the host takes an interrupt, they reverse the polarity
> on that ioapic pin, so they get interrupts on both assertion and
> deassertion.  This is more general and more correct, but waaaaaaaaaaay
> more intrusive and won't play well with shared host interrupts.  But
> let's at least consider it.

Thanks, I'll look for that code.

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/