Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752744AbbF2N2N (ORCPT ); Mon, 29 Jun 2015 09:28:13 -0400 Received: from mga03.intel.com ([134.134.136.65]:27365 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752995AbbF2N2B (ORCPT ); Mon, 29 Jun 2015 09:28:01 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,698,1427785200"; d="scan'208";a="719602047" From: "Wu, Feng" To: Alex Williamson CC: Eric Auger , Avi Kivity , "kvm@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "pbonzini@redhat.com" , "mtosatti@redhat.com" , Joerg Roedel , "Wu, Feng" Subject: RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding Thread-Topic: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding Thread-Index: AQHQqgHz2hS85RWS20+nmLf2OgReZJ3Dhvfg Date: Mon, 29 Jun 2015 13:27:50 +0000 Message-ID: References: <1434019912-15423-1-git-send-email-feng.wu@intel.com> <1434019912-15423-9-git-send-email-feng.wu@intel.com> <5579E884.3040500@gmail.com> <1434123695.4927.304.camel@redhat.com> <557B2994.1070900@gmail.com> <1434135815.4927.308.camel@redhat.com> <557EFA7F.9010209@linaro.org> <1434386702.4927.391.camel@redhat.com> <1434657848.3700.83.camel@redhat.com> In-Reply-To: <1434657848.3700.83.camel@redhat.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id t5TDSHHo005984 Content-Length: 16192 Lines: 442 > -----Original Message----- > From: Alex Williamson [mailto:alex.williamson@redhat.com] > Sent: Friday, June 19, 2015 4:04 AM > To: Wu, Feng > Cc: Eric Auger; Avi Kivity; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; > pbonzini@redhat.com; mtosatti@redhat.com; Joerg Roedel > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding > > [Adding Joerg since he was part of this original idea] > > On Thu, 2015-06-18 at 09:16 +0000, Wu, Feng wrote: > > > > > > > -----Original Message----- > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > Sent: Tuesday, June 16, 2015 12:45 AM > > > To: Eric Auger > > > Cc: Avi Kivity; Wu, Feng; kvm@vger.kernel.org; > linux-kernel@vger.kernel.org; > > > pbonzini@redhat.com; mtosatti@redhat.com > > > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding > > > > > > On Mon, 2015-06-15 at 18:17 +0200, Eric Auger wrote: > > > > Hi Alex, all, > > > > On 06/12/2015 09:03 PM, Alex Williamson wrote: > > > > > On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote: > > > > >> On 06/12/2015 06:41 PM, Alex Williamson wrote: > > > > >>> On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote: > > > > >>>>> -----Original Message----- > > > > >>>>> From: Avi Kivity [mailto:avi.kivity@gmail.com] > > > > >>>>> Sent: Friday, June 12, 2015 3:59 AM > > > > >>>>> To: Wu, Feng; kvm@vger.kernel.org; linux-kernel@vger.kernel.org > > > > >>>>> Cc: pbonzini@redhat.com; mtosatti@redhat.com; > > > > >>>>> alex.williamson@redhat.com; eric.auger@linaro.org > > > > >>>>> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding > > > > >>>>> > > > > >>>>> On 06/11/2015 01:51 PM, Feng Wu wrote: > > > > >>>>>> From: Eric Auger > > > > >>>>>> > > > > >>>>>> This patch adds and documents a new KVM_DEV_VFIO_DEVICE > > > group > > > > >>>>>> and 2 device attributes: > KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, > > > > >>>>>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be > > > able > > > > >>>>>> to set a VFIO device IRQ as forwarded or not forwarded. > > > > >>>>>> the command takes as argument a handle to a new struct named > > > > >>>>>> kvm_vfio_dev_irq. > > > > >>>>> Is there no way to do this automatically? After all, vfio knows that > a > > > > >>>>> device interrupt is forwarded to some eventfd, and kvm knows that > > > some > > > > >>>>> eventfd is forwarded to a guest interrupt. If they compare notes > > > > >>>>> through a central registry, they can figure out that the interrupt > needs > > > > >>>>> to be forwarded. > > > > >>>> Oh, just like Eric mentioned in his reply, this description is out of > context > > > of > > > > >>>> this series, I will remove them in the next version. > > > > >>> > > > > >>> I suspect Avi's question was more general. While forward/unforward > is > > > > >>> out of context for this series, it's very similar in nature to > > > > >>> enabling/disabling posted interrupts. So I think the question remains > > > > >>> whether we really need userspace to participate in creating this > > > > >>> shortcut or if kvm and vfio can some how orchestrate figuring it out > > > > >>> automatically. > > > > >>> > > > > >>> Personally I don't know how we could do it automatically. We've > always > > > > >>> relied on userspace to independently setup vfio and kvm such that > > > > >>> neither have any idea that the other is there and update each side > > > > >>> independently when anything changes. So it seems consistent to > > > continue > > > > >>> that here. It doesn't seem like there's much to gain > performance-wise > > > > >>> either, updates should be a relatively rare event I'd expect. > > > > >>> > > > > >>> There's really no metadata associated with an eventfd, so "comparing > > > > >>> notes" automatically might imply some central registration entity. > That > > > > >>> immediately sounds like a much more complex solution, but maybe Avi > > > has > > > > >>> some ideas to manage it. Thanks, > > > > >>> > > > > >> > > > > >> The idea is to have a central registry maintained by a posted interrupts > > > > >> manager. Both vfio and kvm pass the filp (along with extra > information) > > > > >> to the posted interrupts manager, which, when it detects a filp match, > > > > >> tells each of them what to do. > > > > >> > > > > >> The advantages are: > > > > >> - old userspace gains the optimization without change > > > > >> - a userspace API is more expensive to maintain than internal kernel > > > > >> interfaces (CVEs, documentation, maintaining backwards compatibility) > > > > >> - if you can do it without a new interface, this indicates that all the > > > > >> information in the new interface is redundant. That means you have > to > > > > >> check it for consistency with the existing information, so it's extra > > > > >> work (likely, it's exactly what the posted interrupt manager would be > > > > >> doing anyway). > > > > > > > > > > Yep, those all sound like good things and I believe that's similar in > > > > > design to the way we had originally discussed this interaction at > > > > > LPC/KVM Forum several years ago. I'd be in favor of that approach. > > > > > > > > I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO > > > > IRQ forward control" series? Or is that "central registry maintained by > > > > a posted interrupts manager" something more specific to x86? > > > > > > I'd think we'd want it for any sort of offload and supporting both > > > posted-interrupts and irq-forwarding would be a good validation. I > > > imagine there would be registration/de-registration callbacks separate > > > for interrupt producers vs interrupt consumers. Each registration > > > function would likely provide a struct of callbacks, probably similar to > > > the get_symbol callbacks proposed for the kvm-vfio device on the IRQ > > > producer side. The eventfd would be the token that the manager would > > > use to match producers and consumers. The hard part is probably > > > figuring out what information to retrieve from the producer and provide > > > to the consumer in a generic way between pci and platform, but as an > > > internal interface, it's not a big deal if we screw it up a few times to > > > start. Thanks, > > > > On posted-interrupts side, the main purpose of the new APIs is to update > > the IRTE when guest changes vMSI/vMSIx configuration. Alex, do you have > > any detailed ideas for the new solution to achieve this purpose? It should > > be helpful if you can share some! > > > There are plenty of details to be filled in, but I think the basics > looks something like the code below. The IRQ bypass manager just > defines a pair of structures, one for interrupt producers and one for > interrupt consumers. I'm certain that we'll need more callbacks than > I've defined below, but figuring out what those should be for the best > abstraction is the hardest part of this idea. The manager provides both > registration and de-registration interfaces for both types of objects > and keeps lists for each, protected by a lock. The manager doesn't even > really need to know what the match token is, but I assume for our > purposes it will be an eventfd_ctx. > > On the vfio side, the producer struct would be embedded in the > vfio_pci_irq_ctx struct. KVM would probably embed the consumer struct > in _irqfd. As I've coded below, the IRQ bypass manager calls the > consumer callbacks, so the producer struct would need fields or > callbacks to provide the consumer the info it needs. AIUI the Posted > Interrupt model, VFIO only needs to provide data to the consumer. For > IRQ Forwarding, I think the producer needs to be informed when bypass is > active to model the incoming interrupt as edge vs level. > > I've prototyped the base IRQ bypass manager here as static, but I don't > see any reason it couldn't be a module that's loaded by dependency when > either vfio-pci or kvm-intel is loaded (or other producer/consumer > objects). > > Is this a reasonable starting point to craft the additional fields and > callbacks and interaction of who calls who that we need to support > Posted Interrupts and IRQ Forwarding? Is the AMD version of this still > alive? Thanks, > > Alex > > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig > index 413a7bf..22f6fcb 100644 > --- a/arch/x86/kvm/Kconfig > +++ b/arch/x86/kvm/Kconfig > @@ -61,6 +61,7 @@ config KVM_INTEL > depends on KVM > # for perf_guest_get_msrs(): > depends on CPU_SUP_INTEL > + select IRQ_BYPASS_MANAGER > ---help--- > Provides support for KVM on Intel processors equipped with the VT > extensions. > diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig > index 579d83b..02912f1 100644 > --- a/drivers/vfio/pci/Kconfig > +++ b/drivers/vfio/pci/Kconfig > @@ -2,6 +2,7 @@ config VFIO_PCI > tristate "VFIO support for PCI devices" > depends on VFIO && PCI && EVENTFD > select VFIO_VIRQFD > + select IRQ_BYPASS_MANAGER > help > Support for the PCI VFIO bus driver. This is required to make > use of PCI drivers using the VFIO framework. > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c > index 1f577b4..4e053be 100644 > --- a/drivers/vfio/pci/vfio_pci_intrs.c > +++ b/drivers/vfio/pci/vfio_pci_intrs.c > @@ -181,6 +181,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device > *vdev, int fd) > > if (vdev->ctx[0].trigger) { > free_irq(pdev->irq, vdev); > + /* irq_bypass_unregister_producer(); */ > kfree(vdev->ctx[0].name); > eventfd_ctx_put(vdev->ctx[0].trigger); > vdev->ctx[0].trigger = NULL; > @@ -214,6 +215,8 @@ static int vfio_intx_set_signal(struct vfio_pci_device > *vdev, int fd) > return ret; > } > > + /* irq_bypass_register_producer(); */ > + > /* > * INTx disable will stick across the new irq setup, > * disable_irq won't. > @@ -319,6 +322,7 @@ static int vfio_msi_set_vector_signal(struct > vfio_pci_device *vdev, > > if (vdev->ctx[vector].trigger) { > free_irq(irq, vdev->ctx[vector].trigger); > + /* irq_bypass_unregister_producer(); */ > kfree(vdev->ctx[vector].name); > eventfd_ctx_put(vdev->ctx[vector].trigger); > vdev->ctx[vector].trigger = NULL; > @@ -360,6 +364,8 @@ static int vfio_msi_set_vector_signal(struct > vfio_pci_device *vdev, > return ret; > } > > + /* irq_bypass_register_producer(); */ > + > vdev->ctx[vector].trigger = trigger; > > return 0; > diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h > new file mode 100644 > index 0000000..718508e > --- /dev/null > +++ b/include/linux/irqbypass.h > @@ -0,0 +1,23 @@ > +#ifndef IRQBYPASS_H > +#define IRQBYPASS_H > + > +#include > + > +struct irq_bypass_producer { > + struct list_head node; > + void *token; > + /* TBD */ > +}; > + > +struct irq_bypass_consumer { > + struct list_head node; > + void *token; > + void (*add_producer)(struct irq_bypass_producer *); > + void (*del_producer)(struct irq_bypass_producer *); These two callbacks should be common function, for PI, I need to add something specific to x86, such as, updating the associated IRTE, how should I do for this? > +}; > + > +int irq_bypass_register_producer(struct irq_bypass_producer *); > +void irq_bypass_unregister_producer(struct irq_bypass_producer *); > +int irq_bypass_register_consumer(struct irq_bypass_consumer *); > +void irq_bypass_unregister_consumer(struct irq_bypass_consumer *); > +#endif /* IRQBYPASS_H */ > diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig > index 9a76e3b..4502cdc 100644 > --- a/kernel/irq/Kconfig > +++ b/kernel/irq/Kconfig > @@ -100,4 +100,7 @@ config SPARSE_IRQ > > If you don't know what to do here, say N. > > +config IRQ_BYPASS_MANAGER > + bool > + > endmenu > diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile > index d121235..a30ed77 100644 > --- a/kernel/irq/Makefile > +++ b/kernel/irq/Makefile > @@ -7,3 +7,4 @@ obj-$(CONFIG_PROC_FS) += proc.o > obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o > obj-$(CONFIG_PM_SLEEP) += pm.o > obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o > +obj-$(CONFIG_IRQ_BYPASS_MANAGER) += bypass.o > diff --git a/kernel/irq/bypass.c b/kernel/irq/bypass.c > new file mode 100644 > index 0000000..5d0f92b > --- /dev/null > +++ b/kernel/irq/bypass.c Is it better to put this code here or in vfio folder? Thanks, Feng > @@ -0,0 +1,116 @@ > +/* > + * IRQ offload/bypass manager > + * > + * Various virtualization hardware acceleration techniques allow bypassing > + * or offloading interrupts receieved from devices around the host kernel. > + * Posted Interrupts on Intel VT-d systems can allow interrupts to be > + * recieved directly by a virtual machine. ARM IRQ Forwarding can allow > + * level triggered device interrupts to be de-asserted directly by the VM. > + * This manager allows interrupt producers and consumers to find each other > + * to enable this sort of bypass. > + */ > + > +#include > +#include > +#include > +#include > + > +static LIST_HEAD(producers); > +static LIST_HEAD(consumers); > +static DEFINE_MUTEX(lock); > + > +int irq_bypass_register_producer(struct irq_bypass_producer *producer) > +{ > + struct irq_bypass_producer *tmp; > + struct irq_bypass_consumer *consumer; > + int ret = 0; > + > + mutex_lock(&lock); > + > + list_for_each_entry(tmp, &producers, node) { > + if (tmp->token == producer->token) { > + ret = -EINVAL; > + goto unlock; > + } > + } > + > + list_add(&producer->node, &producers); > + > + list_for_each_entry(consumer, &consumers, node) { > + if (consumer->token == producer->token) { > + consumer->add_producer(producer); > + break; > + } > + } > +unlock: > + mutex_unlock(&lock); > + return ret; > +} > +EXPORT_SYMBOL_GPL(irq_bypass_register_producer); > + > +void irq_bypass_unregister_producer(struct irq_bypass_producer *producer) > +{ > + struct irq_bypass_consumer *consumer; > + > + mutex_lock(&lock); > + > + list_for_each_entry(consumer, &consumers, node) { > + if (consumer->token == producer->token) { > + consumer->del_producer(producer); > + break; > + } > + } > + > + list_del(&producer->node); > + > + mutex_unlock(&lock); > +} > +EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer); > + > +int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer) > +{ > + struct irq_bypass_consumer *tmp; > + struct irq_bypass_producer *producer; > + int ret = 0; > + > + mutex_lock(&lock); > + > + list_for_each_entry(tmp, &consumers, node) { > + if (tmp->token == consumer->token) { > + ret = -EINVAL; > + goto unlock; > + } > + } > + > + list_add(&consumer->node, &consumers); > + > + list_for_each_entry(producer, &producers, node) { > + if (producer->token == consumer->token) { > + consumer->add_producer(producer); > + break; > + } > + } > +unlock: > + mutex_unlock(&lock); > + return ret; > +} > +EXPORT_SYMBOL_GPL(irq_bypass_register_consumer); > + > +void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer) > +{ > + struct irq_bypass_producer *producer; > + > + mutex_lock(&lock); > + > + list_for_each_entry(producer, &producers, node) { > + if (producer->token == consumer->token) { > + consumer->del_producer(producer); > + break; > + } > + } > + > + list_del(&consumer->node); > + > + mutex_unlock(&lock); > +} > +EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer); > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > index 9ff4193..f3da161 100644 > --- a/virt/kvm/eventfd.c > +++ b/virt/kvm/eventfd.c > @@ -429,6 +429,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd > *args) > */ > fdput(f); > > + /* irq_bypass_register_consumer(); */ > + > return 0; > > fail: > @@ -528,6 +530,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd > *args) > struct _irqfd *irqfd, *tmp; > struct eventfd_ctx *eventfd; > > + /* irq_bypass_unregister_consumer() */ > + > eventfd = eventfd_ctx_fdget(args->fd); > if (IS_ERR(eventfd)) > return PTR_ERR(eventfd); > > ????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?