Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp422524ybl; Fri, 30 Aug 2019 01:43:41 -0700 (PDT) X-Google-Smtp-Source: APXvYqywg3T4ZWRPk4cm5XTB9i2dcqqGudWPoivxzLBPHxvHLG4EZuKpD7jeAwnT6ZR1Bu4kJ7fV X-Received: by 2002:aa7:9477:: with SMTP id t23mr16981386pfq.29.1567154621124; Fri, 30 Aug 2019 01:43:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1567154621; cv=none; d=google.com; s=arc-20160816; b=szAuo03aj82aYtW0RXcw50/8tlkL5IU2JVzn80hxy6k9xIC6iHSewaN70S14ZkjWAi VfNLxpwe2S12PIfc5bfd12XMzOwtHeFwqFlcU3UMHOaNV1WcuEiQh1LJIzc8zqB7chLu EMmOhgcBe16jvZDcjSpg6rUl8li9wJ8tR4DRaRgXahh0Erue9xqoIxXD57Dw2/wYt2LE FaPYijBZ7bqgoYgHjq3D4b33L7TPao+EFF0+fCzrEUQ7R14pWYLs+OQeqSk1BCk0+xdw SOkAJrpNLGVzhFEkBi8AAi8Fw0yzPSFJpgYJtUROI0E/tkgnGLlrB/F6FMOlDhnlIBWC G+jw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from; bh=+ga8wzJ0RJWj/YieYqTzltXklMHbIz4w2dbKd5ULDuI=; b=CpeP2t357oqUTveFzpPWDqoZHufzXtFQF8wet2NnKPV2fxv2wB9RKlBniAfwsbf3T9 yT0KUEq/x4du9zGx0Qlq2lQIDnx3pBx9bAdQkXE5x9qK57R76N90bHyx3wwEIOy5tuyO zwo72+gt75GG3as13HHFEZ3ptAJqOB+3KALcCvr8hKhJWU+b2m/K5yWpOEqWow0B0Sy/ 3LYQAZwBBtg3wVCfA/RIp4IWkIBPEGj9mE8BjcLsk0tJ63WN0irbZLIIvte98RB4rTQe VMW/EPt+PKr7ZRQltBEX1loYQIsrmNY6hi9m5nANca748JbX+SAFiWtk+Vp76iFDY4J0 2LfA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p9si4168959plk.386.2019.08.30.01.43.25; Fri, 30 Aug 2019 01:43:41 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727836AbfH3ImR (ORCPT + 99 others); Fri, 30 Aug 2019 04:42:17 -0400 Received: from out30-54.freemail.mail.aliyun.com ([115.124.30.54]:39041 "EHLO out30-54.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726200AbfH3ImR (ORCPT ); Fri, 30 Aug 2019 04:42:17 -0400 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R451e4;CH=green;DM=||false|;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04395;MF=luoben@linux.alibaba.com;NM=1;PH=DS;RN=6;SR=0;TI=SMTPD_---0Taqk710_1567154526; Received: from localhost(mailfrom:luoben@linux.alibaba.com fp:SMTPD_---0Taqk710_1567154526) by smtp.aliyun-inc.com(127.0.0.1); Fri, 30 Aug 2019 16:42:09 +0800 From: Ben Luo To: tglx@linutronix.de, alex.williamson@redhat.com Cc: linux-kernel@vger.kernel.org, tao.ma@linux.alibaba.com, gerry@linux.alibaba.com, nanhai.zou@linux.alibaba.com Subject: [PATCH v5 3/3] vfio/pci: make use of irq_update_devid and optimize irq ops Date: Fri, 30 Aug 2019 16:42:06 +0800 Message-Id: <9a8b3fc5d82c3c46feb0de673fbe898cfd884d63.1567151182.git.luoben@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: References: Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When userspace (e.g. qemu) triggers a switch between KVM irqfd and userspace eventfd, only dev_id of irqaction (i.e. the "trigger" in this patch's context) will be changed, but a free-then-request-irq action is taken in current code. And, irq affinity setting in VM will also trigger a free-then-request-irq action, which actually changes nothing, but only need to bounce the irqbypass registraion in case that posted-interrupt is in use. This patch makes use of irq_update_devid() and optimize both cases above, which reduces the risk of losing interrupt and also cuts some overhead. Signed-off-by: Ben Luo --- drivers/vfio/pci/vfio_pci_intrs.c | 124 ++++++++++++++++++++++++++------------ 1 file changed, 87 insertions(+), 37 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 3fa3f72..d3a93d7 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -284,70 +284,120 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix) static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, int vector, int fd, bool msix) { + struct eventfd_ctx *trigger = NULL; struct pci_dev *pdev = vdev->pdev; - struct eventfd_ctx *trigger; int irq, ret; if (vector < 0 || vector >= vdev->num_ctx) return -EINVAL; + if (fd >= 0) { + trigger = eventfd_ctx_fdget(fd); + if (IS_ERR(trigger)) { + /* oops, going to disable this interrupt */ + dev_info(&pdev->dev, + "get ctx error on bad fd: %d for vector:%d\n", + fd, vector); + } + } + irq = pci_irq_vector(pdev, vector); + /* + * 'trigger' is NULL or invalid, disable the interrupt + * 'trigger' is same as before, only bounce the bypass registration + * 'trigger' is a new invalid one, update it to irqaction and other + * data structures referencing to the old one; fallback to disable + * the interrupt on error + */ if (vdev->ctx[vector].trigger) { - free_irq(irq, vdev->ctx[vector].trigger); + /* + * even if the trigger is unchanged we need to bounce the + * interrupt bypass connection to allow affinity changes in + * the guest to be realized. + */ irq_bypass_unregister_producer(&vdev->ctx[vector].producer); - kfree(vdev->ctx[vector].name); - eventfd_ctx_put(vdev->ctx[vector].trigger); - vdev->ctx[vector].trigger = NULL; + + if (vdev->ctx[vector].trigger == trigger) { + /* avoid duplicated referencing to the same trigger */ + eventfd_ctx_put(trigger); + + } else if (trigger && !IS_ERR(trigger)) { + ret = irq_update_devid(irq, + vdev->ctx[vector].trigger, trigger); + if (unlikely(ret)) { + dev_info(&pdev->dev, + "update devid of %d (token %p) failed: %d\n", + irq, vdev->ctx[vector].trigger, ret); + eventfd_ctx_put(trigger); + free_irq(irq, vdev->ctx[vector].trigger); + kfree(vdev->ctx[vector].name); + eventfd_ctx_put(vdev->ctx[vector].trigger); + vdev->ctx[vector].trigger = NULL; + return ret; + } + eventfd_ctx_put(vdev->ctx[vector].trigger); + vdev->ctx[vector].producer.token = trigger; + vdev->ctx[vector].trigger = trigger; + + } else { + free_irq(irq, vdev->ctx[vector].trigger); + kfree(vdev->ctx[vector].name); + eventfd_ctx_put(vdev->ctx[vector].trigger); + vdev->ctx[vector].trigger = NULL; + } } if (fd < 0) return 0; + else if (IS_ERR(trigger)) + return PTR_ERR(trigger); - vdev->ctx[vector].name = kasprintf(GFP_KERNEL, "vfio-msi%s[%d](%s)", - msix ? "x" : "", vector, - pci_name(pdev)); - if (!vdev->ctx[vector].name) - return -ENOMEM; + if (!vdev->ctx[vector].trigger) { + vdev->ctx[vector].name = kasprintf(GFP_KERNEL, + "vfio-msi%s[%d](%s)", + msix ? "x" : "", vector, + pci_name(pdev)); + if (!vdev->ctx[vector].name) { + eventfd_ctx_put(trigger); + return -ENOMEM; + } - trigger = eventfd_ctx_fdget(fd); - if (IS_ERR(trigger)) { - kfree(vdev->ctx[vector].name); - return PTR_ERR(trigger); - } + /* + * The MSIx vector table resides in device memory which may be + * cleared via backdoor resets. We don't allow direct access to + * the vector table so even if a userspace driver attempts to + * save/restore around such a reset it would be unsuccessful. + * To avoid this, restore the cached value of the message prior + * to enabling. + */ + if (msix) { + struct msi_msg msg; - /* - * The MSIx vector table resides in device memory which may be cleared - * via backdoor resets. We don't allow direct access to the vector - * table so even if a userspace driver attempts to save/restore around - * such a reset it would be unsuccessful. To avoid this, restore the - * cached value of the message prior to enabling. - */ - if (msix) { - struct msi_msg msg; + get_cached_msi_msg(irq, &msg); + pci_write_msi_msg(irq, &msg); + } - get_cached_msi_msg(irq, &msg); - pci_write_msi_msg(irq, &msg); - } + ret = request_irq(irq, vfio_msihandler, 0, + vdev->ctx[vector].name, trigger); + if (ret) { + kfree(vdev->ctx[vector].name); + eventfd_ctx_put(trigger); + return ret; + } - ret = request_irq(irq, vfio_msihandler, 0, - vdev->ctx[vector].name, trigger); - if (ret) { - kfree(vdev->ctx[vector].name); - eventfd_ctx_put(trigger); - return ret; + vdev->ctx[vector].producer.token = trigger; + vdev->ctx[vector].producer.irq = irq; + vdev->ctx[vector].trigger = trigger; } - vdev->ctx[vector].producer.token = trigger; - vdev->ctx[vector].producer.irq = irq; + /* setup bypass connection and make irte updated */ ret = irq_bypass_register_producer(&vdev->ctx[vector].producer); if (unlikely(ret)) dev_info(&pdev->dev, "irq bypass producer (token %p) registration fails: %d\n", vdev->ctx[vector].producer.token, ret); - vdev->ctx[vector].trigger = trigger; - return 0; } -- 1.8.3.1