From: Chao Gao
To: linux-kernel@vger.kernel.org
Cc: Chao Gao, Boris Ostrovsky, Juergen Gross, Stefano Stabellini,
    Jia-Ju Bai, xen-devel@lists.xenproject.org, Roger Pau Monné,
    Jan Beulich
Subject: [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device
Date: Wed, 5 Dec 2018 10:19:17 +0800
Message-Id: <1543976357-1053-1-git-send-email-chao.gao@intel.com>
X-Mailer: git-send-email 1.9.1
X-Mailing-List: linux-kernel@vger.kernel.org

I find that some pass-through devices no longer work after a guest reboot.
Assigning such a device to another guest hits the same issue, and the
only way to make it work again is to unbind it from pciback and bind it
again. Someone reported this issue one year ago [1]. More details can
be found in [2].

The root cause is that Xen's internal MSI-X state isn't reset properly
during reboot or re-assignment. In the above case, Xen set the maskall
bit to mask all MSI-X interrupts after it detected a potential security
issue. Even after a device reset, Xen didn't reset its internal maskall
bit. As a result, the maskall bit would be set again on the next write
to the MSI-X message control register.

Given that PHYSDEVOP_prepare_msix() also triggers Xen to reset the
internal MSI-X state of a device, we employ it to fix this issue rather
than introducing another dedicated sub-hypercall.

Note that PHYSDEVOP_release_msix() will fail if the mapping between the
device's MSI-X and pirq has been created. This limitation prevents us
from calling this function when detaching a device from a guest during
guest shutdown. Thus it is called right before calling
PHYSDEVOP_prepare_msix().

[1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
[2]: https://lists.xen.org/archives/html/xen-devel/2018-11/msg01616.html

Signed-off-by: Chao Gao
---
 drivers/xen/xen-pciback/pci_stub.c | 49 ++++++++++++++++++++++++++++++++++++++
 drivers/xen/xen-pciback/pciback.h  |  1 +
 drivers/xen/xen-pciback/xenbus.c   | 10 ++++++++
 3 files changed, 60 insertions(+)

diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 59661db..f8623d0 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -87,6 +87,55 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
 	return psdev;
 }
 
+/*
+ * Reset Xen internal MSI-X state by invoking PHYSDEVOP_{release, prepare}_msix.
+ */
+int pcistub_msix_reset(struct pci_dev *dev)
+{
+#ifdef CONFIG_PCI_MSI
+	if (dev->msix_cap) {
+		struct physdev_pci_device ppdev = {
+			.seg = pci_domain_nr(dev->bus),
+			.bus = dev->bus->number,
+			.devfn = dev->devfn
+		};
+		int err;
+		u16 val;
+
+		/*
+		 * Do a write first to flush Xen's internal state to hardware
+		 * such that the following read can infer whether MSI-X maskall
+		 * bit is set by Xen.
+		 */
+		pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &val);
+		pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, val);
+
+		pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &val);
+		if (!(val & PCI_MSIX_FLAGS_MASKALL))
+			return 0;
+
+		pr_info("Reset MSI-X state for device %04x:%02x:%02x.%d\n",
+			ppdev.seg, ppdev.bus, PCI_SLOT(ppdev.devfn),
+			PCI_FUNC(ppdev.devfn));
+
+		err = HYPERVISOR_physdev_op(PHYSDEVOP_release_msix, &ppdev);
+		if (err) {
+			dev_warn(&dev->dev, "MSI-X release failed (%d)\n",
+				 err);
+			return err;
+		}
+
+		err = HYPERVISOR_physdev_op(PHYSDEVOP_prepare_msix, &ppdev);
+		if (err) {
+			dev_err(&dev->dev, "MSI-X preparation failed (%d)\n",
+				err);
+			return err;
+		}
+	}
+#endif
+	return 0;
+}
+
 /* Don't call this directly as it's called by pcistub_device_put */
 static void pcistub_device_release(struct kref *kref)
 {
diff --git a/drivers/xen/xen-pciback/pciback.h b/drivers/xen/xen-pciback/pciback.h
index 263c059..9046154 100644
--- a/drivers/xen/xen-pciback/pciback.h
+++ b/drivers/xen/xen-pciback/pciback.h
@@ -66,6 +66,7 @@ struct pci_dev *pcistub_get_pci_dev_by_slot(struct xen_pcibk_device *pdev,
 struct pci_dev *pcistub_get_pci_dev(struct xen_pcibk_device *pdev,
 				    struct pci_dev *dev);
 void pcistub_put_pci_dev(struct pci_dev *dev);
+int pcistub_msix_reset(struct pci_dev *dev);
 
 /* Ensure a device is turned off or reset */
 void xen_pcibk_reset_device(struct pci_dev *pdev);
diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
index 581c4e1..2f71f26 100644
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -243,6 +243,16 @@ static int xen_pcibk_export_device(struct xen_pcibk_device *pdev,
 		goto out;
 	}
 
+	/*
+	 * Reset Xen's internal MSI-X state before exposing a device.
+	 *
+	 * In some cases, Xen's internal MSI-X state is not clean, which would
+	 * prevent the new guest from receiving MSIs.
+	 */
+	err = pcistub_msix_reset(dev);
+	if (err)
+		goto out;
+
 	err = xen_pcibk_add_pci_dev(pdev, dev, devid,
 				    xen_pcibk_publish_pci_dev);
 	if (err)
-- 
1.8.3.1
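
For readers who want to see why the release/prepare pair helps, here is a
minimal userspace sketch of the behaviour described in the commit message.
It is a toy model, not Xen or kernel code: struct model_dev, every model_*
function, and the -EBUSY return value are hypothetical names and choices made
for illustration. Only the two flag values match the kernel's
PCI_MSIX_FLAGS_MASKALL and PCI_MSIX_FLAGS_ENABLE.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MSIX_FLAGS_MASKALL 0x4000u /* same value as PCI_MSIX_FLAGS_MASKALL */
#define MSIX_FLAGS_ENABLE  0x8000u /* same value as PCI_MSIX_FLAGS_ENABLE */

/* Hypothetical model of one passed-through device. */
struct model_dev {
	uint16_t hw_flags;  /* MSI-X message control as read from config space */
	bool host_maskall;  /* sticky maskall flag kept by the (modelled) hypervisor */
	bool pirq_mapped;   /* true while an MSI-X to pirq mapping exists */
};

/* A device reset clears the register but not the hypervisor's flag. */
static void model_device_reset(struct model_dev *d)
{
	d->hw_flags = 0;
}

static uint16_t model_read_flags(const struct model_dev *d)
{
	return d->hw_flags;
}

/* Every write to message control re-applies the sticky maskall flag. */
static void model_write_flags(struct model_dev *d, uint16_t val)
{
	if (d->host_maskall)
		val |= MSIX_FLAGS_MASKALL;
	d->hw_flags = val;
}

/* Assumption: release fails while a pirq mapping exists (error code invented). */
static int model_release_msix(struct model_dev *d)
{
	return d->pirq_mapped ? -EBUSY : 0;
}

/* Assumption: prepare is the step that clears the sticky flag. */
static int model_prepare_msix(struct model_dev *d)
{
	d->host_maskall = false;
	return 0;
}

/* Same control flow as pcistub_msix_reset() in the patch, on the model. */
static int model_msix_reset(struct model_dev *d)
{
	uint16_t val = model_read_flags(d);
	int err;

	/* Write back first so the sticky flag becomes visible in hw_flags. */
	model_write_flags(d, val);

	val = model_read_flags(d);
	if (!(val & MSIX_FLAGS_MASKALL))
		return 0; /* nothing stale to clean up */

	err = model_release_msix(d);
	if (err)
		return err;

	return model_prepare_msix(d);
}
```

In this model, a device with host_maskall set and no pirq mapping comes out of
model_msix_reset() with the sticky flag cleared, so a later write of
MSIX_FLAGS_ENABLE no longer picks up the maskall bit. With pirq_mapped set, the
release step fails and the flag stays, which is the limitation that leads the
patch to do the reset when exposing the device rather than at guest shutdown.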