Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp9258485imu; Wed, 5 Dec 2018 01:35:50 -0800 (PST) X-Google-Smtp-Source: AFSGD/UZXjXMIeAVP4e9UBB6Fvngs5PQLrdMOwNY66pBsqWDYO81kAYb9m+3gAH0SBvqkdF4vL26 X-Received: by 2002:a62:33c1:: with SMTP id z184mr23417486pfz.104.1544002550201; Wed, 05 Dec 2018 01:35:50 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1544002550; cv=none; d=google.com; s=arc-20160816; b=vS0bQe9WN1fZiZYWyweO+vU2qY8UkoxcU6Qza8lu5OaSVIM8yQSuifjH6k1dG8+J4G aOo41XjmP/xpyYxwNBIhn2leIeEpW2XEr9hydLg42eyY41mgaO5uyzFjML1S1/nmLyoc p2VKN5CLvYg6+Rj8oqFPdv/EbJtVv15bhFGVRyN5nSi3T6ngneeu7lRQnwxzeAUtL5Ql B95HbRURv3tzjsI3JgyD9MgxrgoWh23doJ470LyaxK4pc1ESC7hYZXaQTNiKuDhO6Km1 kdElMFfyFg0otLbV/IWr/RxIBGGGTfPuUO2RKO63wdw1GoK/p+iGSvXfTmxUwWETR82u witw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=3x7vKWVGPG/51eTtoZcPduugv1BgbLIzYjYze3tvtIA=; b=EgjNGpM39JFoleM3DApecsDnsVQTIPccn6g6UkVUiGJSeu2kykwc6LOTdyA78vaYz3 7mJNbJiXzuq5kbEDE5Wq+wPq4gw4lImH6DdIXlUcix0vPfTQb6v4JVgYGtoKRzLIfvu7 eDnYhhiT9MSRZtxvZ1mYg8AZfrfv0RqGi0Y2eoAFPVgJ/7Cx+OBzFNBEmLmKSV3Qs/j9 Egvtdd+ZfyjRIiV853UKDHekQa43ehvmk7UT+/NCOQOrtTLU6is1UvafjyH0c43DcCUQ Gz+bCEzK33t5A+r66kNxMXpeZP4CGBun3w945PY36mttFc6R1dzYtignSTncOudtxZjK 1bBA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id k5si18578616pgr.69.2018.12.05.01.35.35; Wed, 05 Dec 2018 01:35:50 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727549AbeLEJdJ (ORCPT + 99 others); Wed, 5 Dec 2018 04:33:09 -0500 Received: from smtp.eu.citrix.com ([185.25.65.24]:9887 "EHLO SMTP.EU.CITRIX.COM" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726102AbeLEJdJ (ORCPT ); Wed, 5 Dec 2018 04:33:09 -0500 X-IronPort-AV: E=Sophos;i="5.56,317,1539648000"; d="scan'208";a="82813116" Date: Wed, 5 Dec 2018 10:32:23 +0100 From: Roger Pau =?utf-8?B?TW9ubsOp?= To: Chao Gao CC: , Boris Ostrovsky , Juergen Gross , Stefano Stabellini , Jia-Ju Bai , , Jan Beulich Subject: Re: [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device Message-ID: <20181205093223.dncg4nq4dh6xmrhk@mac> References: <1543976357-1053-1-git-send-email-chao.gao@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <1543976357-1053-1-git-send-email-chao.gao@intel.com> User-Agent: NeoMutt/20180716 X-ClientProxiedBy: AMSPEX02CAS02.citrite.net (10.69.22.113) To AMSPEX02CL02.citrite.net (10.69.22.126) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote: > I find some pass-thru devices don't work any more across guest reboot. > Assigning it to another guest also meets the same issue. And the only > way to make it work again is un-binding and binding it to pciback. > Someone reported this issue one year ago [1]. More detail also can be > found in [2]. > > The root-cause is Xen's internal MSI-X state isn't reset properly > during reboot or re-assignment. In the above case, Xen set maskall bit > to mask all MSI interrupts after it detected a potential security > issue. Even after device reset, Xen didn't reset its internal maskall > bit. As a result, maskall bit would be set again in next write to > MSI-X message control register. > > Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X > internal state of a device, we employ it to fix this issue rather than > introducing another dedicated sub-hypercall. > > Note that PHYSDEVOPS_release_msix() will fail if the mapping between > the device's msix and pirq has been created. This limitation prevents > us calling this function when detaching a device from a guest during > guest shutdown. Thus it is called right before calling > PHYSDEVOPS_prepare_msix(). s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the () at the end of the hypercall name since it's not a function. I'm also wondering why the release can't be done when the device is detached from the guest (or the guest has been shut down). This makes me worry about the raciness of the attach/detach procedure: if there's a state where pciback assumes the device has been detached from the guest, but there are still pirqs bound, an attempt to attach to another guest in such state will fail. > [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/ > msg02520.html > [2]: https://lists.xen.org/archives/html/xen-devel/2018-11/msg01616.html > > Signed-off-by: Chao Gao > --- > drivers/xen/xen-pciback/pci_stub.c | 49 ++++++++++++++++++++++++++++++++++++++ > drivers/xen/xen-pciback/pciback.h | 1 + > drivers/xen/xen-pciback/xenbus.c | 10 ++++++++ > 3 files changed, 60 insertions(+) > > diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c > index 59661db..f8623d0 100644 > --- a/drivers/xen/xen-pciback/pci_stub.c > +++ b/drivers/xen/xen-pciback/pci_stub.c > @@ -87,6 +87,55 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev) > return psdev; > } > > +/* > + * Reset Xen internal MSI-X state by invoking PHYSDEVOP_{release, prepare}_msix. > + */ > +int pcistub_msix_reset(struct pci_dev *dev) > +{ > +#ifdef CONFIG_PCI_MSI > + if (dev->msix_cap) { > + struct physdev_pci_device ppdev = { > + .seg = pci_domain_nr(dev->bus), > + .bus = dev->bus->number, > + .devfn = dev->devfn > + }; > + int err; > + u16 val; > + > + /* > + * Do a write first to flush Xen's internal state to hardware > + * such that the following read can infer whether MSI-X maskall > + * bit is set by Xen. > + */ > + pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &val); > + pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, val); > + > + pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &val); > + if (!(val & PCI_MSIX_FLAGS_MASKALL)) > + return 0; I would just perform a reset regardless of the maskall value, which would also allow you to skip the read/write dance that you do above. ATM we are only concerned about the maskall bit, but there's no reason why prepare/release couldn't do more stuff in the future. > + > + pr_info("Reset MSI-X state for device %04x:%02x:%02x.%d\n", > + ppdev.seg, ppdev.bus, PCI_SLOT(ppdev.devfn), > + PCI_FUNC(ppdev.devfn)); > + > + err = HYPERVISOR_physdev_op(PHYSDEVOP_release_msix, &ppdev); > + if (err) { > + dev_warn(&dev->dev, "MSI-X release failed (%d)\n", > + err); This is a warn, while the message below is an err, any reason for the difference in log level? > + return err; > + } > + > + err = HYPERVISOR_physdev_op(PHYSDEVOP_prepare_msix, &ppdev); > + if (err) { > + dev_err(&dev->dev, "MSI-X preparation failed (%d)\n", > + err); > + return err; Thanks, Roger.