Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp10190748imu; Wed, 5 Dec 2018 18:16:39 -0800 (PST) X-Google-Smtp-Source: AFSGD/VBxpme1XlcIjK7cyb8FTkRGYOW+jROzMI/1kmmOpCa0jJTsMsaVaU1bmQe2/SeWuBAQpkJ X-Received: by 2002:a62:3888:: with SMTP id f130mr26169274pfa.132.1544062599205; Wed, 05 Dec 2018 18:16:39 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1544062599; cv=none; d=google.com; s=arc-20160816; b=PbUOHhafG/whuloFw/aqH++ocpny3gRNUloBN5A0iQjhA8hiCOTHRKg04T/nluxhcy Fg9cQKdC7tq6dPmhRdeT6y4O8F9TF7pSjAq3gw9V67VxjAOeAI0r2nlf3kNdV/a/zGoF ORNjG9jMNPNJN742p6rNqpMrcGDzFXJDnQ5dqJvulb519aqYpPo7S/4sHrzYiGQmQJwx 5+uhngzYpehdkmaSdBFwYsmUxawW574AlxlDt62ZKMk5nVrRTSchmJrPWkrVgYxApOsx dssjioV3qCDc3q+1ByTq3XeTR4a3Lj7+sROEt4hXr3w8Q6PvYE/PgT4sD2v/zN+rJgF9 5LZg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=1YBGGyX3XCoAcNQl+NRGyCaFBWZRHvOVEgtpkxZ68sI=; b=cnaWbR/1ZTLkPiEhb6P3DhCvoQTce1tc5NvHGqJcY6jUIgzgvTimHelSQDDHaVEJxG vs0uQw9qEeMDoyXIwitsqHgDXy7xHZEkJ9uzKRMfRHwZ7PfqXwIJ/lBb5OwOUkjQFY3O 2k+82+DwKLbFWMI4aNtPmypcvGsTD6mOKBAjwfonk2wx10I9TPtQDz7j5QHuzZVk2z21 3B4lh8jAz+8NTUZWSFL/sdBmUcuDPsm7VhJDpHXNoqnk0K8Uet5nRSM5Wv5Gkx5P1ZFj is2WoERQ8Q1AYkJxDSt2bLNe08zwQifBRl9GBREyVm4GOLGG9Yg9ixISew5+oBDz93KO Y3lg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t130si19577452pgb.521.2018.12.05.18.16.23; Wed, 05 Dec 2018 18:16:39 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728834AbeLFCOi (ORCPT + 99 others); Wed, 5 Dec 2018 21:14:38 -0500 Received: from mga01.intel.com ([192.55.52.88]:7048 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727712AbeLFCOi (ORCPT ); Wed, 5 Dec 2018 21:14:38 -0500 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Dec 2018 18:14:38 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,320,1539673200"; d="scan'208";a="98447451" Received: from gao-cwp.sh.intel.com (HELO gao-cwp) ([10.239.159.28]) by orsmga006.jf.intel.com with ESMTP; 05 Dec 2018 18:14:36 -0800 Date: Thu, 6 Dec 2018 10:18:17 +0800 From: Chao Gao To: Roger Pau =?iso-8859-1?Q?Monn=E9?= Cc: linux-kernel@vger.kernel.org, Boris Ostrovsky , Juergen Gross , Stefano Stabellini , Jia-Ju Bai , xen-devel@lists.xenproject.org, Jan Beulich Subject: Re: [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device Message-ID: <20181206021815.GA8252@gao-cwp> References: <1543976357-1053-1-git-send-email-chao.gao@intel.com> <20181205093223.dncg4nq4dh6xmrhk@mac> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20181205093223.dncg4nq4dh6xmrhk@mac> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Dec 05, 2018 at 10:32:23AM +0100, Roger Pau Monn? wrote: >On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote: >> I find some pass-thru devices don't work any more across guest reboot. >> Assigning it to another guest also meets the same issue. And the only >> way to make it work again is un-binding and binding it to pciback. >> Someone reported this issue one year ago [1]. More detail also can be >> found in [2]. >> >> The root-cause is Xen's internal MSI-X state isn't reset properly >> during reboot or re-assignment. In the above case, Xen set maskall bit >> to mask all MSI interrupts after it detected a potential security >> issue. Even after device reset, Xen didn't reset its internal maskall >> bit. As a result, maskall bit would be set again in next write to >> MSI-X message control register. >> >> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X >> internal state of a device, we employ it to fix this issue rather than >> introducing another dedicated sub-hypercall. >> >> Note that PHYSDEVOPS_release_msix() will fail if the mapping between >> the device's msix and pirq has been created. This limitation prevents >> us calling this function when detaching a device from a guest during >> guest shutdown. Thus it is called right before calling >> PHYSDEVOPS_prepare_msix(). > >s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the >() at the end of the hypercall name since it's not a function. Will do. > >I'm also wondering why the release can't be done when the device is >detached from the guest (or the guest has been shut down). This makes >me worry about the raciness of the attach/detach procedure: if there's >a state where pciback assumes the device has been detached from the >guest, but there are still pirqs bound, an attempt to attach to >another guest in such state will fail. I think your concern is valid. This is the exact case in which Xen sets maskall flag. Qemu didn't do msix cleanup (it crashed or guest didn't request qemu to do this. Anyway, we couldn't rely on qemu), leaving pirqs bound. pciback doesn't manage pirqs so it doesn't know how to do cleanup for msix. Then pciback does device reset and disables memory decoding. After detaching a device from the domain, Xen can do further cleanup for a closing VM, including freeing pirqs of the VM, but saw memory decoding disabled. Hence, the maskall flag is set. > >> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/ >> msg02520.html >> [2]: https://lists.xen.org/archives/html/xen-devel/2018-11/msg01616.html >> >> Signed-off-by: Chao Gao >> --- >> drivers/xen/xen-pciback/pci_stub.c | 49 ++++++++++++++++++++++++++++++++++++++ >> drivers/xen/xen-pciback/pciback.h | 1 + >> drivers/xen/xen-pciback/xenbus.c | 10 ++++++++ >> 3 files changed, 60 insertions(+) >> >> diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c >> index 59661db..f8623d0 100644 >> --- a/drivers/xen/xen-pciback/pci_stub.c >> +++ b/drivers/xen/xen-pciback/pci_stub.c >> @@ -87,6 +87,55 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev) >> return psdev; >> } >> >> +/* >> + * Reset Xen internal MSI-X state by invoking PHYSDEVOP_{release, prepare}_msix. >> + */ >> +int pcistub_msix_reset(struct pci_dev *dev) >> +{ >> +#ifdef CONFIG_PCI_MSI >> + if (dev->msix_cap) { >> + struct physdev_pci_device ppdev = { >> + .seg = pci_domain_nr(dev->bus), >> + .bus = dev->bus->number, >> + .devfn = dev->devfn >> + }; >> + int err; >> + u16 val; >> + >> + /* >> + * Do a write first to flush Xen's internal state to hardware >> + * such that the following read can infer whether MSI-X maskall >> + * bit is set by Xen. >> + */ >> + pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &val); >> + pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, val); >> + >> + pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &val); >> + if (!(val & PCI_MSIX_FLAGS_MASKALL)) >> + return 0; > >I would just perform a reset regardless of the maskall value, which >would also allow you to skip the read/write dance that you do above. > >ATM we are only concerned about the maskall bit, but there's no reason >why prepare/release couldn't do more stuff in the future. > >> + >> + pr_info("Reset MSI-X state for device %04x:%02x:%02x.%d\n", >> + ppdev.seg, ppdev.bus, PCI_SLOT(ppdev.devfn), >> + PCI_FUNC(ppdev.devfn)); >> + >> + err = HYPERVISOR_physdev_op(PHYSDEVOP_release_msix, &ppdev); >> + if (err) { >> + dev_warn(&dev->dev, "MSI-X release failed (%d)\n", >> + err); > >This is a warn, while the message below is an err, any reason for >the difference in log level? > Will fix this. I just copy code snippte from the existing call site. and also make them distinguishable per Jan's suggestion. Thanks Chao