Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp603523pxb; Wed, 27 Jan 2021 16:27:27 -0800 (PST) X-Google-Smtp-Source: ABdhPJyqceHjMR2ksh6nX8UInrNd4s3Bq20HThjHA4eFw8Px0exgsr7gxaMzwi3gmilndKUvCuGc X-Received: by 2002:a17:907:9810:: with SMTP id ji16mr8443215ejc.394.1611793647394; Wed, 27 Jan 2021 16:27:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1611793647; cv=none; d=google.com; s=arc-20160816; b=hGahFmzVfJq2bwyfEtvXZAh+mUb7i/GQJmxpzLDjG1TMutWGqYF4feZcTB761DVaZY LIDFRAkpWFe7/aRKI2R9dfGWQPC4rCsKHp+bVgEl80PEmUHuW6xI2VB2QYFsRQ1Lk7Mq oCjiwYbjMTj9V9T9U1J4BQTWCo2wlCXrFmadwbXXLbdq9i6vAjsBigaMg7cqAXX4BfqD Fg2YavC0krarwl15Q9qapX+8h8XIxiJYoDoxXkwc9c6TSu7FLYNgaHGDvfjpsinEG/Vm CtuEBq/v3VhS8JFJHgdgJbCGhCR9UvJiZ8envh4VM1KkWZlNtj5jBQheSzrRUrvT5MzD 9Y8Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :message-id:subject:cc:to:from:date:dkim-signature; bh=6U4leARKRO+Jh+hqXrbElm5XJ7njvHRL40JBTz8kLs4=; b=uK+BK3J/7/AkL9D9rZYHwJl4FlJxW2zsp2Yo9nyLseqffOVN8Cng90nC6g0NTt3GY9 zLKj3UmOCSd1DA1OYmsyvwW6J4/SoGPtl6n9fF+h5WRVikfeCz8dbzSXqy/Y0wQQokxQ 1Ud3yiz/fpp15speaw+FHgYxAyF52yqA6F1RjZyqCEIYebYGDo4KBwXv2YgybYewJ5ll F7fz8moZ3E8DUtYyHBmv7HE1B+eTAL2649BO7QNCwDPhfLorn8DxdLtor8dljR22QO2N zMY929Zl1IcgTtXNFDqgYSTz+2WnGQAI9H8+C8j4b+9rk8QyPyxkMCWMFnDy0OxdnT+k ewww== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=V6ucyWyV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id kb2si1459989ejc.657.2021.01.27.16.27.02; Wed, 27 Jan 2021 16:27:27 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=V6ucyWyV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233468AbhA0Uvj (ORCPT + 99 others); Wed, 27 Jan 2021 15:51:39 -0500 Received: from mail.kernel.org ([198.145.29.99]:48650 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233256AbhA0Uvi (ORCPT ); Wed, 27 Jan 2021 15:51:38 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id E854264D9F; Wed, 27 Jan 2021 20:50:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1611780657; bh=PG1IVVNWLnFqxSk+/vBnwpgrgPzUioifx7UoAF6paEM=; h=Date:From:To:Cc:Subject:In-Reply-To:From; b=V6ucyWyVzhwo9MAVgd72z/OiDy+6W7Eb8HOigwQfsd/393QNAIAAkOCXSdOVYko7/ tOeDvNRqS8mngQv6eXjLYDW3wiRvtdwGc2cCIPDawIjeUQZgStSP5d/oP5llxKBJBg wCVo2QPaC59YuuxolKClPPp3gacA1VMuNeFHjeKllA4mZyUV1V20NipHu8RLuflVTl rlihkD/NwuDrAvCG3k5bQ4gJc0ko2TnQ7EDUtUlQYqkxQSwKyjU5QqEDrszynVJZo3 G2EyH9ZcgW1Ow8vHd9aA/lZwYcTyrQfYRgsNrsYoQH87bp5qp/AU8k649wDA6OX7uY GH4mT9MLsSSuw== Date: Wed, 27 Jan 2021 14:50:53 -0600 From: Bjorn Helgaas To: Kai-Heng Feng Cc: bhelgaas@google.com, Russell Currey , Oliver O'Halloran , Mika Westerberg , Lalithambika Krishnakumar , Lu Baolu , Joerg Roedel , "open list:PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC" , "open list:PCI SUBSYSTEM" , open list Subject: Re: [PATCH 1/2] PCI/AER: Disable AER interrupt during suspend Message-ID: <20210127205053.GA3049358@bjorn-Precision-5520> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210127173101.446940-1-kai.heng.feng@canonical.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 28, 2021 at 01:31:00AM +0800, Kai-Heng Feng wrote: > Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in > hint") enables ACS, and some platforms lose its NVMe after resume from > firmware: > [ 50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000 > [ 50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected > [ 50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID) > [ 50.947830] pcieport 0000:00:1b.0: device [8086:06ac] error status/mask=00200000/00010000 > [ 50.947831] pcieport 0000:00:1b.0: [21] ACSViol (First) > [ 50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message > [ 50.947843] nvme nvme0: frozen state error detected, reset controller > > It happens right after ACS gets enabled during resume. > > To prevent that from happening, disable AER interrupt and enable it on > system suspend and resume, respectively. Lots of questions here. Maybe this is what we'll end up doing, but I am curious about why the error is reported in the first place. Is this a consequence of the link going down and back up? Is it consequence of the device doing a DMA when it shouldn't? Are we doing something in the wrong order during suspend? Or maybe resume, since I assume the error is reported during resume? If we *do* take the error, why doesn't DPC recovery work? > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") > Signed-off-by: Kai-Heng Feng > --- > drivers/pci/pcie/aer.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 77b0f2c45bc0..0e9a85530ae6 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -1365,6 +1365,22 @@ static int aer_probe(struct pcie_device *dev) > return 0; > } > > +static int aer_suspend(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + > + aer_disable_rootport(rpc); > + return 0; > +} > + > +static int aer_resume(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + > + aer_enable_rootport(rpc); > + return 0; > +} > + > /** > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > * @dev: pointer to Root Port, RCEC, or RCiEP > @@ -1437,6 +1453,8 @@ static struct pcie_port_service_driver aerdriver = { > .service = PCIE_PORT_SERVICE_AER, > > .probe = aer_probe, > + .suspend = aer_suspend, > + .resume = aer_resume, > .remove = aer_remove, > }; > > -- > 2.29.2 >