From: Sean V Kelley
To: bhelgaas@google.com, Jonathan.Cameron@huawei.com, rafael.j.wysocki@intel.com, ashok.raj@intel.com, tony.luck@intel.com, sathyanarayanan.kuppuswamy@intel.com, qiuxu.zhuo@intel.com
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Sean V Kelley
Subject: [PATCH v8 08/14] PCI/AER: Extend AER error handling to RCECs
Date: Fri, 2 Oct 2020 11:47:29 -0700
Message-Id: <20201002184735.1229220-9-seanvk.dev@oregontracks.org>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20201002184735.1229220-1-seanvk.dev@oregontracks.org>
References: <20201002184735.1229220-1-seanvk.dev@oregontracks.org>

From: Jonathan Cameron

Currently the kernel does not handle AER errors for Root Complex Integrated Endpoints (RCiEPs) [0]. These devices sit on a root bus within the Root Complex (RC). AER handling is performed by a Root Complex Event Collector (RCEC) [1], which is effectively a type of RCiEP on the same root bus.

For an RCEC (technically not a Bridge), error messages "received" from associated RCiEPs must be enabled for "transmission" in order to cause a System Error via the Root Control register or, when the Advanced Error Reporting Capability is present, to be reported via the Root Error Command register and logged in the Root Error Status and Error Source Identification registers.

In addition to the defined OS-level handling of the reset flow for the RCiEPs associated with an RCEC, non-native handling is also possible. In that case no action needs to be taken on the RCEC, because the firmware is responsible for it. This is the case when APEI [2] reports the AER errors via a GHES[v2] HEST entry [3] and the relevant AER CPER record [4], and non-native handling is in use.

We effectively end up with two different types of discovery for the purpose of handling AER errors:

1) Normal bus walk: we pass the downstream port above the bus to which the device is attached, and the walk covers everything below that point.

2) An RCiEP with no visible association with an RCEC: there is no need to walk devices. The flow simply calls the callbacks for the actual device, which in turn references its associated RCEC.

Modify pci_walk_bridge() to handle devices that lack a subordinate bus: if the device has none, call the callback on that device alone.

[0] PCI Express Base Specification 5.0-1, 1.3.2.3 Root Complex Integrated Endpoint Rules
[1] PCI Express Base Specification 5.0-1, 6.2 Error Signalling and Logging
[2] ACPI Specification 6.3, Chapter 18 ACPI Platform Error Interface (APEI)
[3] ACPI Specification 6.3, 18.2.3.7 Generic Hardware Error Source
[4] UEFI Specification 2.8, N.2.7 PCI Express Error Section

Signed-off-by: Jonathan Cameron
Signed-off-by: Sean V Kelley
---
 drivers/pci/pcie/err.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 5ff1afa4763d..c4ceca42a3bf 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -148,19 +148,25 @@ static int report_resume(struct pci_dev *dev, void *data)
 
 /**
  * pci_walk_bridge - walk bridges potentially AER affected
- * @bridge   bridge which may be a Port.
+ * @bridge   bridge which may be an RCEC with associated RCiEPs,
+ *           an RCiEP associated with an RCEC, or a Port.
  * @cb       callback to be called for each device found
  * @userdata arbitrary pointer to be passed to callback.
  *
  * If the device provided is a bridge, walk the subordinate bus,
  * including any bridged devices on buses under this bus.
  * Call the provided callback on each device found.
+ *
+ * If the device provided has no subordinate bus, call the provided
+ * callback on the device itself.
  */
 static void pci_walk_bridge(struct pci_dev *bridge,
 			    int (*cb)(struct pci_dev *, void *), void *userdata)
 {
 	if (bridge->subordinate)
 		pci_walk_bus(bridge->subordinate, cb, userdata);
+	else
+		cb(bridge, userdata);
 }
 
 pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
@@ -174,11 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	/*
 	 * Error recovery runs on all subordinates of the first downstream
 	 * bridge. If the downstream bridge detected the error, it is
-	 * cleared at the end.
+	 * cleared at the end. For RCiEPs we should reset just the RCiEP itself.
 	 */
 	type = pci_pcie_type(dev);
 	if (type == PCI_EXP_TYPE_ROOT_PORT ||
-	    type == PCI_EXP_TYPE_DOWNSTREAM)
+	    type == PCI_EXP_TYPE_DOWNSTREAM ||
+	    type == PCI_EXP_TYPE_RC_EC ||
+	    type == PCI_EXP_TYPE_RC_END)
 		bridge = dev;
 	else
 		bridge = pci_upstream_bridge(dev);
@@ -186,7 +194,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_dbg(dev, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
 		pci_walk_bridge(bridge, report_frozen_detected, &status);
-		status = reset_subordinate_devices(bridge);
+		if (type == PCI_EXP_TYPE_RC_END) {
+			pci_warn(dev, "subordinate device reset not possible for RCiEP\n");
+			status = PCI_ERS_RESULT_NONE;
+			goto failed;
+		}
+
+		status = reset_subordinate_devices(bridge);
 		if (status != PCI_ERS_RESULT_RECOVERED) {
 			pci_warn(dev, "subordinate device reset failed\n");
 			goto failed;
@@ -219,7 +233,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_walk_bridge(bridge, report_resume, &status);
 
 	if (type == PCI_EXP_TYPE_ROOT_PORT ||
-	    type == PCI_EXP_TYPE_DOWNSTREAM) {
+	    type == PCI_EXP_TYPE_DOWNSTREAM ||
+	    type == PCI_EXP_TYPE_RC_EC) {
 		if (pcie_aer_is_native(bridge))
 			pcie_clear_device_status(bridge);
 		pci_aer_clear_nonfatal_status(bridge);
-- 
2.28.0
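The dispatch performed by the patched pci_walk_bridge() and the "bridge" selection in pcie_do_recovery() can be modeled in user space. This is a minimal sketch under stated assumptions: model_dev, model_bus, model_walk_bus(), model_walk_bridge(), model_pick_bridge() and count_visit() are hypothetical names, not kernel APIs, and the locking and list handling of the real pci_walk_bus() are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PCI_EXP_TYPE_* values; not the kernel's. */
enum model_pcie_type {
	MODEL_ENDPOINT,
	MODEL_ROOT_PORT,
	MODEL_DOWNSTREAM,
	MODEL_RC_END,	/* RCiEP */
	MODEL_RC_EC,	/* RCEC */
};

struct model_bus;

/* Hypothetical, heavily simplified model of struct pci_dev. */
struct model_dev {
	enum model_pcie_type type;
	struct model_bus *subordinate;	/* NULL: no secondary bus */
	struct model_dev *upstream;	/* stands in for pci_upstream_bridge() */
	int visits;			/* bumped by count_visit() */
};

/* Hypothetical model of struct pci_bus: just an array of devices. */
struct model_bus {
	struct model_dev **devs;
	size_t ndevs;
};

/*
 * Depth-first walk, roughly like pci_walk_bus(): call cb on each device,
 * then on everything below it. A non-zero return from cb stops the walk.
 */
static int model_walk_bus(struct model_bus *bus,
			  int (*cb)(struct model_dev *, void *), void *userdata)
{
	for (size_t i = 0; i < bus->ndevs; i++) {
		struct model_dev *d = bus->devs[i];
		int ret = cb(d, userdata);

		if (ret)
			return ret;
		if (d->subordinate) {
			ret = model_walk_bus(d->subordinate, cb, userdata);
			if (ret)
				return ret;
		}
	}
	return 0;
}

/* Same shape as the patched pci_walk_bridge(). */
static void model_walk_bridge(struct model_dev *bridge,
			      int (*cb)(struct model_dev *, void *),
			      void *userdata)
{
	assert(bridge != NULL);
	if (bridge->subordinate)
		model_walk_bus(bridge->subordinate, cb, userdata);
	else
		cb(bridge, userdata);	/* RCEC or RCiEP: no bus to walk */
}

/* Same choice of "bridge" as the patched pcie_do_recovery(). */
static struct model_dev *model_pick_bridge(struct model_dev *dev)
{
	switch (dev->type) {
	case MODEL_ROOT_PORT:
	case MODEL_DOWNSTREAM:
	case MODEL_RC_EC:
	case MODEL_RC_END:
		return dev;
	default:
		return dev->upstream;
	}
}

/* Example callback: count visits; returning 0 means "keep walking". */
static int count_visit(struct model_dev *dev, void *userdata)
{
	int *total = userdata;

	dev->visits++;
	(*total)++;
	return 0;
}
```

In this model, recovery started from an Endpoint below a Root Port visits the devices on the Port's secondary bus but not the Port itself, while an RCiEP, having no subordinate bus, is handed to the callback on its own.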