Date: Thu, 11 May 2023 17:23:26 +0200
From: Lukas Wunner
To: Smita Koralahalli
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Bjorn Helgaas, oohall@gmail.com, Mahesh J Salgaonkar, Kuppuswamy
Sathyanarayanan, Yazen Ghannam, Fontenot Nathan
Subject: Re: [PATCH 1/2] PCI: pciehp: Add support for OS-First Hotplug and AER/DPC
Message-ID: <20230511152326.GA16215@wunner.de>
References: <20221101000719.36828-1-Smita.KoralahalliChannabasappa@amd.com>
 <20221101000719.36828-2-Smita.KoralahalliChannabasappa@amd.com>
 <20221104101536.GA11363@wunner.de>
 <20230510201937.GA11550@wunner.de>
In-Reply-To: <20230510201937.GA11550@wunner.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 10, 2023 at 10:19:37PM +0200, Lukas Wunner wrote:
> Below please find a patch which
> sets the Surprise Down Error mask bit.  Could you test if this fixes
> the issue for you?

Sorry, I failed to appreciate that pcie_capability_set_dword()
can't be used to RMW the AER capability.  Replacement patch below.

-- >8 --

From: Lukas Wunner
Subject: [PATCH] PCI: pciehp: Disable Surprise Down Error reporting

On hotplug ports capable of surprise removal, Surprise Down Errors are
expected and no reason for AER or DPC to spring into action.
Although a Surprise Down event might be caused by an error, software
cannot discern that from regular surprise removal.
Any well-behaved BIOS should mask such errors, but Smita reports a case
where hot-removing an Intel NVMe SSD [8086:0a54] from an AMD Root Port
[1022:14ab] results in irritating AER log messages and a delay of more
than 1 second caused by DPC handling:

pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000
pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected
pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
pcieport 0000:00:01.4: device [1022:14ab] error status/mask=00000020/04004000
pcieport 0000:00:01.4: [ 5] SDES (First)
nvme nvme2: frozen state error detected, reset controller
pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec
pcieport 0000:00:01.4: AER: subordinate device reset failed
pcieport 0000:00:01.4: AER: device recovery failed
pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
nvme2n1: detected capacity change from 1953525168 to 0
pci 0000:04:00.0: Removing from iommu group 49

Avoid by masking Surprise Down Errors on hotplug ports capable of
surprise removal.

Mask them even if AER or DPC is handled by firmware because if hotplug
control was granted to the operating system, it owns hotplug and thus
Surprise Down events.  So firmware has no business reporting or
reacting to them.

Reported-by: Smita Koralahalli
Link: https://lore.kernel.org/all/20221101000719.36828-2-Smita.KoralahalliChannabasappa@amd.com/
Signed-off-by: Lukas Wunner
---
 drivers/pci/hotplug/pciehp_hpc.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index f8c70115b691..40a721f3b713 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -984,8 +984,9 @@ static inline int pcie_hotplug_depth(struct pci_dev *dev)
 struct controller *pcie_init(struct pcie_device *dev)
 {
 	struct controller *ctrl;
-	u32 slot_cap, slot_cap2, link_cap;
+	u32 slot_cap, slot_cap2, link_cap, aer_cap;
 	u8 poweron;
+	u16 aer;
 	struct pci_dev *pdev = dev->port;
 	struct pci_bus *subordinate = pdev->subordinate;
 
@@ -1030,6 +1031,17 @@ struct controller *pcie_init(struct pcie_device *dev)
 	if (dmi_first_match(inband_presence_disabled_dmi_table))
 		ctrl->inband_presence_disabled = 1;
 
+	/*
+	 * Surprise Down Errors are par for the course on Hot-Plug Surprise
+	 * capable ports, so disable reporting in case BIOS left it enabled.
+	 */
+	aer = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
+	if (aer && slot_cap & PCI_EXP_SLTCAP_HPS) {
+		pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_MASK, &aer_cap);
+		aer_cap |= PCI_ERR_UNC_SURPDN;
+		pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_MASK, aer_cap);
+	}
+
 	/* Check if Data Link Layer Link Active Reporting is implemented */
 	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap);
 
-- 
2.39.2
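
For illustration only, and not part of the patch: a minimal user-space
sketch of the mask arithmetic the hunk above performs, applied to the
status/mask pair from the log (00000020/04004000).  All sketch_* and
SKETCH_* names are hypothetical.  The two constants are assumed to
mirror PCI_ERR_UNC_SURPDN and PCI_EXP_SLTCAP_HPS in
include/uapi/linux/pci_regs.h.  The "reported" test is a simplification
of how an unmasked status bit leads to AER/DPC activity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, believed to match include/uapi/linux/pci_regs.h. */
#define SKETCH_ERR_UNC_SURPDN 0x00000020u /* Surprise Down, bit 5 ("[ 5] SDES") */
#define SKETCH_SLTCAP_HPS     0x00000020u /* Slot Capabilities: Hot-Plug Surprise */

/* Simplified model: an error is reported if its status bit is not masked. */
static bool sketch_error_reported(uint32_t status, uint32_t mask)
{
	return (status & ~mask) != 0;
}

/*
 * Modify step of the read-modify-write: set the Surprise Down mask bit,
 * but only if the slot advertises Hot-Plug Surprise.  All other mask
 * bits are left as firmware programmed them.
 */
static uint32_t sketch_mask_surprise_down(uint32_t uncor_mask, uint32_t slot_cap)
{
	if (slot_cap & SKETCH_SLTCAP_HPS)
		uncor_mask |= SKETCH_ERR_UNC_SURPDN;
	return uncor_mask;
}

/* Walk through the values taken from the log in the commit message. */
static void sketch_check_log_example(void)
{
	const uint32_t status = 0x00000020u; /* Surprise Down logged */
	const uint32_t mask = 0x04004000u;   /* mask as left by BIOS */
	uint32_t new_mask;

	/* Bit 5 is clear in the mask, so the error gets reported. */
	assert(sketch_error_reported(status, mask));

	/* On a Hot-Plug Surprise capable slot, bit 5 is set in the mask. */
	new_mask = sketch_mask_surprise_down(mask, SKETCH_SLTCAP_HPS);
	assert(new_mask == 0x04004020u);
	assert(!sketch_error_reported(status, new_mask));

	/* Without Hot-Plug Surprise the mask is left untouched. */
	assert(sketch_mask_surprise_down(mask, 0) == mask);
}
```

Calling sketch_check_log_example() from a main() exercises the three
cases.  In the kernel the same OR is applied to the dword read from
aer + PCI_ERR_UNCOR_MASK and written back with
pci_write_config_dword(), as in the hunk above.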