Received: by 2002:a05:6358:111d:b0:dc:6189:e246 with SMTP id f29csp760414rwi; Wed, 2 Nov 2022 18:24:42 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5ev0RIixhCAQjyriBvG6eRniFGJlN89erI6OFjQ/eAVgRi+2GMyhUu5XhVR3GjoxeBMTjK X-Received: by 2002:a50:d0c4:0:b0:464:14ed:834 with SMTP id g4-20020a50d0c4000000b0046414ed0834mr4371830edf.174.1667438682544; Wed, 02 Nov 2022 18:24:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667438682; cv=none; d=google.com; s=arc-20160816; b=hmyG77jBuMYiKI3ZFxW5BkNYgnayy3SJWoREof/5CWe1iz/eiNej3fqoO0INhRgGA+ jHjoQM+3p2Jc0mbuTza8blHoZxxvJPDDXqKZGzENMy+Yf66351kCHDJMHMdBExEMkX5H l0dv9qCyF3GPvSBI47TliZNwok3h/cfDmrpU9QZz3fBZ0tyJOBjftpK9oEn7GpGhCuhc plBlpxozBkXIcIeYKSFDw0JX7kiirEkFzs5xQ0JvmSPrzcvE9tbMAOobqe00KlP9Qepf 4tgSwtlTk+3rV9bkotKmBO/YabaLehGpE6JBhXo+n4AevHNeNZ00y1G/e94JACMjgZ78 tFlg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :message-id:subject:cc:to:from:date:dkim-signature; bh=Qv04H8KMNFJyXoEClqVP9lr7K0qhcO8lJ0zcfw4m76I=; b=r9XH0A6NlFw0N7Ln7V0o+LvyTglOeR1uwje+6iDOE8C8CPz2SGkYqFrjc5/JHM6KEi RnWJbPkA9YnBpaOO77mDGeaIqtfQcPn95UxpU/whoojDUzGxqHkimkunqEe4sOcWkbSx C/ucy0Jake2H/ZnybKY8b5Q99nUWd2XxFdaL9STiNaWz073Tnc2drqu5r3zTqyMmFZmv e/3LKD17lPvNha86bPZ9SfLeb/GIfgcMb7pJWU9ehsy0Cw/xDrcvHF3vp8pYZj82GNGy 82YoBEXU5GGJSnN8mLoSMAlH+Mlu0M8RKYCETpcUZAgVx/C7qS6aSZ1n6RSvMXnbcEAL yHPQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="Yb/0r5Fv"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j27-20020a170906279b00b00715867834e3si12481209ejc.506.2022.11.02.18.24.19; Wed, 02 Nov 2022 18:24:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="Yb/0r5Fv"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231475AbiKBX1c (ORCPT + 97 others); Wed, 2 Nov 2022 19:27:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40704 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231664AbiKBX0x (ORCPT ); Wed, 2 Nov 2022 19:26:53 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1CE8513F0B; Wed, 2 Nov 2022 16:21:07 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 6951CB82537; Wed, 2 Nov 2022 23:21:06 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id D4ECFC433D7; Wed, 2 Nov 2022 23:21:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667431265; bh=0K5Lst1NZVSbMpTrV9SZig3D7HzFybj8iwHGE8ITCh4=; h=Date:From:To:Cc:Subject:In-Reply-To:From; b=Yb/0r5FvolMYkO5d15jwm2hwsUmmHLjey8LwpzK3iu/zAar5Y40qzaF20KjIJAvc4 M3iqe6wS1FeZX8uuIz1QyPCMEyIZ4gz4AbFqhEV5wMrHR1n9LZlaGDLRGd7MEXvsW1 V813CzTOSuY4f5JcXEusuTCGDlP7nNfiOE855+//6eOs2Naa/q41pyIJS5vZH8JLLX gTpQ1+KnAKxVxVMwauccjzB5W3/YqqvIW/6l4dmnVZc7otTDP48kMg2gHh9RX2ohR2 e+hG8O2xRMcSo9YGxGybuWIXp8uRvHOz4Cz+HjsVxg9oStE3AlgA+OmFWvi0qWyc2d rzH/z9jWx4vxA== Date: Wed, 2 Nov 2022 18:21:03 -0500 From: Bjorn Helgaas To: Smita Koralahalli Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Bjorn Helgaas , oohall@gmail.com, Mahesh J Salgaonkar , Lukas Wunner , Kuppuswamy Sathyanarayanan , Yazen Ghannam , Fontenot Nathan Subject: Re: [PATCH 1/2] PCI: pciehp: Add support for OS-First Hotplug and AER/DPC Message-ID: <20221102232103.GA7126@bhelgaas> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20221101000719.36828-2-Smita.KoralahalliChannabasappa@amd.com> X-Spam-Status: No, score=-8.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 01, 2022 at 12:07:18AM +0000, Smita Koralahalli wrote: > Current systems support Firmware-First model for hot-plug. In this model, I'm familiar with "firmware first" in the context of ACPI APEI. Is there more "firmware first" language in the spec related to hotplug? Or is this just the ACPI hotplug implemented by acpiphp? Or is there something in the PCIe spec that talks about some firmware interfaces needed in pciehp? If so, please cite the specific sections. I see you cite PCIe r6.0, sec 6.7.6, below, but I don't see the firmware mention there. > firmware holds the responsibilty for executing the HW sequencing actions on > an async or surprise add and removal events. Additionally, according to > Section 6.7.6 of PCIe Base Specification [1], firmware must also handle > the side-effects (DPC/AER events) reported on an async removal and is > abstract to the OS. > > This model however, poses issues while rolling out updates or fixing bugs > as the servers need to be brought down for firmware updates. Hence, > introduce support for OS-First hot-plug and AER/DPC. Here, OS is > responsible for handling async add and remove along with handling of > AER/DPC events which are generated as a side-effect of async remove. > > The implementation is as follows: On an async remove a DPC is triggered as > a side-effect along with an MSI to the OS. Determine it's an async remove > by checking for DPC Trigger Status in DPC Status Register and Surprise > Down Error Status in AER Uncorrected Error Status to be non-zero. If true, > treat the DPC event as a side-effect of async remove, clear the error > status registers and continue with hot-plug tear down routines. If not, > follow the existing routine to handle AER/DPC errors. > > Dmesg before: > > pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000 > pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected > pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID) > pcieport 0000:00:01.4: device [1022:14ab] error status/mask=00000020/04004000 > pcieport 0000:00:01.4: [ 5] SDES (First) > nvme nvme2: frozen state error detected, reset controller > pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec > pcieport 0000:00:01.4: AER: subordinate device reset failed > pcieport 0000:00:01.4: AER: device recovery failed > pcieport 0000:00:01.4: pciehp: Slot(16): Link Down > nvme2n1: detected capacity change from 1953525168 to 0 > pci 0000:04:00.0: Removing from iommu group 49 > > Dmesg after: > > pcieport 0000:00:01.4: pciehp: Slot(16): Link Down > nvme1n1: detected capacity change from 1953525168 to 0 > pci 0000:04:00.0: Removing from iommu group 37 > pcieport 0000:00:01.4: pciehp: Slot(16): Card present > pci 0000:04:00.0: [8086:0a54] type 00 class 0x010802 > pci 0000:04:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit] > pci 0000:04:00.0: Max Payload Size set to 512 (was 128, max 512) > pci 0000:04:00.0: enabling Extended Tags > pci 0000:04:00.0: Adding to iommu group 37 > pci 0000:04:00.0: BAR 0: assigned [mem 0xf2400000-0xf2403fff 64bit] > pcieport 0000:00:01.4: PCI bridge to [bus 04] > pcieport 0000:00:01.4: bridge window [io 0x1000-0x1fff] > pcieport 0000:00:01.4: bridge window [mem 0xf2400000-0xf24fffff] > pcieport 0000:00:01.4: bridge window [mem 0x20080800000-0x200809fffff 64bit pref] > nvme nvme1: pci function 0000:04:00.0 > nvme 0000:04:00.0: enabling device (0000 -> 0002) > nvme nvme1: 128/0/0 default/read/poll queues Remove any lines that are not specifically relevant, e.g., I'm not sure whether the BARs, iommu, MPS, extended tags info is essential. Please indent the quoted material two spaces so it doesn't look like the narrative text. Thanks for working on this! Bjorn