Date: Mon, 26 Dec 2022 16:50:45 -0600
From: Bjorn Helgaas
To: Kai-Heng Feng
Cc: bhelgaas@google.com, Mario Limonciello, Mika Westerberg, Keith Busch,
    Kuppuswamy Sathyanarayanan, Pali Rohár, Stefan Roese,
    linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, "David E. Box"
Subject: Re: [PATCH] PCI/portdrv: Avoid enabling AER on Thunderbolt devices
Message-ID: <20221226225045.GA400369@bhelgaas>
In-Reply-To: <20221226153048.1208359-1-kai.heng.feng@canonical.com>

[+cc David]

Hi Kai-Heng,

Thanks for the report and the debugging!

On Mon, Dec 26, 2022 at 11:30:31PM +0800, Kai-Heng Feng wrote:
> We are seeing igc ethernet device on Thunderbolt dock stops working
> after S3 resume because of AER error, or even make S3 resume freeze:
> pcieport 0000:00:1d.0: AER: Multiple Corrected error received: 0000:00:1d.0
> pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID)
> pcieport 0000:00:1d.0: device [8086:7ab0] error status/mask=00008000/00002000
> pcieport 0000:00:1d.0: [15] HeaderOF
> pcieport 0000:00:1d.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:1d.0
> pcieport 0000:00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> pcieport 0000:00:1d.0: device [8086:7ab0] error status/mask=00100000/00004000
> pcieport 0000:00:1d.0: [20] UnsupReq (First)
> pcieport 0000:00:1d.0: AER: TLP Header: 34000000 0a000052 00000000 00000000

From a very quick look, I think 34...... ......52 is a PTM message (as
you suggest below).

> pcieport 0000:00:1d.0: AER: Error of this Agent is reported first
> pcieport 0000:04:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> pcieport 0000:04:01.0: device [8086:1136] error status/mask=00300000/00000000
> pcieport 0000:04:01.0: [20] UnsupReq (First)
> pcieport 0000:04:01.0: [21] ACSViol
> pcieport 0000:04:01.0: AER: TLP Header: 34000000 04000052 00000000 00000000
> thunderbolt 0000:05:00.0: AER: can't recover (no error_detected callback)
>
> This supposedly should be fixed by commit c01163dbd1b8 ("PCI/PM: Always disable
> PTM for all devices during suspend"), but somehow it doesn't work for
> this case.
>
> By dumping the PCI_PTM_CTRL register on resume, it turns out PTM is
> already flipped on by either the Thunderbolt dock firmware or the host
> BIOS. Writing 0 to PCI_PTM_CTRL yields the same result.

Can you share your debug patch and corresponding dmesg log in the
bugzilla?

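For anyone who wants to check the "34...... ......52" reading against the
two TLP Header lines in the log, here is a minimal userspace sketch of how
the first two header DWORDs could be decoded. It is not kernel code and not
part of the patch. The struct, function, and macro names are made up for
illustration; the bit positions and the 0x52 PTM Request message code are
my reading of the PCIe spec's Message TLP layout, so please verify them
against the spec before relying on this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: fields from DW0/DW1 of a Message TLP header. */
struct tlp_msg_hdr {
	uint8_t fmt;		/* DW0 bits 31:29 */
	uint8_t type;		/* DW0 bits 28:24 */
	uint16_t requester_id;	/* DW1 bits 31:16, bus[15:8] dev[7:3] fn[2:0] */
	uint8_t tag;		/* DW1 bits 15:8 */
	uint8_t msg_code;	/* DW1 bits 7:0 */
};

/* Assumed encodings: Type[4:3] == 10b marks a Message, Type[2:0] is routing. */
#define TLP_TYPE_MSG_MASK	0x18
#define TLP_TYPE_MSG		0x10
#define MSG_CODE_PTM_REQUEST	0x52

/* Split the first two logged header DWORDs into their fields. */
static struct tlp_msg_hdr tlp_decode(uint32_t dw0, uint32_t dw1)
{
	struct tlp_msg_hdr h;

	h.fmt = (dw0 >> 29) & 0x7;
	h.type = (dw0 >> 24) & 0x1f;
	h.requester_id = (uint16_t)(dw1 >> 16);
	h.tag = (dw1 >> 8) & 0xff;
	h.msg_code = dw1 & 0xff;
	return h;
}

/* Nonzero if the header is a Message TLP carrying the PTM Request code. */
static int tlp_is_ptm_request(const struct tlp_msg_hdr *h)
{
	assert(h != NULL);
	return (h->type & TLP_TYPE_MSG_MASK) == TLP_TYPE_MSG &&
	       h->msg_code == MSG_CODE_PTM_REQUEST;
}
```

Feeding it 34000000 0a000052 gives fmt 1, type 0x14 and requester ID 0x0a00
(0a:00.0); 34000000 04000052 gives the same message with requester ID
0x0400 (04:00.0). Under the assumptions above, both decode as PTM Requests.
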
> Windows is however not affected by this issue, by using WinDbg's !pci
> command, it shows that AER is not enabled for devices connected via
> Thunderbolt port, and that's the reason why Windows doesn't exhibit the
> issue.
>
> So turn a blind eye on external Thunderbolt devices like Windows does by
> disabling AER.

Unless there's something in the PCIe or Thunderbolt spec that says AER
shouldn't be used on external devices, I think we need to figure out
the root cause before disabling AER on all removable devices.

The dmesg in the bugzilla below is from an HP ZBook Fury 16. Do you
see this on any other platforms? Do you have any HP BIOS contacts to
ask about this?

It seems like a firmware defect to enable PTM without knowing whether
upstream devices have PTM enabled.

We could leave PTM enabled on upstream devices when suspending, but
that apparently prevents some low-power states. Adding David since he
worked on that.

> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216850
> Cc: Mario Limonciello
> Cc: Mika Westerberg
> Signed-off-by: Kai-Heng Feng
> ---
>  drivers/pci/pcie/portdrv.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 2cc2e60bcb396..59d00e20e57bf 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -237,7 +237,8 @@ static int get_port_device_capability(struct pci_dev *dev)
>  	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>  	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) &&
>  	    dev->aer_cap && pci_aer_available() &&
> -	    (pcie_ports_native || host->native_aer))
> +	    (pcie_ports_native || host->native_aer) &&
> +	    !dev_is_removable(&dev->dev))
>  		services |= PCIE_PORT_SERVICE_AER;
>  #endif
>
> --
> 2.34.1
>