Date: Thu, 4 May 2023 17:20:48 -0500
From: Bjorn Helgaas
To: "Maciej W. Rozycki"
Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
    Michael Ellerman, Nicholas Piggin, Christophe Leroy,
    Saeed Mahameed, Leon Romanovsky, "David S. Miller", Eric Dumazet,
    Jakub Kicinski, Paolo Abeni, Alex Williamson, Lukas Wunner,
    Mika Westerberg, Stefan Roese, Jim Wilson, David Abdurachmanov,
    Pali Rohár, linux-pci@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-rdma@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v8 7/7] PCI: Work around PCIe link training failures
Message-ID: <20230504222048.GA887151@bhelgaas>

On Thu, Apr 06, 2023 at 01:21:31AM +0100, Maciej W. Rozycki wrote:
> Attempt to handle cases such as with a downstream port of the ASMedia
> ASM2824 PCIe switch where link training never completes and the link
> continues switching between speeds indefinitely with the data link layer
> never reaching the active state.

We're going to land this series this cycle, come hell or high water.

We talked about reusing pcie_retrain_link() earlier.  IIRC that didn't
work: ASPM needs to use PCI_EXP_LNKSTA_LT because not all devices
support PCI_EXP_LNKSTA_DLLLA, and you need PCI_EXP_LNKSTA_DLLLA because
the erratum makes PCI_EXP_LNKSTA_LT flap.
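
For reference, the mismatch between the two status bits can be sketched
in plain userspace C.  This is only an illustration under stated
assumptions: the bit values mirror PCI_EXP_LNKSTA_LT and
PCI_EXP_LNKSTA_DLLLA from pci_regs.h, but retrain_done() is a
hypothetical helper, not a kernel function, and it takes an
already-read Link Status value instead of touching hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status bits; values mirror include/uapi/linux/pci_regs.h. */
#define LNKSTA_LT     0x0800  /* Link Training in progress */
#define LNKSTA_DLLLA  0x2000  /* Data Link Layer Link Active */

/*
 * Hypothetical helper (not kernel API): given a Link Status value and
 * the bit the caller polls on, report whether a retrain looks finished.
 * Polling on LT means waiting for the bit to clear; polling on DLLLA
 * means waiting for the bit to be set.
 */
static bool retrain_done(uint16_t lnksta, uint16_t link_status_bit)
{
        if (link_status_bit == LNKSTA_LT)
                return !(lnksta & LNKSTA_LT);

        return (lnksta & link_status_bit) != 0;
}
```

If LT flaps, a Link Status sample taken while LT happens to read 0
passes the LT test even though DLLLA is still clear:
retrain_done(0x0000, LNKSTA_LT) is true, while
retrain_done(0x0000, LNKSTA_DLLLA) is false.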

What if we made pcie_retrain_link() reusable by making it:

  bool pcie_retrain_link(struct pci_dev *pdev, u16 link_status_bit)

so ASPM could use pcie_retrain_link(link->pdev, PCI_EXP_LNKSTA_LT) and
you could use pcie_retrain_link(dev, PCI_EXP_LNKSTA_DLLLA)?

Maybe do it in two steps?

  1) Move pcie_retrain_link() just after pcie_wait_for_link() and make
     it take link->pdev instead of link.
  2) Add the bit parameter.

I'm OK with having pcie_retrain_link() in pci.c, but the surrounding
logic about restricting to 2.5GT/s, retraining, removing the
restriction, retraining again is stuff I'd rather have in quirks.c so
it doesn't clutter pci.c.

I think it'd be good if the pci_device_add() path made clear that this
is a workaround for a problem, e.g.,

  void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
  {
          ...
          if (pcie_link_failed(dev))
                  pcie_fix_link_train(dev);

where pcie_fix_link_train() could live in quirks.c (with a stub when
CONFIG_PCI_QUIRKS isn't enabled).  It *might* even be worth adding it
and the stub first because that's a trivial patch and wouldn't clutter
the probe.c git history with all the grotty details about ASM2824 and
this topology.

> +int pcie_downstream_link_retrain(struct pci_dev *dev)
> +{
> +        static const struct pci_device_id ids[] = {
> +                { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
> +                {}
> +        };
> +        u16 lnksta, lnkctl2;
> +
> +        if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
> +            !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
> +                return -1;
> +
> +        pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
> +        pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> +        if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
> +            PCI_EXP_LNKSTA_LBMS) {

You go to some trouble to make sure PCI_EXP_LNKSTA_LBMS is set, and I
can't remember what the reason is.
If you make a preparatory patch like this, it would give a place for
that background, e.g.,

  +bool pcie_link_failed(struct pci_dev *dev)
  +{
  +        u16 lnksta;
  +
  +        if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
  +            !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
  +                return false;
  +
  +        pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
  +        if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
  +            PCI_EXP_LNKSTA_LBMS)
  +                return true;
  +
  +        return false;
  +}

If this is a generic thing and checking PCI_EXP_LNKSTA_LBMS makes sense
for everybody, it could go in pci.c; otherwise it could go in quirks.c
as well.  I guess it's not *truly* generic anyway because it only
detects link training failures for devices that have LNKCTL2 and
link_active_reporting.

> +                unsigned long timeout;
> +                u16 lnkctl;
> +
> +                pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
> +
> +                pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> +                lnkctl |= PCI_EXP_LNKCTL_RL;
> +                lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> +                lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> +                pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
> +                pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> +                /*
> +                 * Due to an erratum in some devices the Retrain Link bit
> +                 * needs to be cleared again manually to allow the link
> +                 * training to succeed.
> +                 */
> +                lnkctl &= ~PCI_EXP_LNKCTL_RL;
> +                if (dev->clear_retrain_link)
> +                        pcie_capability_write_word(dev, PCI_EXP_LNKCTL,
> +                                                   lnkctl);
> +
> +                timeout = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
> +                do {
> +                        pcie_capability_read_word(dev, PCI_EXP_LNKSTA,
> +                                                  &lnksta);
> +                        if (lnksta & PCI_EXP_LNKSTA_DLLLA)
> +                                break;
> +                        usleep_range(10000, 20000);
> +                } while (time_before(jiffies, timeout));
> +
> +                if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) {
> +                        pci_info(dev, "retraining failed\n");
> +                        return -1;
> +                }
> +        }
> +        if (IS_ENABLED(CONFIG_PCI_QUIRKS) && (lnksta & PCI_EXP_LNKSTA_DLLLA) &&
> +            (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> +            pci_match_id(ids, dev)) {
> +                u32 lnkcap;
> +                u16 lnkctl;
> +
> +                pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
> +                pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> +                pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> +                lnkctl |= PCI_EXP_LNKCTL_RL;
> +                lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> +                lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
> +                pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
> +                pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);

This starts a retrain; should we wait for training to complete?

> +        }

If we put most of this into a pcie_fix_link_train() (separated from
detecting the *need* to fix something), could it be made to look sort
of like this?
(I suppose you'd want to return bool and rename it to something that
reads naturally, e.g., "pcie_link_forcibly_retrained()",
"pcie_link_retrained()", etc.)

  +void pcie_fix_link_train(struct pci_dev *dev)
  +{
  +        u16 lnkctl2;
  +        u32 lnkcap;
  +        bool linkup;
  +
  +        pci_info(dev, "attempting link retrain at 2.5GT/s\n");
  +        pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
  +        lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
  +        lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
  +        pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
  +
  +        linkup = pcie_retrain_link(dev, PCI_EXP_LNKSTA_DLLLA);
  +        if (!linkup) {
  +                pci_info(dev, "retraining failed\n");
  +                return;
  +        }
  +
  +        if (LNKCAP supports only 2.5GT/s)
  +                return;
  +
  +        if (!pci_match_id(ids, dev))
  +                return;

Your comment said "if we know this is *safe*"; I can't remember if
pci_match_id() is there to avoid a known problem?

  +
  +        pci_info(dev, "attempting link retrain at max supported rate\n");
  +        pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
  +        lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
  +        lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
  +        pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
  +
  +        linkup = pcie_retrain_link(dev, PCI_EXP_LNKSTA_DLLLA);
  +        if (!linkup)
  +                pci_info(dev, "retraining failed\n");
  +}

> +
> +        return 0;
> +}
> +
> +/* Same as above, but called for a downstream device.  */
> +static int pcie_upstream_link_retrain(struct pci_dev *dev)
> +{
> +        struct pci_dev *bridge;
> +
> +        bridge = pci_upstream_bridge(dev);
> +        if (bridge)
> +                return pcie_downstream_link_retrain(bridge);
> +        else
> +                return -1;
> +}
> +
>  static int pci_acs_enable;
>
>  /**
> @@ -1148,8 +1274,8 @@ void pci_resume_bus(struct pci_bus *bus)
>
>  static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>  {
> +        int retrain = 0;
>          int delay = 1;
> -        u32 id;
>
>          /*
>           * After reset, the device should not silently discard config
> @@ -1163,21 +1289,37 @@ static int pci_dev_wait(struct pci_dev *
>           * Command register instead of Vendor ID so we don't have to
>           * contend with the CRS SV value.
>           */
> -        pci_read_config_dword(dev, PCI_COMMAND, &id);
> -        while (PCI_POSSIBLE_ERROR(id)) {
> +        for (;;) {
> +                u32 id;
> +
> +                pci_read_config_dword(dev, PCI_COMMAND, &id);
> +                if (!PCI_POSSIBLE_ERROR(id)) {
> +                        if (delay > PCI_RESET_WAIT)
> +                                pci_info(dev, "ready %dms after %s\n",
> +                                         delay - 1, reset_type);
> +                        break;
> +                }
> +
>                  if (delay > timeout) {
>                          pci_warn(dev, "not ready %dms after %s; giving up\n",
>                                   delay - 1, reset_type);
>                          return -ENOTTY;
>                  }
>
> -                if (delay > PCI_RESET_WAIT)
> +                if (delay > PCI_RESET_WAIT) {
> +                        if (!retrain) {
> +                                retrain = 1;
> +                                if (pcie_upstream_link_retrain(dev) == 0) {
> +                                        delay = 1;
> +                                        continue;
> +                                }
> +                        }
>                          pci_info(dev, "not ready %dms after %s; waiting\n",
>                                   delay - 1, reset_type);
> +                }

Thanks for fixing this in the reset path, too.  Can we move this part
to a separate patch?  It's related to the rest of the patch, but it
looks so much different that I think it would be easier to understand
by itself.

I think I might try to fold the pcie_upstream_link_retrain() directly
in here because the "upstream link retrain" in the function name
doesn't really make sense in PCIe terms.

Bjorn