Date: Mon, 12 Jun 2023 10:29:00 +0200
From: Lorenzo Pieralisi
To: Siddharth Vadapalli
Cc: Bjorn Helgaas, tjoseph@cadence.com, robh@kernel.org, kw@linux.com,
    bhelgaas@google.com, nadeem@cadence.com, linux-pci@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    vigneshr@ti.com, srk@ti.com, nm@ti.com
Subject: Re: [PATCH v3] PCI: cadence: Fix Gen2 Link Retraining process

On Mon, Jun 12, 2023 at 09:56:27AM +0530, Siddharth Vadapalli wrote:
>
>
> On 09/06/23 23:09, Bjorn Helgaas wrote:
> > On Wed, Jun 07, 2023 at 02:44:27PM +0530, Siddharth Vadapalli wrote:
> >> The Link Retraining process is initiated to account for the Gen2 defect in
> >> the Cadence PCIe controller in J721E SoC. The errata corresponding to this
> >> is i2085, documented at:
> >> https://www.ti.com/lit/er/sprz455c/sprz455c.pdf
> >>
> >> The existing workaround implemented for the errata waits for the Data Link
> >> initialization to complete and assumes that the link retraining process
> >> at the Physical Layer has completed. However, it is possible that the
> >> Physical Layer training might be ongoing as indicated by the
> >> PCI_EXP_LNKSTA_LT bit in the PCI_EXP_LNKSTA register.
> >>
> >> Fix the existing workaround, to ensure that the Physical Layer training
> >> has also completed, in addition to the Data Link initialization.
> >>
> >> Fixes: 4740b969aaf5 ("PCI: cadence: Retrain Link to work around Gen2 training defect")
> >> Signed-off-by: Siddharth Vadapalli
> >> Reviewed-by: Vignesh Raghavendra
> >> ---
> >>
> >> Hello,
> >>
> >> This patch is based on linux-next tagged next-20230606.
> >>
> >> v2:
> >> https://lore.kernel.org/r/20230315070800.1615527-1-s-vadapalli@ti.com/
> >> Changes since v2:
> >> - Merge the cdns_pcie_host_training_complete() function with the
> >>   cdns_pcie_host_wait_for_link() function, as suggested by Bjorn
> >>   for the v2 patch.
> >> - Add dev_err() to notify when Link Training fails, since this is a
> >>   fatal error and proceeding from this point will almost always crash
> >>   the kernel.
> >>
> >> v1:
> >> https://lore.kernel.org/r/20230102075656.260333-1-s-vadapalli@ti.com/
> >> Changes since v1:
> >> - Collect Reviewed-by tag from Vignesh Raghavendra.
> >> - Rebase on next-20230315.
> >>
> >> Regards,
> >> Siddharth.
> >>
> >>  .../controller/cadence/pcie-cadence-host.c | 20 +++++++++++++++++++
> >>  1 file changed, 20 insertions(+)
> >>
> >> diff --git a/drivers/pci/controller/cadence/pcie-cadence-host.c b/drivers/pci/controller/cadence/pcie-cadence-host.c
> >> index 940c7dd701d6..70a5f581ff4f 100644
> >> --- a/drivers/pci/controller/cadence/pcie-cadence-host.c
> >> +++ b/drivers/pci/controller/cadence/pcie-cadence-host.c
> >> @@ -12,6 +12,8 @@
> >>
> >>  #include "pcie-cadence.h"
> >>
> >> +#define LINK_RETRAIN_TIMEOUT HZ
> >> +
> >>  static u64 bar_max_size[] = {
> >>  	[RP_BAR0] = _ULL(128 * SZ_2G),
> >>  	[RP_BAR1] = SZ_2G,
> >> @@ -80,8 +82,26 @@ static struct pci_ops cdns_pcie_host_ops = {
> >>  static int cdns_pcie_host_wait_for_link(struct cdns_pcie *pcie)
> >>  {
> >>  	struct device *dev = pcie->dev;
> >> +	unsigned long end_jiffies;
> >> +	u16 link_status;
> >>  	int retries;
> >>
> >> +	/* Wait for link training to complete */
> >> +	end_jiffies = jiffies + LINK_RETRAIN_TIMEOUT;
> >> +	do {
> >> +		link_status = cdns_pcie_rp_readw(pcie, CDNS_PCIE_RP_CAP_OFFSET + PCI_EXP_LNKSTA);
> >> +		if (!(link_status & PCI_EXP_LNKSTA_LT))
> >> +			break;
> >> +		usleep_range(0, 1000);
> >> +	} while (time_before(jiffies, end_jiffies));
> >> +
> >> +	if (!(link_status & PCI_EXP_LNKSTA_LT)) {
> >> +		dev_info(dev, "Link training complete\n");
> >> +	} else {
> >> +		dev_err(dev, "Fatal! Link training incomplete\n");
> >> +		return -ETIMEDOUT;
> >> +	}
> >
> > Can I have a brown paper bag, please? I totally blew it here, and I'm
> > sorry.
> >
> > You took my advice by combining this with the existing
> > cdns_pcie_host_wait_for_link(), but I think my advice was poor because
> > (a) now this additional wait is not clearly connected with the
> > erratum, and (b) it affects devices that don't have the erratum.
> >
> > IIUC, this is all part of a workaround for the i2085 erratum.
> > The original workaround, 4740b969aaf5 ("PCI: cadence: Retrain Link to work
> > around Gen2 training defect"), added this:
> >
> >   if (!ret && rc->quirk_retrain_flag)
> >     ret = cdns_pcie_retrain(pcie);
> >
> > I think the wait for link train to complete should also be in
> > cdns_pcie_retrain() so it's clearly connected with the quirk, which
> > also means we'd only do the wait for devices with the erratum.
> >
> > Which is EXACTLY what your first patch did, and I missed it. I am
> > very sorry. I guess maybe I thought cdns_pcie_retrain() was a
> > general-purpose thing, but in fact it's only used for this quirk.
>
> With the current approach implemented in this patch, I could do the following:
> In the cdns_pcie_host_wait_for_link() function, I obtain the reference to the
> struct cdns_pcie_rc *rc, using:
> struct cdns_pcie_rc *rc = container_of(pcie, struct cdns_pcie_rc, pcie);
> followed by checking if the quirk "quirk_retrain_flag" is set, before proceeding
> with the Link Training check added by this patch. With this, only the
> controllers with the quirk will check for the Link Training completion before
> proceeding. However, the difference with this new approach compared to the
> approach in the v2 patch is that in this new approach, even in the Link Training
> Phase, the Link Training check is performed for the controllers with the quirk,
> unlike the v2 patch where the Link Training check was performed only during the
> Link Retraining Phase through the cdns_pcie_retrain() function.
>
> Also, based on Mani's suggestion, I have measured the latency introduced by the
> Link Training check for both quirky and non-quirky controllers at:
> https://lore.kernel.org/r/a63fc8b0-581b-897f-cac6-cb0a0e82c63e@ti.com/
> If the latency is acceptable, then the current implementation in this v3 patch
> could be fine too.
>
> Kindly let me know which approach among the following seems to be the best one:
> 1. The approach implemented in v2 patch (I will make minor changes to the patch
> to print out the "Fatal" error, so that users will be informed of the cause of
> the crash, followed by posting a v4 patch with this change).
> 2. The current implementation in the v3 patch with a check added to see if the
> controller has the quirk_retrain_flag set, before proceeding with the Link
> Training check.
> 3. The current implementation in the v3 patch as is, without any modification,
> if the latency introduced is not a concern and the sanity check for Link
> Training completion for non-quirky controllers appears acceptable.

The point is, you stated yourself that the non-quirky path is broken
too in its *current* form, so I don't think there is any option on the
table other than (3) (unless we want to rely on probe-time timing to
hide the issue, which to me is not even worth considering as an option).

Lorenzo
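
For reference, the difference between options (2) and (3) above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not driver code: the `model_*` names, the poll-count timeout (standing in for the jiffies deadline) and the fake register callbacks are all hypothetical. Only `quirk_retrain_flag`, the `container_of()` lookup and the Link Training status bit are taken from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as PCI_EXP_LNKSTA_LT in the kernel's pci_regs.h. */
#define LNKSTA_LT 0x0800

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct cdns_pcie. */
struct model_pcie {
	uint16_t (*read_lnksta)(struct model_pcie *pcie); /* fake LNKSTA read */
	unsigned int reads;                                /* reads issued so far */
};

/* Hypothetical stand-in for struct cdns_pcie_rc, embedding the pcie member. */
struct model_rc {
	struct model_pcie pcie;
	bool quirk_retrain_flag;
};

/* Poll until the Link Training bit clears or the poll budget runs out. */
static int model_wait_for_training(struct model_pcie *pcie,
				   unsigned int max_polls)
{
	assert(pcie->read_lnksta != NULL);

	for (unsigned int i = 0; i < max_polls; i++) {
		pcie->reads++;
		if (!(pcie->read_lnksta(pcie) & LNKSTA_LT))
			return 0;
	}
	return -ETIMEDOUT;
}

/*
 * quirk_only == true models option (2): skip the check unless the
 * controller has the quirk. quirk_only == false models option (3):
 * every controller performs the check.
 */
static int model_wait_for_link(struct model_pcie *pcie,
			       unsigned int max_polls, bool quirk_only)
{
	struct model_rc *rc = container_of(pcie, struct model_rc, pcie);

	if (quirk_only && !rc->quirk_retrain_flag)
		return 0;

	return model_wait_for_training(pcie, max_polls);
}

/* Fake register: training still in progress for the first two reads. */
static uint16_t model_lnksta_busy_twice(struct model_pcie *pcie)
{
	return pcie->reads <= 2 ? LNKSTA_LT : 0;
}

/* Fake register: training never completes. */
static uint16_t model_lnksta_stuck(struct model_pcie *pcie)
{
	(void)pcie;
	return LNKSTA_LT;
}
```

In this model, a non-quirky controller under option (2) never reads the status register at all, so an in-progress training would go unnoticed; under option (3) the same controller polls until the bit clears or the budget is exhausted.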