Received: by 2002:a05:6358:1087:b0:cb:c9d3:cd90 with SMTP id j7csp1193830rwi; Thu, 13 Oct 2022 10:12:03 -0700 (PDT) X-Google-Smtp-Source: AMsMyM62j/U1abfDVhK+bHh4z/9pbElQE/u8BH0yDqsbNCRBWn+swhM/YM47YDjXY8hrsWeLgFIG X-Received: by 2002:a05:6402:28a1:b0:458:81c0:a379 with SMTP id eg33-20020a05640228a100b0045881c0a379mr670794edb.388.1665681122822; Thu, 13 Oct 2022 10:12:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665681122; cv=none; d=google.com; s=arc-20160816; b=bA+59GlnKSl0X1qsFRMQfPGpBYB3Q/9Un4Ms9gwSf7BRSoRJfSd66pARobvCmaqqze finbWkR3hdUAIwQs3plct+j6TebwTM3uRADVJ7ICjQzpR4kInd1uYZiPLFUUGNUw7WYi 18jgNIc+57p09Ijb1lmepyVcKaPcFPd3VrVoLX8CfVbzMcxYc1GS0QZqEzmmyqGRpWW7 xbXT6apoIVQojUrzH2KWjXyTinavd4kd43OFYgx93Nh2H5IirKUtpAd7cvPDlEaEGEn5 4tH2HVaQKyetBhZAGTG7qPH3ahKBmkEG6/PSMfjpFd0WBCwKcZPlc3Rpci8tDAy9SgLO yNCA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=pMi5qoaLtF4xuqRYBvnQy3R3MqTw0bMvGBerhNI+kOk=; b=cjcH7wRmfmC412ExCbM7ztvEdWuSnybN3JOR943xtm10fcCTq2IJFNZ9Q6bZB87McN htAB6Mt+SubnwLADifTbWOnX3mq91pOuICDQc2IEAqwdgvkBIfCGeGuOYQoCpzzdRKM/ F5z6oLjZgCSIRWHrlB9eo9YZa235Gs1a7TpwVP3NKIOsmNsWu7/y/0Q/ueajNYLf3Im7 EM2HSjm9LfpZOxb99mf3W1L8iLOAszORxNIIHznmr9Y+od6fUvSGLmvLnzuYWPY4Z/c0 SROoDIKB6e/Odr9Szvf+x+9+Nivi9IOsiSzSZxNP8Lzk4A4iteVXkt5Z/VXtUmXLJupD VMxA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Bs9Eqk0u; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j14-20020a05640211ce00b0045d079c8f20si260570edw.43.2022.10.13.10.11.34; Thu, 13 Oct 2022 10:12:02 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Bs9Eqk0u; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229783AbiJMQut (ORCPT + 99 others); Thu, 13 Oct 2022 12:50:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33596 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229485AbiJMQur (ORCPT ); Thu, 13 Oct 2022 12:50:47 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 56C1713333D; Thu, 13 Oct 2022 09:50:46 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id D02AE61870; Thu, 13 Oct 2022 16:50:45 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E0342C433C1; Thu, 13 Oct 2022 16:50:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1665679845; bh=5GFgw6QTHtEZ01GdWve/MitmH27Csb+Mh/SvvIm9Y98=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Bs9Eqk0uiZzTaEBf4pbqi2hzjp7SqbvlAQikV/tjrUkOwNKInW81cTVN3Km1SIEhY ioD599Op+rCC6kqEwlkVfMOApCM1rPevHM38ovgmlNMmyMU/NQWRpRkppsOCYfAcpM amhLhQMEpULqkzPcQ/6DgtNCUzK1iQ9DtKtnLEyAi3pFrWBZ86ZGM8wpkBqDhrCC4+ +oNYxuoFLiRxKeU+4GRqa4UqHA+2wve6lc8tO2aH1yIkAgcH+G7cTyb2unOvjoVArW ziD0/jCDoZNdECzTXowXGQ95pwHtUAgtIgR2Ps7kjjdpdzVCeiokcY3+zPZD4NETPX hUPe/bozLSIVw== Received: by pali.im (Postfix) id C0B767F0; Thu, 13 Oct 2022 18:50:41 +0200 (CEST) Date: Thu, 13 Oct 2022 18:50:41 +0200 From: Pali =?utf-8?B?Um9ow6Fy?= To: Elad Nachman , Lorenzo Pieralisi Cc: Bjorn Helgaas , Krzysztof =?utf-8?Q?Wilczy=C5=84ski?= , "linux-pci@vger.kernel.org" , "linux-arm-kernel@lists.infradead.org" , "linux-kernel@vger.kernel.org" Subject: Re: [EXT] Re: [PATCH v2] PCI: aardvark: Implement workaround for PCIe Completion Timeout Message-ID: <20221013165041.h23yz7xxi2nejygg@pali> References: <20220802123816.21817-1-pali@kernel.org> <20220926123434.2tqx4t6u3cnlrcx3@pali> <20221003211412.5pqfjvcxyszd4ai6@pali> <20221004083957.qtfkn4eyi42lsd4j@pali> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: NeoMutt/20180716 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Elad! Thank you very much for detailed explanation. Lorenzo, it is OK, or do you need some other details? On Thursday 06 October 2022 08:14:48 Elad Nachman wrote: > Hi, > > Regarding number #1 : > The H/W erratum means strong ordering mode is not properly implemented in the H/W. > The workaround is to set DIS_ORD_CHK to disable the ordered check, to prevent the issues Pali mentioned. > Leaving it without the workaround means encountering the crashes Pali mentioned. > The H/W was originally designed to have the strong ordering mode, per what is expected of PCI Express, but due to this erratum, it cannot be supported by the taped-out H/W. > There is nothing to do about it now in the H/W, as this is a years old SOC. > Software wise the only option is the workaround above, or to accept sporadic crashes of the kernel. > > Regarding number #2 : > The impact of the workaround is that the DMA done status could be set before the last data is written to the host memory, causing the host to read incomplete data. This is the reason why I recommended to add a memory barrier. If this is bothering, the memory barrier can be removed from the patch, although I suggest to thoroughly test the data integrity if implementing the patch without the memory barrier. > > FYI, > > Elad. > > -----Original Message----- > From: Pali Rohár > Sent: Tuesday, October 4, 2022 11:40 AM > To: Elad Nachman ; Lorenzo Pieralisi > Cc: Bjorn Helgaas ; Krzysztof Wilczyński ; linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org > Subject: Re: [EXT] Re: [PATCH v2] PCI: aardvark: Implement workaround for PCIe Completion Timeout > > + Elad > > Could you please look at Lorenzo's comments and help with this fix? > > On Tuesday 04 October 2022 10:02:32 Lorenzo Pieralisi wrote: > > On Mon, Oct 03, 2022 at 11:14:12PM +0200, Pali Rohár wrote: > > > Lorenzo, is something more needed for this patch? As it workarounds > > > crashing it is really needed to have it in mainline and backports. > > > > Yes, a clear explanation from Marvell about what this is actually > > fixing - it took me a while to go through the whole thread but I still > > don't understand what this patch actually does and why. > > > > An Erratum workaround (if there is any) should define and explain a SW > > workaround. > > > > (1) Bjorn's concerns in relation to PCI memory model weren't addressed > > (2) We don't add undocumented memory barriers to the kernel to "minimize > > risks". Either we fix a bug or we don't. If we do, write that down > > and document why the barrier is there and the issue it solves. > > > > I understand that basically you are reverse engineering a HW bug but I > > am afraid we can't fix the kernel this way - more so with patches > > going to be backported to stable kernels. > > > > Lorenzo > > > > > On Wednesday 28 September 2022 14:05:10 Elad Nachman wrote: > > > > Reviewed-by: Elad Nachman > > > > > > > > Thanks, > > > > > > > > Elad. > > > > > > > > -----Original Message----- > > > > From: Pali Rohár > > > > Sent: Monday, September 26, 2022 3:35 PM > > > > To: Elad Nachman > > > > Cc: Thomas Petazzoni ; Lorenzo > > > > Pieralisi ; Bjorn Helgaas > > > > ; Krzysztof Wilczyński ; Rob > > > > Herring ; linux-pci@vger.kernel.org; > > > > linux-arm-kernel@lists.infradead.org; > > > > linux-kernel@vger.kernel.org; Gregory Clement > > > > ; Marek Behún ; > > > > Remi Pommarel ; Xogium ; > > > > Tomasz Maciej Nowak > > > > Subject: [EXT] Re: [PATCH v2] PCI: aardvark: Implement workaround > > > > for PCIe Completion Timeout > > > > > > > > External Email > > > > > > > > ------------------------------------------------------------------ > > > > ---- Hello Elad, could you please review this patch? I have > > > > implemented it according your instructions, including that full memory barrier as you described. > > > > > > > > On Tuesday 02 August 2022 14:38:16 Pali Rohár wrote: > > > > > Marvell Armada 3700 Functional Errata, Guidelines, and > > > > > Restrictions document describes in erratum 3.12 PCIe Completion Timeout (Ref #: > > > > > 251), that PCIe IP does not support a strong-ordered model for inbound posted vs. > > > > > outbound completion. > > > > > > > > > > As a workaround for this erratum, DIS_ORD_CHK flag in Debug Mux > > > > > Control register must be set. It disables the ordering check in > > > > > the core between Completions and Posted requests received from the link. > > > > > > > > > > Marvell also suggests to do full memory barrier at the beginning > > > > > of aardvark summary interrupt handler before calling interrupt > > > > > handlers of endpoint drivers in order to minimize the risk for > > > > > the race condition documented in the Erratum between the DMA > > > > > done status reading and the completion of writing to the host memory. > > > > > > > > > > More details about this issue and suggested workarounds are in discussion: > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel > > > > > .org_l > > > > > inux-2Dpci_BN9PR18MB425154FE5019DCAF2028A1D5DB8D9-40BN9PR18MB425 > > > > > 1.namp > > > > > rd18.prod.outlook.com_t_-23u&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r > > > > > =eTeNT > > > > > LEK5-TxXczjOcKPhANIFtlB9pP4lq9qhdlFrwQ&m=bjgkhgPgOjqCEsbHYHONCZM > > > > > iFDX72 > > > > > MztWaE0AvWBktQVn3zKEDtUdn02Kx_KJ14B&s=SToGsDGEObwbZGilVtVZPyME8j > > > > > NiRgrq > > > > > 4SDYvqqT0TA&e= > > > > > > > > > > It was reported that enabling this workaround fixes instability > > > > > issues and "Unhandled fault" errors when using 60 GHz WiFi > > > > > 802.11ad card with Qualcomm > > > > > QCA6335 chip under significant load which were caused by > > > > > interrupt status stuck in the outbound CMPLT queue traced back to this erratum. > > > > > > > > > > This workaround fixes also kernel panic triggered after some > > > > > minutes of usage 5 GHz WiFi 802.11ax card with Mediatek MT7915 chip: > > > > > > > > > > Internal error: synchronous external abort: 96000210 [#1] SMP > > > > > Kernel panic - not syncing: Fatal exception in interrupt > > > > > > > > > > Signed-off-by: Thomas Petazzoni > > > > > Signed-off-by: Pali Rohár > > > > > Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host > > > > > controller > > > > > driver") > > > > > Cc: stable@vger.kernel.org > > > > > --- > > > > > drivers/pci/controller/pci-aardvark.c | 10 ++++++++++ > > > > > 1 file changed, 10 insertions(+) > > > > > > > > > > diff --git a/drivers/pci/controller/pci-aardvark.c > > > > > b/drivers/pci/controller/pci-aardvark.c > > > > > index 060936ef01fe..3ae8a85ec72e 100644 > > > > > --- a/drivers/pci/controller/pci-aardvark.c > > > > > +++ b/drivers/pci/controller/pci-aardvark.c > > > > > @@ -210,6 +210,8 @@ enum { > > > > > }; > > > > > > > > > > #define VENDOR_ID_REG (LMI_BASE_ADDR + 0x44) > > > > > +#define DEBUG_MUX_CTRL_REG (LMI_BASE_ADDR + 0x208) > > > > > +#define DIS_ORD_CHK BIT(30) > > > > > > > > > > /* PCIe core controller registers */ > > > > > #define CTRL_CORE_BASE_ADDR 0x18000 > > > > > @@ -558,6 +560,11 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie) > > > > > PCIE_CORE_CTRL2_TD_ENABLE; > > > > > advk_writel(pcie, reg, PCIE_CORE_CTRL2_REG); > > > > > > > > > > + /* Disable ordering checks, workaround for erratum 3.12 "PCIe completion timeout" */ > > > > > + reg = advk_readl(pcie, DEBUG_MUX_CTRL_REG); > > > > > + reg |= DIS_ORD_CHK; > > > > > + advk_writel(pcie, reg, DEBUG_MUX_CTRL_REG); > > > > > + > > > > > /* Set lane X1 */ > > > > > reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); > > > > > reg &= ~LANE_CNT_MSK; > > > > > @@ -1581,6 +1588,9 @@ static irqreturn_t advk_pcie_irq_handler(int irq, void *arg) > > > > > struct advk_pcie *pcie = arg; > > > > > u32 status; > > > > > > > > > > + /* Full memory barrier (ARM dsb sy), workaround for erratum 3.12 "PCIe completion timeout" */ > > > > > + mb(); > > > > > + > > > > > status = advk_readl(pcie, HOST_CTRL_INT_STATUS_REG); > > > > > if (!(status & PCIE_IRQ_CORE_INT)) > > > > > return IRQ_NONE; > > > > > -- > > > > > 2.20.1 > > > > > > > > > > > _______________________________________________ > > > linux-arm-kernel mailing list > > > linux-arm-kernel@lists.infradead.org > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.infradead. > > > org_mailman_listinfo_linux-2Darm-2Dkernel&d=DwIDaQ&c=nKjWec2b6R0mOyP > > > az7xtfQ&r=eTeNTLEK5-TxXczjOcKPhANIFtlB9pP4lq9qhdlFrwQ&m=2NzkT9KLO26k > > > efUOw2nIeSeRnJVZLxEiBXqEoRvDQ0ueww6n4YaXWgAN1uCJX20o&s=nczACS_2jERbA > > > -c4Gfar0-HTA4PtvZdJmsBv8jhW8G0&e=