Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp1723259pxj; Wed, 19 May 2021 12:21:16 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx1pG6hRd73LsVgQC65peJSXzyQbRT1AJkFwFdarij2XF+x6Aszt/25JDLJ1LTZ8xxdsyHO X-Received: by 2002:a92:cd85:: with SMTP id r5mr655147ilb.294.1621452076514; Wed, 19 May 2021 12:21:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1621452076; cv=none; d=google.com; s=arc-20160816; b=EtK2wfroKnuZRB7tsqUAieshOq0bjo6udv6pRQZJDqfJ25NQA2GPPtwx2KPKkkZmKC gyrb0n3HSZTJpIsnzOJTnFVsflsNXVM/n00DPmQxHXfXRzXlf3Pf6ld8MuZ2KGqpmH/v 5V1oLi3OGPm4UONkQZ8r6zDnH8cSRFW3fkqHbKKx5z6tr+lUJXHhXydCPdUibVfQkNnz TyPDI5k+CkjfGUFgeyGe7K/la8uMFS/HV5nQ6Wlj7BFjq8mtKDC2UOAnncmN3ZGI3YZb 9XslsC7n/J/5uaU7CSKoa56ke+8RX+Uxjr5VDpcjgWUWT3Yf8XA7LzSsXGYnolJXEuOL 065g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=uyCAQNtFLsl8+khNxh9sUzfwxNBBOjBbk9Nlj8t+dQw=; b=RF28rsjngVwW706ymut/ikyLBASjXSpgm5Xvj60hQ9JJhTBWihnJBFvlVNQ5keS+Eg Jr73zgQYsbzrXB86yEHtjReWswklVzn1l2/JunCuO0NTZUgC8skLxlXSCj9I71NqLn8e 5aYhROqrmH/5rcOZsPiD2AuJ+sJodmDj91XDfL27CXAnbvR3KD7K1/9gYDc0BOZX3mIg 2vc2HWebDvdva7uY750M7UzRLYg31g1XG5n1ZZE03B4xRSFg7XsMk9Zbe80mpthMejYg yXCyRvkVbCae0qECMNMpl5es2JIeW7j9rxkZkwEwzz+lkw/NKKWpgg9Qx+ie+Ula54F7 gGCg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=EYejWY7f; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a27si162163jab.118.2021.05.19.12.21.03; Wed, 19 May 2021 12:21:16 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=EYejWY7f; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235555AbhESIHv (ORCPT + 99 others); Wed, 19 May 2021 04:07:51 -0400 Received: from mail.kernel.org ([198.145.29.99]:47940 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235381AbhESIHv (ORCPT ); Wed, 19 May 2021 04:07:51 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 84223611BF; Wed, 19 May 2021 08:06:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1621411591; bh=KnUP4pbDB1v5P9lFsxoFeNygHudhxC0V615et8+QNIU=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=EYejWY7fAJBv7pX9xt9unm/0dh/EdBPyqLXHJw9DmrHigvoA9OzXROd6rE99Rsymx y4YQ3kBQ2gbfMdnUOIjhg4DSpsA7tXWRy9MT194ib1RdkkRlKHWpBznR+Cc/yyPugV KYbtJG8a1DGQNeDmx2dVWFkxcSFqHc6FX5VaSt3Cnon5UrcxreK6Xmk+XkOCnG8Pq1 9iNSo9d+eJvSPXcySh9k/4nAw1Henvqr7kKmSFc36yXv7Mm72UL/h994ymSp/BJt2r bF2gXZLeDvWq8NU+U76Ps8J+p+SDVJPsLu8BPcFQxig24YE010Rzpx/BcoOSWgOpOt raoKqq7PjuhKw== Received: by pali.im (Postfix) id D03A2857; Wed, 19 May 2021 10:06:27 +0200 (CEST) Date: Wed, 19 May 2021 10:06:27 +0200 From: Pali =?utf-8?B?Um9ow6Fy?= To: Lorenzo Pieralisi , Thomas Petazzoni , Rob Herring , Bjorn Helgaas Cc: Russell King , Marek =?utf-8?B?QmVow7pu?= , Remi Pommarel , Xogium , Tomasz Maciej Nowak , Marc Zyngier , linux-pci@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 01/42] PCI: aardvark: Fix kernel panic during PIO transfer Message-ID: <20210519080627.sfs74d2zxopkarfl@pali> References: <20210506153153.30454-1-pali@kernel.org> <20210506153153.30454-2-pali@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20210506153153.30454-2-pali@kernel.org> User-Agent: NeoMutt/20180716 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thursday 06 May 2021 17:31:12 Pali Rohár wrote: > Trying to start a new PIO transfer by writing value 0 in PIO_START register > when previous transfer has not yet completed (which is indicated by value 1 > in PIO_START) causes an External Abort on CPU, which results in kernel > panic: > > SError Interrupt on CPU0, code 0xbf000002 -- SError > Kernel panic - not syncing: Asynchronous SError Interrupt > > To prevent kernel panic, it is required to reject a new PIO transfer when > previous one has not finished yet. > > If previous PIO transfer is not finished yet, the kernel may issue a new > PIO request only if the previous PIO transfer timed out. > > In the past the root cause of this issue was incorrectly identified (as it > often happens during link retraining or after link down event) and special > hack was implemented in Trusted Firmware to catch all SError events in EL3, > to ignore errors with code 0xbf000002 and not forwarding any other errors > to kernel and instead throw panic from EL3 Trusted Firmware handler. > > Links to discussion and patches about this issue: > https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50 > https://lore.kernel.org/linux-pci/20190316161243.29517-1-repk@triplefau.lt/ > https://lore.kernel.org/linux-pci/971be151d24312cc533989a64bd454b4@www.loen.fr/ > https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541 > > But the real cause was the fact that during link retraning or after link > down event the PIO transfer may take longer time, up to the 1.44s until it > times out. This increased probability that a new PIO transfer would be > issued by kernel while previous one has not finished yet. > > After applying this change into the kernel, it is possible to revert the > mentioned TF-A hack and SError events do not have to be caught in TF-A EL3. > > Signed-off-by: Pali Rohár > Reviewed-by: Marek Behún > Cc: stable@vger.kernel.org # 7fbcb5da811b ("PCI: aardvark: Don't rely on jiffies while holding spinlock") Hello! Could you please review at least this patch? It is fixing kernel panic and to prevent future kernel crashes I would really suggest to merge this one patch into 5.13 queue. > --- > drivers/pci/controller/pci-aardvark.c | 49 ++++++++++++++++++++++----- > 1 file changed, 40 insertions(+), 9 deletions(-) > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c > index 051b48bd7985..e3f5e7ab7606 100644 > --- a/drivers/pci/controller/pci-aardvark.c > +++ b/drivers/pci/controller/pci-aardvark.c > @@ -514,7 +514,7 @@ static int advk_pcie_wait_pio(struct advk_pcie *pcie) > udelay(PIO_RETRY_DELAY); > } > > - dev_err(dev, "config read/write timed out\n"); > + dev_err(dev, "PIO read/write transfer time out\n"); > return -ETIMEDOUT; > } > > @@ -657,6 +657,35 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus, > return true; > } > > +static bool advk_pcie_pio_is_running(struct advk_pcie *pcie) > +{ > + struct device *dev = &pcie->pdev->dev; > + > + /* > + * Trying to start a new PIO transfer when previous has not completed > + * cause External Abort on CPU which results in kernel panic: > + * > + * SError Interrupt on CPU0, code 0xbf000002 -- SError > + * Kernel panic - not syncing: Asynchronous SError Interrupt > + * > + * Functions advk_pcie_rd_conf() and advk_pcie_wr_conf() are protected > + * by raw_spin_lock_irqsave() at pci_lock_config() level to prevent > + * concurrent calls at the same time. But because PIO transfer may take > + * about 1.5s when link is down or card is disconnected, it means that > + * advk_pcie_wait_pio() does not always have to wait for completion. > + * > + * Some versions of ARM Trusted Firmware handles this External Abort at > + * EL3 level and mask it to prevent kernel panic. Relevant TF-A commit: > + * https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50 > + */ > + if (advk_readl(pcie, PIO_START)) { > + dev_err(dev, "Previous PIO read/write transfer is still running\n"); > + return true; > + } > + > + return false; > +} > + > static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn, > int where, int size, u32 *val) > { > @@ -673,9 +702,10 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn, > return pci_bridge_emul_conf_read(&pcie->bridge, where, > size, val); > > - /* Start PIO */ > - advk_writel(pcie, 0, PIO_START); > - advk_writel(pcie, 1, PIO_ISR); > + if (advk_pcie_pio_is_running(pcie)) { > + *val = 0xffffffff; > + return PCIBIOS_SET_FAILED; > + } > > /* Program the control register */ > reg = advk_readl(pcie, PIO_CTRL); > @@ -694,7 +724,8 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn, > /* Program the data strobe */ > advk_writel(pcie, 0xf, PIO_WR_DATA_STRB); > > - /* Start the transfer */ > + /* Clear PIO DONE ISR and start the transfer */ > + advk_writel(pcie, 1, PIO_ISR); > advk_writel(pcie, 1, PIO_START); > > ret = advk_pcie_wait_pio(pcie); > @@ -734,9 +765,8 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn, > if (where % size) > return PCIBIOS_SET_FAILED; > > - /* Start PIO */ > - advk_writel(pcie, 0, PIO_START); > - advk_writel(pcie, 1, PIO_ISR); > + if (advk_pcie_pio_is_running(pcie)) > + return PCIBIOS_SET_FAILED; > > /* Program the control register */ > reg = advk_readl(pcie, PIO_CTRL); > @@ -763,7 +793,8 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn, > /* Program the data strobe */ > advk_writel(pcie, data_strobe, PIO_WR_DATA_STRB); > > - /* Start the transfer */ > + /* Clear PIO DONE ISR and start the transfer */ > + advk_writel(pcie, 1, PIO_ISR); > advk_writel(pcie, 1, PIO_START); > > ret = advk_pcie_wait_pio(pcie); > -- > 2.20.1 >