Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752373AbeAERme (ORCPT + 1 other);
        Fri, 5 Jan 2018 12:42:34 -0500
Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:47758 "EHLO
        foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752332AbeAERmc (ORCPT ); Fri, 5 Jan 2018 12:42:32 -0500
Subject: Re: [PATCH v5 1/2] PCI: mediatek: Clear IRQ status after IRQ
        dispatched to avoid reentry
To: Honghui Zhang
Cc: Lorenzo Pieralisi, bhelgaas@google.com, matthias.bgg@gmail.com,
        linux-arm-kernel@lists.infradead.org,
        linux-mediatek@lists.infradead.org, linux-pci@vger.kernel.org,
        linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
        yingjoe.chen@mediatek.com, eddie.huang@mediatek.com,
        ryder.lee@mediatek.com, hongkun.cao@mediatek.com,
        youlin.pei@mediatek.com, yong.wu@mediatek.com, yt.shen@mediatek.com,
        sean.wang@mediatek.com, xinping.qian@mediatek.com
References: <1514336394-17747-1-git-send-email-honghui.zhang@mediatek.com>
        <1514336394-17747-2-git-send-email-honghui.zhang@mediatek.com>
        <20180104184040.GE12239@red-moon>
        <88c84a3e-17ea-08f2-e5fc-4799b41de267@arm.com>
        <1515153107.25872.57.camel@mhfsdcap03>
From: Marc Zyngier
Organization: ARM Ltd
Message-ID: <1de755e8-6654-b789-159f-248231fab7b0@arm.com>
Date: Fri, 5 Jan 2018 17:42:27 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
        Thunderbird/52.5.2
MIME-Version: 1.0
In-Reply-To: <1515153107.25872.57.camel@mhfsdcap03>
Content-Type: text/plain; charset=utf-8
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On 05/01/18 11:51, Honghui Zhang wrote:
> On Thu, 2018-01-04 at 19:04 +0000, Marc Zyngier wrote:
>> On 04/01/18 18:40, Lorenzo Pieralisi wrote:
>>> [+Marc]
>>>
>>> On Wed, Dec 27, 2017 at 08:59:53AM +0800, honghui.zhang@mediatek.com wrote:
>>>> From: Honghui Zhang
>>>>
>>>> There maybe a same IRQ reentry scenario after IRQ received in current
>>>> IRQ handle flow:
>>>>     EP device          PCIe host driver        EP driver
>>>> 1. issue an IRQ
>>>>                     2. received IRQ
>>>>                     3. clear IRQ status
>>>>                     4. dispatch IRQ
>>>>                                             5. clear IRQ source
>>>> The IRQ status was not successfully cleared at step 2 since the IRQ
>>>> source was not cleared yet. So the PCIe host driver may receive the
>>>> same IRQ after step 5. Then there's an IRQ reentry occurred.
>>>> Even worse, if the reentry IRQ was not an IRQ that EP driver expected,
>>>> it may not handle the IRQ. Then we may run into the infinite loop from
>>>> step 2 to step 4.
>>>> Clear the IRQ status after IRQ have been dispatched to avoid the IRQ
>>>> reentry.
>>>> This patch also fix another INTx IRQ issue by initialize the iterate
>>>> before the loop. If an INTx IRQ re-occurred while we are dispatching
>>>> the INTx IRQ, then iterate may start from PCI_NUM_INTX + INTX_SHIFT
>>>> instead of INTX_SHIFT for the second time entering the
>>>> for_each_set_bit_from() loop.
>>>
>>> This looks like two different issues that should be fixed with two
>>> patches.
>
> Ok, I split this into two patches and figure out a more reasonable
> approach by using irq_chip solution.
>
>>>
>>>> Signed-off-by: Honghui Zhang
>>>> Acked-by: Ryder Lee
>>>> ---
>>>>  drivers/pci/host/pcie-mediatek.c | 11 ++++++-----
>>>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> For the sake of uniformity, I first want to understand why this
>>> driver does not call:
>>>
>>> chained_irq_enter/exit()
>>>
>>> in the primary handler (mtk_pcie_intr_handler()).
>>>
>>> With the GIC as a primary interrupt controller we have not
>>> even figured out how current code can actually work without
>>> calling the chained_* API.
>>>
>>> I want to come up with a consistent handling of IRQ domains for
>>> all host bridges and any discrepancy should be explained.
>>
>> That's because this driver is a huge hack, see below:
>>
>>>
>>>> diff --git a/drivers/pci/host/pcie-mediatek.c b/drivers/pci/host/pcie-mediatek.c
>>>> index db93efd..fc29a9a 100644
>>>> --- a/drivers/pci/host/pcie-mediatek.c
>>>> +++ b/drivers/pci/host/pcie-mediatek.c
>>>> @@ -601,15 +601,16 @@ static irqreturn_t mtk_pcie_intr_handler(int irq, void *data)
>>
>> This function is not a chained irqchip, but an interrupt handler...
>>
>>>>      struct mtk_pcie_port *port = (struct mtk_pcie_port *)data;
>>>>      unsigned long status;
>>>>      u32 virq;
>>>> -    u32 bit = INTX_SHIFT;
>>>> +    u32 bit;
>>>>
>>>>      while ((status = readl(port->base + PCIE_INT_STATUS)) & INTX_MASK) {
>>>> +        bit = INTX_SHIFT;
>>>>          for_each_set_bit_from(bit, &status, PCI_NUM_INTX + INTX_SHIFT) {
>>>> -            /* Clear the INTx */
>>>> -            writel(1 << bit, port->base + PCIE_INT_STATUS);
>>>>              virq = irq_find_mapping(port->irq_domain,
>>>>                                      bit - INTX_SHIFT);
>>>>              generic_handle_irq(virq);
>>
>> and nonetheless, this calls into generic_handle_irq(). That's a complete
>> violation of the interrupt layering. Maybe there is a good reason for
>> it, but I'd like to know which one.
>>
>> Which means that all of the ack/mask has to be done outside of the
>> irqchip framework too... Disgusting.
>>
>>>> +            /* Clear the INTx */
>>>> +            writel(1 << bit, port->base + PCIE_INT_STATUS);
>>>
>>> I think that these masking/acking should actually be done through
>>> the irq_chip hooks (see for instance pci-ftpci100.c) - that would
>>> make this kind of bugs much easier to prevent (because the IRQ
>>> layer does the sequencing for you).
>>
>> +1.
>>
>
> Thanks for your advice, I need to do some homework to have a better
> understanding of the irq_chip approach.
>
>>> Marc (CC'ed) has a more comprehensive view on this than me - I would
>>> like to get to a point where all host bridges uses a consistent
>>> approach for chained IRQ handling and I hope this bug fix can be
>>> a starting point.
>>
>> +1 again. We definitely need to come up with some form of common
>> approach for all these host drivers, and maybe turn that into a library...
>>
>
> Well, this is beyond my knowledge now, I guess I can figure out how to
> using irq_chip for the first step, then I may following this "common
> approach" after we have a solution for that?

We can help you with that at a later time indeed. The urgent thing is to
fix this driver so that it does the right thing, and we can then look at
using a common approach for a number of them.

Thanks,

        M.
-- 
Jazz is not dead. It just smells funny...