Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751666AbeAELwC (ORCPT + 1 other); Fri, 5 Jan 2018 06:52:02 -0500 Received: from mailgw02.mediatek.com ([1.203.163.81]:19586 "EHLO mailgw02.mediatek.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1751083AbeAELwA (ORCPT ); Fri, 5 Jan 2018 06:52:00 -0500 X-UUID: 80c29a3dbc8640dd93a909dfb43ee223-20180105 Message-ID: <1515153107.25872.57.camel@mhfsdcap03> Subject: Re: [PATCH v5 1/2] PCI: mediatek: Clear IRQ status after IRQ dispatched to avoid reentry From: Honghui Zhang To: Marc Zyngier CC: Lorenzo Pieralisi , , , , , , , , , , , , , , , , Date: Fri, 5 Jan 2018 19:51:47 +0800 In-Reply-To: <88c84a3e-17ea-08f2-e5fc-4799b41de267@arm.com> References: <1514336394-17747-1-git-send-email-honghui.zhang@mediatek.com> <1514336394-17747-2-git-send-email-honghui.zhang@mediatek.com> <20180104184040.GE12239@red-moon> <88c84a3e-17ea-08f2-e5fc-4799b41de267@arm.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.2.3-0ubuntu6 Content-Transfer-Encoding: 7bit MIME-Version: 1.0 X-MTK: N Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Thu, 2018-01-04 at 19:04 +0000, Marc Zyngier wrote: > On 04/01/18 18:40, Lorenzo Pieralisi wrote: > > [+Marc] > > > > On Wed, Dec 27, 2017 at 08:59:53AM +0800, honghui.zhang@mediatek.com wrote: > >> From: Honghui Zhang > >> > >> There maybe a same IRQ reentry scenario after IRQ received in current > >> IRQ handle flow: > >> EP device PCIe host driver EP driver > >> 1. issue an IRQ > >> 2. received IRQ > >> 3. clear IRQ status > >> 4. dispatch IRQ > >> 5. clear IRQ source > >> The IRQ status was not successfully cleared at step 2 since the IRQ > >> source was not cleared yet. So the PCIe host driver may receive the > >> same IRQ after step 5. Then there's an IRQ reentry occurred. > >> Even worse, if the reentry IRQ was not an IRQ that EP driver expected, > >> it may not handle the IRQ. Then we may run into the infinite loop from > >> step 2 to step 4. > >> Clear the IRQ status after IRQ have been dispatched to avoid the IRQ > >> reentry. > >> This patch also fix another INTx IRQ issue by initialize the iterate > >> before the loop. If an INTx IRQ re-occurred while we are dispatching > >> the INTx IRQ, then iterate may start from PCI_NUM_INTX + INTX_SHIFT > >> instead of INTX_SHIFT for the second time entering the > >> for_each_set_bit_from() loop. > > > > This looks like two different issues that should be fixed with two > > patches. Ok, I split this into two patches and figure out a more reasonable approach by using irq_chip solution. > > > >> Signed-off-by: Honghui Zhang > >> Acked-by: Ryder Lee > >> --- > >> drivers/pci/host/pcie-mediatek.c | 11 ++++++----- > >> 1 file changed, 6 insertions(+), 5 deletions(-) > > > > For the sake of uniformity, I first want to understand why this > > driver does not call: > > > > chained_irq_enter/exit() > > > > in the primary handler (mtk_pcie_intr_handler()). > > > > With the GIC as a primary interrupt controller we have not > > even figured out how current code can actually work without > > calling the chained_* API. > > > > I want to come up with a consistent handling of IRQ domains for > > all host bridges and any discrepancy should be explained. > > That's because this driver is a huge hack, see below: > > > > >> diff --git a/drivers/pci/host/pcie-mediatek.c b/drivers/pci/host/pcie-mediatek.c > >> index db93efd..fc29a9a 100644 > >> --- a/drivers/pci/host/pcie-mediatek.c > >> +++ b/drivers/pci/host/pcie-mediatek.c > >> @@ -601,15 +601,16 @@ static irqreturn_t mtk_pcie_intr_handler(int irq, void *data) > > This function is not a chained irqchip, but an interrupt handler... > > >> struct mtk_pcie_port *port = (struct mtk_pcie_port *)data; > >> unsigned long status; > >> u32 virq; > >> - u32 bit = INTX_SHIFT; > >> + u32 bit; > >> > >> while ((status = readl(port->base + PCIE_INT_STATUS)) & INTX_MASK) { > >> + bit = INTX_SHIFT; > >> for_each_set_bit_from(bit, &status, PCI_NUM_INTX + INTX_SHIFT) { > >> - /* Clear the INTx */ > >> - writel(1 << bit, port->base + PCIE_INT_STATUS); > >> virq = irq_find_mapping(port->irq_domain, > >> bit - INTX_SHIFT); > >> generic_handle_irq(virq); > > and nonetheless, this calls into generic_handle_irq(). That's a complete > violation of the interrupt layering. Maybe there is a good reason for > it, but I'd like to know which one. > > Which means that all of the ack/mask has to be done outside of the > irqchip framework too... Disgusting. > > >> + /* Clear the INTx */ > >> + writel(1 << bit, port->base + PCIE_INT_STATUS); > > > > I think that these masking/acking should actually be done through > > the irq_chip hooks (see for instance pci-ftpci100.c) - that would > > make this kind of bugs much easier to prevent (because the IRQ > > layer does the sequencing for you). > > +1. > Thanks for your advice, I need to do some homework to have a better understanding of the irq_chip approach. > > Marc (CC'ed) has a more comprehensive view on this than me - I would > > like to get to a point where all host bridges uses a consistent > > approach for chained IRQ handling and I hope this bug fix can be > > a starting point. > > +1 again. We definitely need to come up with some form of common > approach for all these host drivers, and maybe turn that into a library... > Well, this is beyond my knowledge now, I guess I can figure out how to using irq_chip for the first step, then I may following this "common approach" after we have a solution for that? thanks. > Thanks, > > M.