Subject: Re: [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling
To: Lorenzo Pieralisi
CC: Jingoo Han, Joao Pinto, KISHON VIJAY ABRAHAM, Bjorn Helgaas,
 Niklas Cassel, linux-omap@vger.kernel.org, linux-pci@vger.kernel.org,
 linux-kernel@vger.kernel.org
References: <20180209120415.17590-1-vigneshr@ti.com>
 <20180209120415.17590-3-vigneshr@ti.com>
 <20180212175801.GA29070@e107981-ln.cambridge.arm.com>
 <7cfa73af-09fa-d298-aaab-3e74ee7e1dd5@ti.com>
 <20180306151247.GA7378@red-moon>
From: Vignesh R
Message-ID: <1b1b3842-b3b1-e00f-9879-4f455b9ec647@ti.com>
Date: Fri, 9 Mar 2018 09:53:24 +0530
In-Reply-To: <20180306151247.GA7378@red-moon>

On Tuesday 06 March 2018 08:42 PM, Lorenzo Pieralisi wrote:
> On Thu, Feb 15, 2018 at 09:59:21AM +0530, Vignesh R wrote:
>> Hi,
>>
>> On Monday 12 February 2018 11:28 PM, Lorenzo Pieralisi wrote:
>>> On Fri, Feb 09, 2018 at 05:34:14PM +0530, Vignesh R wrote:
>>>> We need to ensure that there are no pending MSI IRQ vector set (i.e
>>>> PCIE_MSI_INTR0_STATUS reads 0 at least once) before exiting
>>>> dra7xx_pcie_msi_irq_handler(). Else, the dra7xx PCIe wrapper will not
>>>> register new MSI IRQs even though PCIE_MSI_INTR0_STATUS shows IRQs are
>>>> pending. Therefore, keep calling dra7xx_pcie_msi_irq_handler() until it
>>>> returns IRQ_NONE, which suggests that PCIE_MSI_INTR0_STATUS is 0.
>>>>
>>>> This fixes a bug, where PCIe wifi cards with 4 DMA queues like Intel
>>>> 8260 used to throw following error and stall during ping/iperf3 tests.
>>>>
>>>> [   97.776310] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
>>>>
>>>> Signed-off-by: Vignesh R
>>>> ---
>>>>   drivers/pci/dwc/pci-dra7xx.c | 21 ++++++++++++++++++---
>>>>   1 file changed, 18 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
>>>> index ed8558d638e5..3420cbf7b60a 100644
>>>> --- a/drivers/pci/dwc/pci-dra7xx.c
>>>> +++ b/drivers/pci/dwc/pci-dra7xx.c
>>>> @@ -254,14 +254,31 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
>>>>         struct dra7xx_pcie *dra7xx = arg;
>>>>         struct dw_pcie *pci = dra7xx->pci;
>>>>         struct pcie_port *pp = &pci->pp;
>>>> +     int count = 0;
>>>>         unsigned long reg;
>>>>         u32 virq, bit;
>>>>
>>>>         reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
>>>> +     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
>>>>
>>>>         switch (reg) {
>>>>         case MSI:
>>>> -             dw_handle_msi_irq(pp);
>>>> +             /*
>>>> +              * Need to make sure no MSI IRQs are pending before
>>>> +              * exiting handler, else the wrapper will not catch new
>>>> +              * IRQs. So loop around till dw_handle_msi_irq() returns
>>>> +              * IRQ_NONE
>>>> +              */
>>>> +             while (dw_handle_msi_irq(pp) != IRQ_NONE && count < 1000)
>>>> +                     count++;
>>>> +
>>>> +             if (count == 1000) {
>>>> +                     dev_err(pci->dev, "too much work in msi irq\n");
>>>> +                     dra7xx_pcie_writel(dra7xx,
>>>> +                                        PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI,
>>>> +                                        reg);
>>>> +                     return IRQ_HANDLED;
>>>
>>> I am not merging any code patching this IRQ handling routine anymore
>>> unless you thoroughly explain to me how this CONF_IRQSTATUS_MSI register
>>> works (and how it is related to DW registers) and why this specific host
>>> controller needs handling that is not required by any other host
>>> controller relying on dw_handle_msi_irq().
>>
>> Unlike other DW PCIe controllers, TI implementation has a wrapper on top
>> of DW core. This wrapper latches the DW core level MSI and legacy
>> interrupts and then propagates it to GIC.
>> PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI register is present in this TI
>> wrapper which aggregates all the MSI IRQs (PCIE_MSI_INTR0_STATUS) of DW
>> level. They are mapped on the MSI interrupt line of PCIe controller,
>> using a single status bit in the PCIECTRL_TI_CONF_IRQSTATUS_MSI register.
>>
>> So, the irq handler, dra7xx_pcie_msi_irq_handler(), first needs to look
>> at PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI[4] to know that its MSI IRQ and
>> then call dw_handle_msi_irq() to handle individual MSI vectors.
>> Driver has to make sure there are no pending vectors in DW core MSI
>
> How can it make *sure* ? And what makes the wrapper latch MSI IRQs
> again ?
>

This is the sequence that I got from discussion with the internal HW team
(a condensed code sketch of this sequence appears at the end of this
message):

1. Read CONF_IRQSTATUS_MSI in the wrapper and check whether the MSI bit
   is set.
2. Clear CONF_IRQSTATUS_MSI.
3. Read, clear and handle the PCIE_MSI_INTR0_STATUS vectors.
4. Repeat step 3 until PCIE_MSI_INTR0_STATUS reads 0.

If a read of PCIE_MSI_INTR0_STATUS returns 0 at least once, then it is
guaranteed that the next time any vector is set in the
PCIE_MSI_INTR0_STATUS register (due to an MSI IRQ), the wrapper will
latch it and raise an IRQ to the CPU.

>> status register before exiting handler. Otherwise next MSI IRQ will not
>> be latched by the wrapper.
>
> I am sorry but I do not understand how this works - what is the
> condition that makes wrapper latch IRQs again ? This is at least
> racy, if not outright broken.
>
> That count == 1000 is a symptom there is something broken on how this
> driver handles IRQs and I have the impression that we are applying
> plasters on top of plasters to make it less broken than it actually is.
>

It is an upper bound on how many times the driver looks at the
PCIE_MSI_INTR0_STATUS register, so that there is no infinite looping
when there is an IRQ flood due to a misbehaving EP. The count == 1000
condition should not happen; it means something is wrong in the system,
and I haven't hit it in testing. I can either remove this or put a
WARN_ON to say this situation should not have happened, if that makes
you more comfortable with the patch.

>>> I suspect there is a code design flaw with the way this host handles
>>> IRQs and we are going to find it and fix it the way it should, not with
>>> any plaster like this patch.
>>>
>>
>> I agree there has been some churn wrt this wrapper level IRQ handler.
>> But, that was because hardware documentation/TRM did not match
>> actual behavior and so it took some time to understand how the
>> hardware is working.
>
> How does HW work :) ? Please explain in detail how this works in HW
> then we will get to the code.
>

Software needs to ensure that PCIE_MSI_INTR0_STATUS reads 0 at least
once. Then, when the next MSI IRQ is raised, the CONF_IRQSTATUS_MSI
register in the wrapper will latch the next IRQ.
This is my current knowledge; let me know if you need anything more
specific and I will try to ask the HW team.

Regards
Vignesh

> Thanks,
> Lorenzo
>
>> I have extensively tested this series on multiple problematic PCIe USB
>> cards and PCIe WiFi cards over week long stress tests. And also had
>> some agreement with internal hardware designers. Hardware
>> documentations will also be updated.
>>
>>
>>> Lorenzo
>>>
>>>> +             }
>>>>                 break;
>>>>         case INTA:
>>>>         case INTB:
>>>> @@ -275,8 +292,6 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
>>>>                 break;
>>>>         }
>>>>
>>>> -     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
>>>> -
>>>>         return IRQ_HANDLED;
>>>>   }
>>>>
>>>> --
>>>> 2.16.1
>>>>
>>
>> --
>> Regards
>> Vignesh
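
As a quick illustration, the four-step sequence described above maps onto
the handler roughly as follows. This is a condensed, illustrative sketch
that reuses the helpers and register names from the patch
(dra7xx_pcie_readl/writel, dw_handle_msi_irq,
PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI) and assumes the surrounding driver
context of pci-dra7xx.c; it is not the exact upstream handler.

static irqreturn_t dra7xx_pcie_msi_irq_handler_sketch(int irq, void *arg)
{
	struct dra7xx_pcie *dra7xx = arg;
	struct dw_pcie *pci = dra7xx->pci;
	struct pcie_port *pp = &pci->pp;
	int count = 0;
	u32 reg;

	/* Step 1: read the TI wrapper status and check the MSI bit. */
	reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
	if (!(reg & MSI))
		return IRQ_NONE;

	/* Step 2: clear the wrapper status before draining the DW core. */
	dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);

	/*
	 * Steps 3-4: handle PCIE_MSI_INTR0_STATUS vectors until the register
	 * reads 0, so the wrapper is guaranteed to latch the next MSI and
	 * raise a new IRQ; the bound guards against an endless IRQ flood
	 * from a misbehaving endpoint.
	 */
	while (dw_handle_msi_irq(pp) != IRQ_NONE && count < 1000)
		count++;

	if (count == 1000)
		dev_err(pci->dev, "too much work in msi irq\n");

	return IRQ_HANDLED;
}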