Received: by 2002:a25:7ec1:0:0:0:0:0 with SMTP id z184csp1420956ybc; Tue, 12 Nov 2019 21:11:01 -0800 (PST) X-Google-Smtp-Source: APXvYqyiQYrHuWW5fXyi/cRRxpSAzQp/7EgvCONYz5IibtRAwfQ0YVLyn5zQfWKkih0KuOSvYD2C X-Received: by 2002:a17:906:4c8c:: with SMTP id q12mr998320eju.256.1573621861693; Tue, 12 Nov 2019 21:11:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1573621861; cv=none; d=google.com; s=arc-20160816; b=X+da//0RAah/rc6PDWwqQRY73Z+/fukW942poBXI7738K4CfrMTh1IDfb8QkjfaOqN 7n9y1rR9GmTVZIc8rE4Br75x0mucOpVCJB+6W+2l715J5Rwrafs2Za5EE50XAf/Sq5v4 Av8Vp/gJvEkWYkGaEv/WmVqYbpmCQr+AdLj5IXO+bVI0Myi00iGcHQJT4yT51z5mSSsi rdaC9YJmEp3ZpNfBStfVVbrVLiUNBwEvVnaiy1A3mx7igSGNGfj6mbXiif8n86mAda2Y 389jzeOwc7vIojK01spbQzfGCh2I8JUyT1QunLmMQO7uO8ZKMiFNGJ29uTvKjzhhXhx/ oZ4Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=S+/lWZh65vz0fKZTyFgcDbe4k+Nog17TVnoSILY3TUg=; b=h56rFlaTN+VqARDnwFE5PvxCrzvJTi/JFNR2P1mTfuHmaJg42yn3h+EiiSYe+O0lKx PdKPxADNiREkY2bnMSQVP4mSYOyd+WSMoJRLiKXU759nhKX6d4HD07RVo0vzuMmnsadd 0kG4ve3CRHbbwTSqDUjAJkKM1J0khEXVAnlDFFepApb+wMH61s9fYpKY1EYWV5AHlJK5 LvVSJsItaEWEOscH5REHlOxHlcaTgV1GQ6dXlrRLWgoC7H2qXBIbA+pZyY1XmXAA/w5K uzM/9tFOkOHdr3FvwmgQ5UgQrc923qsUdo9cR3oKPhKSuGj3sujfYtEeOPNPmh3GCeLA 4i/Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z65si567650ede.388.2019.11.12.21.10.34; Tue, 12 Nov 2019 21:11:01 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726074AbfKMFJt (ORCPT + 99 others); Wed, 13 Nov 2019 00:09:49 -0500 Received: from bmailout2.hostsharing.net ([83.223.78.240]:51099 "EHLO bmailout2.hostsharing.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725446AbfKMFJt (ORCPT ); Wed, 13 Nov 2019 00:09:49 -0500 X-Greylist: delayed 606 seconds by postgrey-1.27 at vger.kernel.org; Wed, 13 Nov 2019 00:09:47 EST Received: from h08.hostsharing.net (h08.hostsharing.net [IPv6:2a01:37:1000::53df:5f1c:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.hostsharing.net", Issuer "COMODO RSA Domain Validation Secure Server CA" (not verified)) by bmailout2.hostsharing.net (Postfix) with ESMTPS id 763252800021B; Wed, 13 Nov 2019 05:59:39 +0100 (CET) Received: by h08.hostsharing.net (Postfix, from userid 100393) id 38CC8477C16; Wed, 13 Nov 2019 05:59:39 +0100 (CET) Date: Wed, 13 Nov 2019 05:59:39 +0100 From: Lukas Wunner To: Stuart Hayes Cc: Bjorn Helgaas , Austin Bolen , keith.busch@intel.com, Alexandru Gagniuc , "Rafael J . Wysocki" , Mika Westerberg , Andy Shevchenko , "Gustavo A . R . Silva" , Sinan Kaya , Oza Pawandeep , linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] PCI: pciehp: Make sure pciehp_isr clears interrupt events Message-ID: <20191113045939.hhmzfbx46vkgmsvn@wunner.de> References: <20191112215938.1145-1-stuart.w.hayes@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191112215938.1145-1-stuart.w.hayes@gmail.com> User-Agent: NeoMutt/20170113 (1.7.2) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 12, 2019 at 04:59:38PM -0500, Stuart Hayes wrote: > The pciehp interrupt handler pciehp_isr() will read the slot status > register and then write back to it to clear just the bits that caused the > interrupt. If a different interrupt event bit gets set between the read and > the write, pciehp_isr() will exit without having cleared all of the > interrupt event bits, so we will never get another hotplug interrupt from > that device. The IRQ is masked when it occurs and unmasked after it's been handled. See the invocation of mask_ack_irq() in handle_edge_irq() and handle_level_irq() in kernel/irq/chip.c. If the IRQ has fired in-between, the IRQ chip is expected to invoke the IRQ handler again. There is some support for IRQ chips not capable of replaying interrupts in kernel/irq/resend.c, but in general, if you do not get another hotplug interrupt, the hardware is broken. What kind of IRQ chip are you using and what kind of chip is the hotplug port built into? I'm not opposed to quirks to support such broken hardware but the commit message shouldn't purport that it's normal behavior and the quirk should only be executed where necessary and be made explicit in the code to be a quirk. Thanks, Lukas > > That is expected behavior according to the PCI Express spec (v.5.0, section > 6.7.3.4, "Software Notification of Hot-Plug Events"). > > Because the "presence detect changed" and "data link layer state changed" > event bits are both getting set at nearly the same time when a device is > added or removed, this is more likely to happen than it might seem. The > issue can be reproduced rather easily by connecting and disconnecting an > NVMe device on at least one system model. > > This patch fixes the issue by modifying pciehp_isr() to loop back and > re-read the slot status register immediately after writing to it, until > it sees that all of the event status bits have been cleared. > > Signed-off-by: Stuart Hayes > --- > drivers/pci/hotplug/pciehp_hpc.c | 39 +++++++++++++++++++++++++++----- > 1 file changed, 33 insertions(+), 6 deletions(-) > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c > index 1a522c1c4177..8ec22a872b28 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -487,12 +487,21 @@ void pciehp_power_off_slot(struct controller *ctrl) > PCI_EXP_SLTCTL_PWR_OFF); > } > > +/* > + * We should never need to re-read the slot status register this many times, > + * because there are only six possible events that could generate this > + * interrupt. If we still see events after this many reads, there's a stuck > + * bit. > + */ > +#define MAX_ISR_STATUS_READS 6 > + > static irqreturn_t pciehp_isr(int irq, void *dev_id) > { > struct controller *ctrl = (struct controller *)dev_id; > struct pci_dev *pdev = ctrl_dev(ctrl); > struct device *parent = pdev->dev.parent; > - u16 status, events; > + u16 status, events = 0; > + int status_reads = 0; > > /* > * Interrupts only occur in D3hot or shallower and only if enabled > @@ -517,6 +526,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) > } > } > > +read_status: > pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status); > if (status == (u16) ~0) { > ctrl_info(ctrl, "%s: no response from device\n", __func__); > @@ -529,16 +539,34 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) > * Slot Status contains plain status bits as well as event > * notification bits; right now we only want the event bits. > */ > - events = status & (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD | > - PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC | > - PCI_EXP_SLTSTA_DLLSC); > + status &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD | > + PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC | > + PCI_EXP_SLTSTA_DLLSC); > > /* > * If we've already reported a power fault, don't report it again > * until we've done something to handle it. > */ > if (ctrl->power_fault_detected) > - events &= ~PCI_EXP_SLTSTA_PFD; > + status &= ~PCI_EXP_SLTSTA_PFD; > + > + if (status) { > + pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, status); > + events |= status; > + } > + > + /* > + * All of the event bits must be zero before the port will send > + * a new interrupt. If an event bit gets set between the read > + * and the write, we'll never get another interrupt, so loop until > + * we see no event bits set. > + */ > + if (status && status_reads++ < MAX_ISR_STATUS_READS) > + goto read_status; > + > + if (status_reads == MAX_ISR_STATUS_READS) > + ctrl_warn(ctrl, "Slot(%s): Hot plug event bit stuck, reading %x\n", > + status, slot_name(ctrl)); > > if (!events) { > if (parent) > @@ -546,7 +574,6 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) > return IRQ_NONE; > } > > - pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events); > ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events); > if (parent) > pm_runtime_put(parent); > -- > 2.18.1 >