Date: Tue, 2 Nov 2021 17:34:01 -0500
From: Bjorn Helgaas
To: "Bao, Joseph"
Cc: Bjorn Helgaas, "linux-pci@vger.kernel.org", "linux-kernel@vger.kernel.org", Stuart Hayes, Lukas Wunner
Subject: Re: HW power fault defect cause system hang on kernel 5.4.y
Message-ID: <20211102223401.GA651784@bhelgaas>
X-Mailing-List: linux-kernel@vger.kernel.org

[+cc Stuart, author of 8edf5332c393 ("PCI: pciehp: Fix MSI interrupt
race"), Lukas, pciehp expert]

On Tue, Nov 02, 2021 at 03:45:00AM +0000, Bao, Joseph wrote:
> Hi, dear kernel developer,
>
> Recently we encounter system hang (dead spinlock) when move to
> kernel linux-5.4.y.
>
> Finally, we use bisect to locate the suspicious commit
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=4667358dab9cc07da044d5bc087065545b1000df.

4667358dab9c backported upstream commit 8edf5332c393 ("PCI: pciehp:
Fix MSI interrupt race") to v5.4.69 just over a year ago.

> Our system has some HW defect, which will wrongly set
> PCI_EXP_SLTSTA_PFD high, and this commit will lead to infinite loop
> jumping to read_status (no chance to clear status PCI_EXP_SLTSTA_PFD
> bit since ctrl is not updated), I know this is our HW defect, but
> this commit makes kernel trapped in this isr function and leads to
> kernel hang (then the user could not get useful information to show
> what's wrong), which I think is not expected behavior, so I would
> like to report to you for discussion.

I guess this happens because the first time we handle PFD,
pciehp_ist() sets "ctrl->power_fault_detected = 1", and when
power_fault_detected is set, pciehp_isr() won't clear PFD from
PCI_EXP_SLTSTA?

It looks like the only place we clear power_fault_detected is in
pciehp_power_on_slot(), and I don't think we call that unless we have
a presence detect or link status change.

It would definitely be nice if we could arrange so this hardware
defect didn't cause a kernel hang.

I think the diff below is the backport of 8edf5332c393 ("PCI: pciehp:
Fix MSI interrupt race").

> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 356786a3b7f4b..88b996764ff95 100644
> --- a/https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/pci/hotplug/pciehp_hpc.c?h=linux-5.4.y&id=ca767cf0152d18fc299cde85b18d1f46ac21e1ba
> +++ b/https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/pci/hotplug/pciehp_hpc.c?h=linux-5.4.y&id=4667358dab9cc07da044d5bc087065545b1000df
> @@ -529,7 +529,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>  	struct controller *ctrl = (struct controller *)dev_id;
>  	struct pci_dev *pdev = ctrl_dev(ctrl);
>  	struct device *parent = pdev->dev.parent;
> -	u16 status, events;
> +	u16 status, events = 0;
>
>  	/*
>  	 * Interrupts only occur in D3hot or shallower and only if enabled
> @@ -554,6 +554,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>  		}
>  	}
>
> +read_status:
>  	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
>  	if (status == (u16) ~0) {
>  		ctrl_info(ctrl, "%s: no response from device\n", __func__);
> @@ -566,24 +567,37 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>  	 * Slot Status contains plain status bits as well as event
>  	 * notification bits; right now we only want the event bits.
>  	 */
> -	events = status & (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
> -			   PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
> -			   PCI_EXP_SLTSTA_DLLSC);
> +	status &= PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
> +		  PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
> +		  PCI_EXP_SLTSTA_DLLSC;
>
>  	/*
>  	 * If we've already reported a power fault, don't report it again
>  	 * until we've done something to handle it.
>  	 */
>  	if (ctrl->power_fault_detected)
> -		events &= ~PCI_EXP_SLTSTA_PFD;
> +		status &= ~PCI_EXP_SLTSTA_PFD;
>
> +	events |= status;
>  	if (!events) {
>  		if (parent)
>  			pm_runtime_put(parent);
>  		return IRQ_NONE;
>  	}
>
> -	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
> +	if (status) {
> +		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
> +
> +		/*
> +		 * In MSI mode, all event bits must be zero before the port
> +		 * will send a new interrupt (PCIe Base Spec r5.0 sec 6.7.3.4).
> +		 * So re-read the Slot Status register in case a bit was set
> +		 * between read and write.
> +		 */
> +		if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode)
> +			goto read_status;
> +	}
> +
>  	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events);
>  	if (parent)
>  		pm_runtime_put(parent);
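
To make the failure mode concrete, here is a small user-space model of
the re-read loop in the patched pciehp_isr(). It is only a sketch, not
kernel code: struct fake_slot, sim_isr() and the max_reads cap are
made up for illustration, MSI mode is assumed, and the hardware defect
is assumed to re-assert PFD right after every write-1-to-clear. The
PCI_EXP_SLTSTA_* values are the ones from pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Slot Status event bits, same values as include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTSTA_ABP	0x0001	/* Attention Button Pressed */
#define PCI_EXP_SLTSTA_PFD	0x0002	/* Power Fault Detected */
#define PCI_EXP_SLTSTA_PDC	0x0008	/* Presence Detect Changed */
#define PCI_EXP_SLTSTA_CC	0x0010	/* Command Completed */
#define PCI_EXP_SLTSTA_DLLSC	0x0100	/* Data Link Layer State Changed */

#define SLTSTA_EVENTS (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD | \
		       PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC | \
		       PCI_EXP_SLTSTA_DLLSC)

/* Hypothetical stand-in for the slot; not a kernel structure. */
struct fake_slot {
	uint16_t sltsta;		/* simulated Slot Status register */
	uint16_t stuck;			/* bits the defect re-asserts after a clear */
	int power_fault_detected;	/* mirrors ctrl->power_fault_detected */
};

static uint16_t fake_read_sltsta(const struct fake_slot *s)
{
	return s->sltsta;
}

static void fake_write_sltsta(struct fake_slot *s, uint16_t val)
{
	s->sltsta &= (uint16_t)~val;	/* event bits are write-1-to-clear */
	s->sltsta |= s->stuck;		/* assumed defect: bit comes right back */
}

/*
 * Same control flow as the patched ISR, minus everything unrelated to
 * Slot Status. Returns the number of register reads, or -1 if the
 * simulation-only cap max_reads was reached (the real ISR has no cap,
 * so that case corresponds to spinning forever).
 */
static int sim_isr(struct fake_slot *s, uint16_t *events_out, int max_reads)
{
	uint16_t status, events = 0;
	int reads = 0;

	assert(max_reads > 0);

read_status:
	if (reads >= max_reads) {
		*events_out = events;
		return -1;
	}
	reads++;

	status = fake_read_sltsta(s) & SLTSTA_EVENTS;
	if (s->power_fault_detected)
		status &= (uint16_t)~PCI_EXP_SLTSTA_PFD;

	events |= status;
	if (!events) {
		*events_out = 0;
		return reads;		/* IRQ_NONE in the real handler */
	}

	if (status) {
		fake_write_sltsta(s, events);
		goto read_status;	/* MSI mode assumed */
	}

	*events_out = events;
	return reads;
}
```

In this model, a slot with sltsta = stuck = PCI_EXP_SLTSTA_PFD and
power_fault_detected = 0 never leaves the loop and only stops at the
cap. The same slot with power_fault_detected = 1 returns after one
read with no events, because PFD is masked out of status. Nothing
inside the loop sets that flag, which is consistent with the "ctrl is
not updated" observation in the report.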