Date: Sat, 22 Feb 2020 10:56:31 -0600
From: Bjorn Helgaas
To: Kairui Song
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	kexec@lists.infradead.org, Jerry Hoemann, Baoquan He, Khalid Aziz,
	Deepa Dinamani, Randy Wright, Dave Young, Myron Stowe
Subject: Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel
Message-ID:
 <20200222165631.GA213225@google.com>
In-Reply-To: <20191225192118.283637-1-kasong@redhat.com>

[+cc Khalid, Deepa, Randy, Dave, Myron]

On Thu, Dec 26, 2019 at 03:21:18AM +0800, Kairui Song wrote:
> There are reports of kdump hanging upon reboot on some HPE machines:
> the kernel hung when trying to shut down a PCIe port, an uncorrectable
> error occurred and crashed the system.

Did we ever make progress on this?  This definitely sounds like a
problem that needs to be fixed, but I don't see a resolution here.

> On the machine where I can reproduce this issue, part of the topology
> looks like this:
>
> [0000:00]-+-00.0  Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2
>           +-01.0-[02]--
>           +-01.1-[05]--
>           +-02.0-[06]--+-00.0  Emulex Corporation OneConnect NIC (Skyhawk)
>           |            +-00.1  Emulex Corporation OneConnect NIC (Skyhawk)
>           |            +-00.2  Emulex Corporation OneConnect NIC (Skyhawk)
>           |            +-00.3  Emulex Corporation OneConnect NIC (Skyhawk)
>           |            +-00.4  Emulex Corporation OneConnect NIC (Skyhawk)
>           |            +-00.5  Emulex Corporation OneConnect NIC (Skyhawk)
>           |            +-00.6  Emulex Corporation OneConnect NIC (Skyhawk)
>           |            \-00.7  Emulex Corporation OneConnect NIC (Skyhawk)
>           +-02.1-[0f]--
>           +-02.2-[07]----00.0  Hewlett-Packard Company Smart Array Gen9 Controllers
>
> When shutting down PCIe port 0000:00:02.2 or 0000:00:02.0, the machine
> will hang, depending on which device is reinitialized in the kdump kernel.
>
> If the unused device is force-removed before triggering kdump, the
> problem never happens:
>
>     echo 1 > /sys/bus/pci/devices/0000\:00\:02.2/0000\:07\:00.0/remove
>     echo c > /proc/sysrq-trigger
>
>     ... Kdump saves the vmcore through the network, the NIC gets
>     reinitialized and hpsa is untouched. Then the reboot completes with
>     no problem.
> (If hpsa is used instead, shutting down the NIC in the first kernel
> will help.)
>
> The cause is that some devices are enabled by the first kernel, but it
> doesn't get the chance to shut them down, and the kdump kernel is not
> aware of them unless it reinitializes them.
>
> Upon reboot, the kdump kernel skips the downstream device shutdown and
> clears its bridge's bus master bit directly. The downstream device could
> error out, as it can still send requests but the upstream bridge refuses
> them.
>
> So for kdump, let the kernel read the correct hardware power state on
> boot, and always clear the bus master bit of a PCI device upon shutdown
> if the device is on. The PCIe port driver always shuts down all
> downstream devices first, so this should ensure all downstream devices
> have the bus master bit off before the bridge's bus master bit is
> cleared.
>
> Signed-off-by: Kairui Song
> ---
>  drivers/pci/pci-driver.c | 11 ++++++++---
>  drivers/pci/quirks.c     | 20 ++++++++++++++++++++
>  2 files changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 0454ca0e4e3f..84a7fd643b4d 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -18,6 +18,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include "pci.h"
>  #include "pcie/portdrv.h"
>
> @@ -488,10 +489,14 @@ static void pci_device_shutdown(struct device *dev)
>  	 * If this is a kexec reboot, turn off Bus Master bit on the
>  	 * device to tell it to not continue to do DMA. Don't touch
>  	 * devices in D3cold or unknown states.
> -	 * If it is not a kexec reboot, firmware will hit the PCI
> -	 * devices with big hammer and stop their DMA any way.
> +	 * If this is kdump kernel, also turn off Bus Master, the device
> +	 * could be activated by previous crashed kernel and may block
> +	 * it's upstream from shutting down.
> +	 * Else, firmware will hit the PCI devices with big hammer
> +	 * and stop their DMA any way.
>  	 */
> -	if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
> +	if ((kexec_in_progress || is_kdump_kernel()) &&
> +	    pci_dev->current_state <= PCI_D3hot)
>  		pci_clear_master(pci_dev);
>  }
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 4937a088d7d8..c65d11ab3939 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -28,6 +28,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include /* isa_dma_bridge_buggy */
>  #include "pci.h"
>
> @@ -192,6 +193,25 @@ static int __init pci_apply_final_quirks(void)
>  }
>  fs_initcall_sync(pci_apply_final_quirks);
>
> +/*
> + * Read the device state even if it's not enabled. The device could be
> + * activated by previous crashed kernel, this will read and correct the
> + * cached state.
> + */
> +static void quirk_read_pm_state_in_kdump(struct pci_dev *dev)
> +{
> +	u16 pmcsr;
> +
> +	if (!is_kdump_kernel())
> +		return;
> +
> +	if (dev->pm_cap) {
> +		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> +		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
> +	}
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, quirk_read_pm_state_in_kdump);
> +
>  /*
>   * Decoding should be disabled for a PCI device during BAR sizing to avoid
>   * conflict. But doing so may cause problems on host bridge and perhaps other
> --
> 2.24.1
>
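
For readers following the thread, the two pieces of the quoted patch
(refresh the cached power state in the kdump kernel, then clear Bus
Master on shutdown if the device is in D3hot or a shallower state) can
be modelled in userspace roughly as below.  This is only an
illustrative sketch, not kernel code: every sk_* / SK_* name is
hypothetical, the struct stands in for struct pci_dev, and the kexec
and kdump conditions are passed in as plain booleans.  The constant
values are assumed to mirror the kernel's PCI_PM_CTRL_STATE_MASK
(0x0003), PCI_COMMAND_MASTER (0x4) and the ordering of pci_power_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror PCI_PM_CTRL_STATE_MASK and PCI_COMMAND_MASTER. */
#define SK_PM_CTRL_STATE_MASK	0x0003u
#define SK_COMMAND_MASTER	0x0004u

/* Hypothetical stand-in for pci_power_t; the ordering matters below. */
enum sk_power_state {
	SK_D0 = 0, SK_D1, SK_D2, SK_D3HOT, SK_D3COLD, SK_UNKNOWN
};

/* Hypothetical stand-in for the few struct pci_dev fields involved. */
struct sk_pci_dev {
	bool has_pm_cap;		/* device exposes a PM capability */
	uint16_t pmcsr;			/* PM control/status register value */
	uint16_t command;		/* PCI command register value */
	enum sk_power_state current_state;	/* cached power state */
};

/* Models the quirk: in a kdump kernel, trust the hardware over the cache. */
static void sk_refresh_state(struct sk_pci_dev *dev, bool kdump)
{
	if (!kdump || !dev->has_pm_cap)
		return;
	dev->current_state =
		(enum sk_power_state)(dev->pmcsr & SK_PM_CTRL_STATE_MASK);
}

/* Models the new condition in pci_device_shutdown(). */
static bool sk_should_clear_master(const struct sk_pci_dev *dev,
				   bool kexec_in_progress, bool kdump)
{
	return (kexec_in_progress || kdump) &&
	       dev->current_state <= SK_D3HOT;
}

/* Clears the Bus Master bit only when the condition above holds. */
static void sk_shutdown(struct sk_pci_dev *dev,
			bool kexec_in_progress, bool kdump)
{
	if (sk_should_clear_master(dev, kexec_in_progress, kdump))
		dev->command &= (uint16_t)~SK_COMMAND_MASTER;
}
```

In this model, a device left in D0 by the crashed kernel but cached as
"unknown" is skipped by sk_shutdown() until sk_refresh_state() has
run, which is the gap the quirk in the quoted patch is meant to close.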