From: Kairui Song
Date: Tue, 25 Feb 2020 01:30:30 +0800
Subject: Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel
To: Bjorn Helgaas, Khalid Aziz
Cc: Linux Kernel Mailing List, linux-pci@vger.kernel.org, kexec@lists.infradead.org, Jerry Hoemann, Baoquan He, Deepa Dinamani, Randy Wright, Dave Young, Myron Stowe
In-Reply-To: <20200222165631.GA213225@google.com>
References: <20191225192118.283637-1-kasong@redhat.com> <20200222165631.GA213225@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

Thanks for the reply. I don't have a better idea than this RFC patch yet.

The patch was put on hold because the previous discussion concluded that
it only works around the problem: the real fix should be to let the crash
kernel load every required kernel module and reset whichever hardware is
not in a good state.
However, users may struggle to find out which drivers are actually
needed, and it's not practical to load all drivers in the kdump kernel
(in fact, kdump tries to load as few drivers as possible to save memory).

So, as Dave Y suggested in another reply, would it be better to apply
this quirk with a kernel parameter controlling it? If such a problem
happens, the option could be turned on as a fix.

On Sun, Feb 23, 2020 at 12:59 AM Bjorn Helgaas wrote:
>
> [+cc Khalid, Deepa, Randy, Dave, Myron]
>
> On Thu, Dec 26, 2019 at 03:21:18AM +0800, Kairui Song wrote:
> > There are reports of kdump hanging upon reboot on some HPE machines:
> > the kernel hung while trying to shut down a PCIe port; an uncorrectable
> > error occurred and crashed the system.
>
> Did we ever make progress on this? This definitely sounds like a
> problem that needs to be fixed, but I don't see a resolution here.
>
> > On the machine where I can reproduce this issue, part of the topology
> > looks like this:
> >
> > [0000:00]-+-00.0 Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2
> >           +-01.0-[02]--
> >           +-01.1-[05]--
> >           +-02.0-[06]--+-00.0 Emulex Corporation OneConnect NIC (Skyhawk)
> >           |            +-00.1 Emulex Corporation OneConnect NIC (Skyhawk)
> >           |            +-00.2 Emulex Corporation OneConnect NIC (Skyhawk)
> >           |            +-00.3 Emulex Corporation OneConnect NIC (Skyhawk)
> >           |            +-00.4 Emulex Corporation OneConnect NIC (Skyhawk)
> >           |            +-00.5 Emulex Corporation OneConnect NIC (Skyhawk)
> >           |            +-00.6 Emulex Corporation OneConnect NIC (Skyhawk)
> >           |            \-00.7 Emulex Corporation OneConnect NIC (Skyhawk)
> >           +-02.1-[0f]--
> >           +-02.2-[07]----00.0 Hewlett-Packard Company Smart Array Gen9 Controllers
> >
> > When shutting down PCIe port 0000:00:02.2 or 0000:00:02.0, the machine
> > will hang, depending on which device is reinitialized in the kdump kernel.
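As a diagnostic aid for a topology like this, one way to see which functions behind ports 00:02.0 and 00:02.2 were left able to do DMA is to inspect their config space, for example by reading the `config` file under `/sys/bus/pci/devices/<BDF>/` as root (unprivileged reads return only the first 64 bytes, which usually excludes the capability list). The sketch below is a minimal user-space example, not part of the patch: it assumes a raw little-endian config-space image is already in a buffer, tests the Bus Master Enable bit of the Command register, and walks the capability list to decode the PowerState field of PMCSR, which is the same field the RFC quirk reads to recover `current_state`. The helper names (`cfg_bus_master_enabled`, `cfg_find_pm_cap`, `cfg_pm_state`) are hypothetical; the register offsets and bit values are the standard ones defined in `include/uapi/linux/pci_regs.h`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI config-space offsets and bits (same values as pci_regs.h). */
#define CFG_COMMAND          0x04   /* Command register (16-bit) */
#define CFG_COMMAND_MASTER   0x0004 /* Bus Master Enable */
#define CFG_STATUS           0x06   /* Status register (16-bit) */
#define CFG_STATUS_CAP_LIST  0x0010 /* Capabilities List present */
#define CFG_CAP_PTR          0x34   /* Capabilities Pointer (type 0/1 headers) */
#define CAP_ID_PM            0x01   /* Power Management capability ID */
#define PM_CTRL              4      /* PMCSR offset inside the PM capability */
#define PM_CTRL_STATE_MASK   0x0003 /* PowerState: 0 = D0 ... 3 = D3hot */
#define CAP_WALK_TTL         48     /* bound on malformed (looping) cap lists */

/* Read a little-endian 16-bit value; returns 0 if it would run past len. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t len, size_t off)
{
    if (off + 1 >= len)
        return 0;
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* True if the function may still initiate DMA (Bus Master Enable set). */
bool cfg_bus_master_enabled(const uint8_t *cfg, size_t len)
{
    return (cfg_read16(cfg, len, CFG_COMMAND) & CFG_COMMAND_MASTER) != 0;
}

/* Offset of the PM capability, or 0 if absent or beyond the buffer. */
size_t cfg_find_pm_cap(const uint8_t *cfg, size_t len)
{
    if (!(cfg_read16(cfg, len, CFG_STATUS) & CFG_STATUS_CAP_LIST) ||
        len <= CFG_CAP_PTR)
        return 0;

    /* The low two bits of every capability pointer are reserved. */
    size_t pos = cfg[CFG_CAP_PTR] & 0xFC;
    for (int ttl = CAP_WALK_TTL; ttl > 0 && pos >= 0x40 && pos + 1 < len; ttl--) {
        if (cfg[pos] == CAP_ID_PM)
            return pos;
        pos = cfg[pos + 1] & 0xFC; /* next-capability pointer */
    }
    return 0;
}

/* PowerState from PMCSR (0..3), or -1 if no readable PM capability. */
int cfg_pm_state(const uint8_t *cfg, size_t len)
{
    size_t pm = cfg_find_pm_cap(cfg, len);

    if (pm == 0 || pm + PM_CTRL + 1 >= len)
        return -1;
    return cfg_read16(cfg, len, pm + PM_CTRL) & PM_CTRL_STATE_MASK;
}
```

A caller would read, say, `/sys/bus/pci/devices/0000:07:00.0/config` into a 256-byte buffer and pass the number of bytes actually read as `len`. A function whose driver was not loaded in the kdump kernel but still reports Bus Master set in D0 would be consistent with the situation the commit message describes; `-1` means the buffer was too short (for example, an unprivileged 64-byte read) to reach the PM capability.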
> >
> > If the unused device is force-removed before triggering kdump, the
> > problem never happens:
> >
> > echo 1 > /sys/bus/pci/devices/0000\:00\:02.2/0000\:07\:00.0/remove
> > echo c > /proc/sysrq-trigger
> >
> > ... Kdump saves the vmcore through the network, so the NIC gets
> > reinitialized and hpsa is untouched. Then the reboot completes with no
> > problem. (If hpsa is used instead, shutting down the NIC in the first
> > kernel helps.)
> >
> > The cause is that some devices are enabled by the first kernel, which
> > doesn't get the chance to shut them down, and the kdump kernel is not
> > aware of them unless it reinitializes the device.
> >
> > Upon reboot, the kdump kernel skips the downstream device's shutdown and
> > clears its bridge's bus master bit directly. The downstream device can
> > then error out, since it can still send requests but the upstream
> > bridge refuses them.
> >
> > So for kdump, let the kernel read the correct hardware power state on
> > boot, and always clear the bus master bit of a PCI device upon shutdown
> > if the device is on. The PCIe port driver always shuts down all
> > downstream devices first, so this should ensure all downstream devices
> > have their bus master bit off before the bridge's bus master bit is
> > cleared.
> >
> > Signed-off-by: Kairui Song
> > ---
> >  drivers/pci/pci-driver.c | 11 ++++++++---
> >  drivers/pci/quirks.c     | 20 ++++++++++++++++++++
> >  2 files changed, 28 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 0454ca0e4e3f..84a7fd643b4d 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -18,6 +18,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include "pci.h"
> >  #include "pcie/portdrv.h"
> >
> > @@ -488,10 +489,14 @@ static void pci_device_shutdown(struct device *dev)
> >  	 * If this is a kexec reboot, turn off Bus Master bit on the
> >  	 * device to tell it to not continue to do DMA. Don't touch
> >  	 * devices in D3cold or unknown states.
> > -	 * If it is not a kexec reboot, firmware will hit the PCI
> > -	 * devices with big hammer and stop their DMA any way.
> > +	 * If this is kdump kernel, also turn off Bus Master, the device
> > +	 * could be activated by previous crashed kernel and may block
> > +	 * its upstream from shutting down.
> > +	 * Else, firmware will hit the PCI devices with big hammer
> > +	 * and stop their DMA any way.
> >  	 */
> > -	if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
> > +	if ((kexec_in_progress || is_kdump_kernel()) &&
> > +	    pci_dev->current_state <= PCI_D3hot)
> >  		pci_clear_master(pci_dev);
> >  }
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 4937a088d7d8..c65d11ab3939 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -28,6 +28,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include /* isa_dma_bridge_buggy */
> >  #include "pci.h"
> >
> > @@ -192,6 +193,25 @@ static int __init pci_apply_final_quirks(void)
> >  }
> >  fs_initcall_sync(pci_apply_final_quirks);
> >
> > +/*
> > + * Read the device state even if it's not enabled. The device could be
> > + * activated by previous crashed kernel, this will read and correct the
> > + * cached state.
> > + */
> > +static void quirk_read_pm_state_in_kdump(struct pci_dev *dev)
> > +{
> > +	u16 pmcsr;
> > +
> > +	if (!is_kdump_kernel())
> > +		return;
> > +
> > +	if (dev->pm_cap) {
> > +		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> > +		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
> > +	}
> > +}
> > +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, quirk_read_pm_state_in_kdump);
> > +
> >  /*
> >   * Decoding should be disabled for a PCI device during BAR sizing to avoid
> >   * conflict. But doing so may cause problems on host bridge and perhaps other
> > --
> > 2.24.1
> >
>

--
Best Regards,
Kairui Song