Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp18343169ybl; Fri, 3 Jan 2020 00:00:50 -0800 (PST) X-Google-Smtp-Source: APXvYqwHRoHfMxwoKRsvwW4Zj/h42xzruMJdw0gOOVWtxGOBdWtFJxcrF7BOv7G3Czw1x63uBHWC X-Received: by 2002:a9d:4706:: with SMTP id a6mr90438031otf.331.1578038450146; Fri, 03 Jan 2020 00:00:50 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1578038450; cv=none; d=google.com; s=arc-20160816; b=aaU3ctCA1sHJhdmyCP9ZPUB1tW07YFLAGmRFDBpNwOZs0dQ/QWUX8LDpltEwOwH8Xc ML+vO0h5GAEpSNWPDRInybNHOjshFymGtWrH4S0u4s1Cf9xnfT2QFPa2AeoSNTlBRnKU 0GYkYuytYdUYgst1loDxCN7SzRVx15zxBjwDmB+FD6/kTOSNZgYwsQ7xkiW9h0mqxmsZ JtQTdloRJl4zsTdTMnJjgaxPiUW2j5sy+eDGC7TOI7T+f0A5PFcUd+rYYANP22BuFtN9 OV5IAK25drpr+/Zy/vnIE3ItwHLuJTinZxmG9PbdQOyPSPIR/CsnaCJT0yyOy6hRlA9A 7V4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=oTkOJXYtsJl5gqpboh4tFkRUS1dSPk+5aoyVvRoTwh8=; b=z4vbPhgPHV5afuwYAOLN8SEfVseP2J0s7tadyhHGBlFPYoXQz4b6lelolplwQs+n2w Rg0HklmbJvSVMimKKL3kmelG9unD7dR/omLlKKokl98SqvUrXMDkvtNDIZ2kT9sXm89z rNycXZMOKr65m70Y1zDq5NTQ/0Cr996YHc21kWUrkTPNVrj8NJQnMI1W3WLU7JLEZnoP NWIAzRoNWu335PP+iOUt9ETnQVr0bSUlpAqXahCiAxH0kI5yCJFUoQcdYi/XA4+pTcNJ m/m1RlTWeQmG/JeWz+BiSvA9zmibdnsSd46scR60MNAhN38KKqofWR4z7SPcD+81kW1R KGHw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=EICipj7S; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o5si27216173otl.185.2020.01.03.00.00.37; Fri, 03 Jan 2020 00:00:50 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=EICipj7S; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727363AbgACH7G (ORCPT + 99 others); Fri, 3 Jan 2020 02:59:06 -0500 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:46353 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725890AbgACH7F (ORCPT ); Fri, 3 Jan 2020 02:59:05 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1578038343; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=oTkOJXYtsJl5gqpboh4tFkRUS1dSPk+5aoyVvRoTwh8=; b=EICipj7SXVtI1aXY605WXJAIR3MgwYMbLQh4zuUNti4uv2SyTJ04oUC676XSC1hP1/wzqY gt6wLVTll2xny9Qu5g4tTgR9iR5NFilo2bwAB6DIpFt2+bvm/vx42YiGQqQC2ECslp55WM +CChECYa1PjRLrRAdDtJdOur13ik/Ck= Received: from mail-io1-f71.google.com (mail-io1-f71.google.com [209.85.166.71]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-192-Dw7--kr7Mu2VFsI80sm5-g-1; Fri, 03 Jan 2020 02:59:02 -0500 X-MC-Unique: Dw7--kr7Mu2VFsI80sm5-g-1 Received: by mail-io1-f71.google.com with SMTP id u6so3836944iog.21 for ; Thu, 02 Jan 2020 23:59:02 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=oTkOJXYtsJl5gqpboh4tFkRUS1dSPk+5aoyVvRoTwh8=; b=XZHlmYC2zHo11A3NIHOpxk56GjwaegTjFU5+trI0+ynzYTCoJ3qXx+xWnQ/jzgfMHv ot9M3o1JXpe3TfOiPpAfaYC08nU3DVGYKiLlgJaG+SPAQt9jhqrvu7F+Cps4xVHGdToo hu0ln/BQAh33wxscR4OO6Yx2prsLCYpRa72Oo+dv7s0zKqNqc8qGk5Y03tuzCJ5EtMS9 lfYIjatpXqg6wgEF2OUFLLAj0AWl+TA8cmBl2rUGqTlutfApHFw+uy+7761l+GoO4a47 bumrQUDmTAehjvbteGK12Ki98X4k5TfwfVoK5zsKv1m4yQKL0yDwuUkNxwovYOrUtmGE aC7g== X-Gm-Message-State: APjAAAWK0hB+L5txfmHKTYdFuLKVYG7rVU1nomVO0PzTRNtMpWz2F08F QBxUK/+nuorDQufM3z3vaXuq5J5VO8ZeKdKv7vyOtq+TyUct8MisLa4W6Fe2SCGd7WI/cCdjKaa ROQp7nbkODgYEOnHxEJwT5wPsFeKHsKsAwmlFcbPg X-Received: by 2002:a6b:c742:: with SMTP id x63mr54288773iof.162.1578038342044; Thu, 02 Jan 2020 23:59:02 -0800 (PST) X-Received: by 2002:a6b:c742:: with SMTP id x63mr54288762iof.162.1578038341758; Thu, 02 Jan 2020 23:59:01 -0800 (PST) MIME-Version: 1.0 References: <20191225192118.283637-1-kasong@redhat.com> In-Reply-To: <20191225192118.283637-1-kasong@redhat.com> From: Kairui Song Date: Fri, 3 Jan 2020 15:58:50 +0800 Message-ID: Subject: Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel To: Linux Kernel Mailing List , linux-pci@vger.kernel.org Cc: Bjorn Helgaas , kexec@lists.infradead.org, Jerry Hoemann , Baoquan He Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 26, 2019 at 3:21 AM Kairui Song wrote: > > There are reports about kdump hang upon reboot on some HPE machines, > kernel hanged when trying to shutdown a PCIe port, an uncorrectable > error occurred and crashed the system. > > On the machine I can reproduce this issue, part of the topology > looks like this: > > [0000:00]-+-00.0 Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2 > +-01.0-[02]-- > +-01.1-[05]-- > +-02.0-[06]--+-00.0 Emulex Corporation OneConnect NIC (Skyhawk) > | +-00.1 Emulex Corporation OneConnect NIC (Skyhawk) > | +-00.2 Emulex Corporation OneConnect NIC (Skyhawk) > | +-00.3 Emulex Corporation OneConnect NIC (Skyhawk) > | +-00.4 Emulex Corporation OneConnect NIC (Skyhawk) > | +-00.5 Emulex Corporation OneConnect NIC (Skyhawk) > | +-00.6 Emulex Corporation OneConnect NIC (Skyhawk) > | \-00.7 Emulex Corporation OneConnect NIC (Skyhawk) > +-02.1-[0f]-- > +-02.2-[07]----00.0 Hewlett-Packard Company Smart Array Gen9 Controllers > > When shuting down PCIe port 0000:00:02.2 or 0000:00:02.0, the machine > will hang, depend on which device is reinitialized in kdump kernel. > > If force remove unused device then trigger kdump, the problem will never > happen: > > echo 1 > /sys/bus/pci/devices/0000\:00\:02.2/0000\:07\:00.0/remove > echo c > /proc/sysrq-trigger > > ... Kdump save vmcore through network, the NIC get reinitialized and > hpsa is untouched. Then reboot with no problem. (If hpsa is used > instead, shutdown the NIC in first kernel will help) > > The cause is that some devices are enabled by the first kernel, but it > don't have the chance to shutdown the device, and kdump kernel is not > aware of it, unless it reinitialize the device. > > Upon reboot, kdump kernel will skip downstream device shutdown and > clears its bridge's master bit directly. The downstream device could > error out as it can still send requests but upstream refuses it. > > So for kdump, let kernel read the correct hardware power state on boot, > and always clear the bus master bit of PCI device upon shutdown if the > device is on. PCIe port driver will always shutdown all downstream > devices first, so this should ensure all downstream devices have bus > master bit off before clearing the bridge's bus master bit. > > Signed-off-by: Kairui Song > --- > drivers/pci/pci-driver.c | 11 ++++++++--- > drivers/pci/quirks.c | 20 ++++++++++++++++++++ > 2 files changed, 28 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 0454ca0e4e3f..84a7fd643b4d 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > #include "pci.h" > #include "pcie/portdrv.h" > > @@ -488,10 +489,14 @@ static void pci_device_shutdown(struct device *dev) > * If this is a kexec reboot, turn off Bus Master bit on the > * device to tell it to not continue to do DMA. Don't touch > * devices in D3cold or unknown states. > - * If it is not a kexec reboot, firmware will hit the PCI > - * devices with big hammer and stop their DMA any way. > + * If this is kdump kernel, also turn off Bus Master, the device > + * could be activated by previous crashed kernel and may block > + * it's upstream from shutting down. > + * Else, firmware will hit the PCI devices with big hammer > + * and stop their DMA any way. > */ > - if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot)) > + if ((kexec_in_progress || is_kdump_kernel()) && > + pci_dev->current_state <= PCI_D3hot) > pci_clear_master(pci_dev); > } > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index 4937a088d7d8..c65d11ab3939 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -28,6 +28,7 @@ > #include > #include > #include > +#include > #include /* isa_dma_bridge_buggy */ > #include "pci.h" > > @@ -192,6 +193,25 @@ static int __init pci_apply_final_quirks(void) > } > fs_initcall_sync(pci_apply_final_quirks); > > +/* > + * Read the device state even if it's not enabled. The device could be > + * activated by previous crashed kernel, this will read and correct the > + * cached state. > + */ > +static void quirk_read_pm_state_in_kdump(struct pci_dev *dev) > +{ > + u16 pmcsr; > + > + if (!is_kdump_kernel()) > + return; > + > + if (dev->pm_cap) { > + pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); > + dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); > + } > +} > +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, quirk_read_pm_state_in_kdump); > + > /* > * Decoding should be disabled for a PCI device during BAR sizing to avoid > * conflict. But doing so may cause problems on host bridge and perhaps other > -- > 2.24.1 > Let me make a simplified version of the commit message: On some HPE machines, If kernel clears the bus master bit of a PCI bridge, but some downstream device of it is still alive, the downstream device could error out and crash the system. Usually this never happen, as PCI devices are always shutdown before their upstream bridge is shutdown. But in kdump kernel, the real hardware state is out of sync with the data in kernel, as previous kernel could enable some hardware and kdump kernel is unaware of it, so kdump kernel just skipped the downstream device shutdown. This patch is trying to fix this issue.