From: Kairui Song
Date: Tue, 14 Jan 2020 01:07:09 +0800
Subject: Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel
To: Deepa Dinamani
Cc: Khalid Aziz, Baoquan He, Jerry Hoemann, Bjorn Helgaas, linux-pci@vger.kernel.org, kexec@lists.infradead.org, Linux Kernel Mailing List, Randy Wright
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Jan 12, 2020 at 2:33 AM Deepa Dinamani wrote:
>
> > Hi, there are some previous works about this issue, reset PCI devices
> > in kdump kernel to stop ongoing DMA:
> >
> > [v7,0/5] Reset PCIe devices to address DMA problem on kdump with iommu
> > https://lore.kernel.org/patchwork/cover/343767/
> >
> > [v2] PCI: Reset PCIe devices to stop ongoing DMA
> > https://lore.kernel.org/patchwork/patch/379191/
> >
> > And didn't get merged, that patch are trying to fix some DMAR error
> > problem, but resetting devices is a bit too destructive, and the
> > problem is later fixed in IOMMU side. And in most case the DMA seems
> > harmless, as they targets first kernel's memory and kdump kernel only
> > live in crash memory.
>
> I was going to ask the same. If the kdump kernel had IOMMU on, would
> that still be a problem?

It will still fail. Doing DMA is not the problem; it only goes wrong
when a device's upstream bridge is mistakenly shut down before the
device itself is shut down.

>
> > Also, by the time kdump kernel is able to scan and reset devices,
> > there are already a very large time window where things could go
> > wrong.
> >
> > The currently problem observed only happens upon kdump kernel
> > shutdown, as the upper bridge is disabled before the device is
> > disabledm so DMA will raise error. It's more like a problem of wrong
> > device shutting down order.
>
> The way it was described earlier "During this time, the SUT sometimes
> gets a PCI error that raises an NMI." suggests that it isn't really
> restricted to kexec/kdump.
> Any attached device without an active driver might attempt spurious or
> malicious DMA and trigger the same during normal operation.
> Do you have available some more reporting of what happens during the
> PCIe error handling?

Let me add more info about this:

On the machine where I can reproduce this issue, the first kernel
always runs fine, and the kdump kernel works fine while dumping the
vmcore. Even if I keep the kdump kernel running for hours, nothing
goes wrong. If DMA during normal operation could cause this problem,
that should have exposed it.

The problem only occurs when the kdump kernel tries to reboot, no
matter how long the kdump kernel has been running (a few minutes or
hours).
The machine is dead after printing:

[  101.438300] reboot: Restarting system
[  101.455360] reboot: machine restart

And I can find the following logs, recorded at just that time, in the
"Integrated Management Log" from the iLO web interface:

1254 OS           12/25/2019 09:08 12/25/2019 09:08 1 User Remotely Initiated NMI Switch
1253 System Error 12/25/2019 09:08 12/25/2019 09:08 1 An Unrecoverable System Error (NMI) has occurred (Service Information: 0x00000000, 0x00000000)
1252 PCI Bus      12/25/2019 09:07 12/25/2019 09:07 1 Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 2, Function 2, Error status 0x00100000)
1251 System Error 12/25/2019 09:07 12/25/2019 09:07 1 Unrecoverable System Error (NMI) has occurred. System Firmware will log additional details in a separate IML entry if possible
1250 PCI Bus      12/25/2019 09:07 12/25/2019 09:07 1 PCI Bus Error (Slot 0, Bus 0, Device 2, Function 2)

And the topology is:

[0000:00]-+-00.0  Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2
          +-01.0-[02]--
          +-01.1-[05]--
          +-02.0-[06]--+-00.0  Emulex Corporation OneConnect NIC (Skyhawk)
          |            +-00.1  Emulex Corporation OneConnect NIC (Skyhawk)
          |            +-00.2  Emulex Corporation OneConnect NIC (Skyhawk)
          |            +-00.3  Emulex Corporation OneConnect NIC (Skyhawk)
          |            +-00.4  Emulex Corporation OneConnect NIC (Skyhawk)
          |            +-00.5  Emulex Corporation OneConnect NIC (Skyhawk)
          |            +-00.6  Emulex Corporation OneConnect NIC (Skyhawk)
          |            \-00.7  Emulex Corporation OneConnect NIC (Skyhawk)
          +-02.1-[0f]--
          +-02.2-[07]----00.0  Hewlett-Packard Company Smart Array Gen9 Controllers

It's a bridge reporting the error. It should be an unsupported request
error, because the downstream device is still alive and sending
requests, but the port has bus mastering off. If I manually shut down
the "Smart Array" (HPSA) device before the kdump reboot, it always
reboots just fine.

And as the patch description said, the HPSA is used in the first
kernel, but didn't get reset in the kdump kernel because its driver is
not loaded.
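
As a rough cross-check of the "unsupported request" reading, here is a
small Python sketch that decodes a status value bit by bit. It assumes
that the "Error status" field in the IML entry is the PCIe AER
Uncorrectable Error Status register, which the log itself does not
confirm. The bit masks follow the PCI_ERR_UNC_* definitions in the
Linux pci_regs.h header; the function name is made up for this example.

```python
# Bit masks of the PCIe AER Uncorrectable Error Status register,
# matching the PCI_ERR_UNC_* values in Linux's pci_regs.h.
AER_UNCORRECTABLE_BITS = {
    0x00000010: "Data Link Protocol Error",
    0x00000020: "Surprise Down Error",
    0x00001000: "Poisoned TLP",
    0x00002000: "Flow Control Protocol Error",
    0x00004000: "Completion Timeout",
    0x00008000: "Completer Abort",
    0x00010000: "Unexpected Completion",
    0x00020000: "Receiver Overflow",
    0x00040000: "Malformed TLP",
    0x00080000: "ECRC Error",
    0x00100000: "Unsupported Request Error",
    0x00200000: "ACS Violation",
}


def decode_uncorrectable_status(status):
    """Return the names of all bits set in a 32-bit status value.

    Hypothetical helper for illustration. Bits that are not in the
    table above are reported by their hex mask instead of a name.
    """
    names = []
    for shift in range(32):
        bit = 1 << shift
        if status & bit:
            names.append(
                AER_UNCORRECTABLE_BITS.get(bit, "unknown bit 0x%08x" % bit)
            )
    return names


if __name__ == "__main__":
    # Value taken from IML entry 1252 above.
    print(decode_uncorrectable_status(0x00100000))
```

Under that assumption, the value from entry 1252 has only bit 20 set,
which is the Unsupported Request bit.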
When shutting down a bridge, the kernel should shut down the
downstream devices first, and then shut down the bridge and clear its
bus master bit. But in the kdump case, the kernel skips shutting down
some devices because their drivers are not loaded, and the kernel
doesn't know they are enabled.

This problem is not limited to HPSA. The NICs listed in the topology
above may also make the bridge error out, if the HPSA driver gets
loaded in the kdump kernel and the NICs get ignored.

>
> "The reaction to the NMI that the kdump kernel takes is problematic."
> Or the NMI should not have been triggered to begin with? Where does
> that happen?

The NMI is triggered by firmware upon the bridge error mentioned
above. It should have triggered a kernel panic, but on the test system
it just hung and no longer gave any output, so I can't post any log
about it.
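
To make the shutdown-ordering argument above concrete, here is a toy
Python model. It is not kernel code: the Dev class, the shutdown()
helper and the clear_unbound flag are all invented for illustration,
and which devices have a driver bound in the kdump kernel is an
assumption chosen to match the HPSA case. The model only tracks one
bit per device (bus mastering on or off) and records every bridge that
turns itself off while a device below it can still issue requests.

```python
class Dev:
    """One PCI function in the toy model (hypothetical, not a kernel type)."""

    def __init__(self, name, has_driver, children=()):
        self.name = name
        self.has_driver = has_driver   # driver bound in the kdump kernel?
        self.bus_master = True         # left enabled by the first kernel
        self.children = list(children)


def shutdown(dev, clear_unbound, at_risk):
    """Shut down downstream devices first, then the device itself.

    clear_unbound=False models the current behaviour described above:
    a device without a driver is skipped and keeps bus mastering on.
    clear_unbound=True models the idea of clearing the bus master bit
    on every device at shutdown, driver or not.
    """
    for child in dev.children:
        shutdown(child, clear_unbound, at_risk)
    if not (dev.has_driver or clear_unbound):
        return  # skipped: nothing turns this device off
    # The device is about to go quiet. Any child that can still master
    # the bus may now hit a port that rejects its requests.
    for child in dev.children:
        if child.bus_master:
            at_risk.append((dev.name, child.name))
    dev.bus_master = False


def run(clear_unbound):
    """Build a reduced version of the topology above and shut it down."""
    ports = [
        # Assumption: the NIC driver is loaded in the kdump kernel.
        Dev("00:02.0", True, [Dev("06:00.0", True)]),
        # Assumption: the HPSA driver is not loaded in the kdump kernel.
        Dev("00:02.2", True, [Dev("07:00.0", False)]),
    ]
    at_risk = []
    for port in ports:
        shutdown(port, clear_unbound, at_risk)
    return at_risk


if __name__ == "__main__":
    print("skip unbound devices: ", run(False))
    print("clear on every device:", run(True))
```

In this model the first run flags the 00:02.2 port with the HPSA below
it, and the second run flags nothing. Swapping the two driver
assumptions would flag the 00:02.0 port with the NIC instead, which is
the "not limited to HPSA" case.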